# Convolution and Differentiation of Distributions

## If F is a distribution and g is a function, why is (DF)*g equal to F*Dg?

Steve Trettel

| Analysis

I’ll review (extremely rapidly!) to fix notation. We equip the vector space $C_c^\infty$ of smooth compactly supported real-valued functions on $\mathbb{R}^n$ with the following topology. A sequence $\phi_k$ converges to $\phi$ if the supports of all the $\phi_k$ lie in a single compact set, and the $\phi_k$ and all their derivatives converge uniformly to $\phi$ and the corresponding derivatives of $\phi$. The topological dual $\mathcal{D}$ of $C_c^\infty$, that is, the space of continuous linear functionals $C_c^\infty\to\mathbb{R}$, is the set of *distributions* on $\mathbb{R}^n$.

Every smooth function $f\in C_c^\infty$ naturally gives rise to a distribution $F\in\mathcal{D}$ via integration, $$f\mapsto F\hspace{1cm} F(\phi)=\int_{\mathbb{R}^n}f\phi d\mathrm{vol},$$ but not all distributions are of this form. Important examples are given by the *delta distributions*: for any $p\in \mathbb{R}^n$ we define $\delta_p\in \mathcal{D}$ by $\delta_p(\phi):=\phi(p)$, and we write $\delta$ for $\delta_0$.
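To make these definitions concrete, here is a minimal numerical sketch in Python that is not part of the original post. It works in one dimension and models a test function as an ordinary callable and a distribution as a callable that takes a test function. Integrals are approximated with the trapezoidal rule on $[-5,5]$, so the sketch assumes every test function vanishes outside that interval. The names `bump`, `integrate`, `regular` and `delta_at` are hypothetical helpers, not a library API.

```python
import math

# Assumption for this sketch: every test function vanishes outside [-R, R],
# so integrals over the real line can be truncated to that interval.
R = 5.0


def bump(x):
    """A standard element of C_c^infty(R): exp(-1/(1-x^2)) on (-1, 1), 0 elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(f, a=-R, b=R, n=4000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def regular(f):
    """The distribution F(phi) = integral of f * phi induced by a function f."""
    return lambda phi: integrate(lambda x: f(x) * phi(x))


def delta_at(p):
    """The delta distribution delta_p(phi) = phi(p)."""
    return lambda phi: phi(p)


delta = delta_at(0.0)
F = regular(lambda x: x * x)  # induced by f(x) = x^2

print(F(bump))              # integral of x^2 * bump(x)
print(delta(bump))          # bump(0) = e^{-1}
print(delta_at(0.5)(bump))  # bump(0.5)
```

The sketch does not check continuity. It only shows that a distribution is something that eats test functions, and that $\delta_p$ is the simplest example that is not of the form $\phi\mapsto\int f\phi$.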

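The set of distributions is closed under differentiation. The derivative of a distribution $F$ is defined by its action on a test function $\phi$, in analogy with integration by parts. When $n=1$ this lets us define the first derivative $F^\prime$ of $F$ by $F^\prime(\phi):=-F(\phi^\prime)$, and on $\mathbb{R}^n$ each coordinate partial derivative acts by $(\partial_iF)(\phi):=-F(\partial_i\phi)$. Every derivative moved onto $\phi$ costs one sign. For a multi-index $\alpha$, with $\partial^\alpha=\partial_1^{\alpha_1}\cdots\partial_n^{\alpha_n}$ and $|\alpha|=\alpha_1+\cdots+\alpha_n$, we therefore get $(\partial^\alpha F)(\phi)=(-1)^{|\alpha|}F(\partial^\alpha\phi)$. More generally, a linear differential operator $\nabla=\sum_\alpha\psi_\alpha\partial^\alpha$ with smooth coefficients $\psi_\alpha$ acts on $\mathcal{D}$ by $$\nabla F:=\sum_\alpha\psi_\alpha\,\partial^\alpha F,$$ using the multiplication of distributions by smooth functions described next. (So the single minus sign in $F^\prime(\phi)=-F(\phi^\prime)$ is special to first-order derivatives.)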
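In a hypothetical Python model (helpers redefined so the block runs on its own), the distributional derivative moves the derivative onto the test function and flips the sign. The sketch checks three things numerically:

- for the smooth function $\sin$ the rule reproduces integration by parts ($\sin'=\cos$);
- for the Heaviside distribution $H(\phi)=\int_0^\infty\phi\,dx$ it gives $H'=\delta$;
- taking two derivatives produces two sign flips.

The sketch approximates derivatives of test functions by central differences. That is a choice of the sketch, not part of the theory.

```python
import math

R = 5.0  # assumption: test functions vanish outside [-R, R]


def bump(x):
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(f, a=-R, b=R, n=4000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def derivative(f, h=1e-4):
    """Central-difference approximation of f' (an assumption of this sketch)."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)


def regular(f):
    return lambda phi: integrate(lambda x: f(x) * phi(x))


def d(F):
    """Distributional derivative on R: F'(phi) = -F(phi')."""
    return lambda phi: -F(derivative(phi))


def heaviside(phi):
    """H(phi) = integral of phi over [0, infinity), truncated at R."""
    return integrate(phi, 0.0, R)


def phi(x):
    """A shifted test function, supported in [-0.7, 1.3]."""
    return bump(x - 0.3)


# Integration by parts: sin' = cos in the distributional sense.
print(d(regular(math.sin))(phi), regular(math.cos)(phi))
# H' = delta: both numbers approximate phi(0).
print(d(heaviside)(phi), phi(0.0))
# Two derivatives, two signs: sin'' = -sin.
print(d(d(regular(math.sin)))(phi), -regular(math.sin)(phi))
```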

Distributions can be multiplied by smooth functions. If $\psi\colon\mathbb{R}^n\to\mathbb{R}$ is smooth and $F\in \mathcal{D}$, we define $\psi F$ to be the distribution such that $(\psi F)(\phi)=F(\psi\phi)$ for all $\phi\in C_c^\infty$. Distributions can also be convolved with functions in $C_c^\infty$, but this operation yields smooth functions rather than distributions. If $F\in\mathcal{D}$ and $g\in C_c^\infty$, their convolution $F\star g$ is the function $$F\star g\colon\hspace{3mm} p\mapsto F\left(g(p-\cdot)\right),$$ where $g(p-\cdot)$ denotes the function $x\mapsto g(p-x)$.
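Continuing the hypothetical Python sketch (helpers redefined so it runs standalone), multiplication by a smooth function and convolution are each a one-liner.

Take $F$ induced by $x^2$ and $g$ the even bump function. Symmetry kills the odd moment, so $(F\star g)(p)=\int(p-y)^2g(y)\,dy=p^2\int g+\int y^2g$, which the test checks. Convolving with $\delta$ returns $g$ exactly.

```python
import math

R = 5.0  # assumption: test functions vanish outside [-R, R]


def bump(x):
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(f, a=-R, b=R, n=4000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def regular(f):
    return lambda phi: integrate(lambda x: f(x) * phi(x))


def delta_at(p):
    return lambda phi: phi(p)


def multiply(psi, F):
    """(psi F)(phi) = F(psi * phi) for a smooth function psi."""
    return lambda phi: F(lambda x: psi(x) * phi(x))


def convolve(F, g):
    """F * g is the ordinary function p -> F(g(p - .))."""
    return lambda p: F(lambda x: g(p - x))


u = convolve(regular(lambda x: x * x), bump)  # a genuine function of p
print(u(0.0), u(1.5))
print(convolve(delta_at(0.0), bump)(0.3), bump(0.3))            # delta * g = g
print(multiply(lambda x: x, delta_at(0.5))(bump), 0.5 * bump(0.5))
```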

As an example, if $\phi\in C_c^\infty$, we compute the value of $\delta\star\phi$ at $p\in\mathbb{R}^n$ as $$(\delta\star\phi)(p)=\delta\left(\phi(p-\cdot)\right)=\phi(p-0)=\phi(p).$$ Thus $\delta\star\phi=\phi$, and convolution with $\delta$ realizes the identity operator on $C_c^\infty$.

Differentiation and convolution interact in a particularly simple way. If $\nabla$ is a linear differential operator with constant coefficients and $F\in\mathcal{D}$, $g\in C_c^\infty$, then we may differentiate the function $F\star g$ by $$\nabla(F\star g)=(\nabla F)\star g = F\star (\nabla g).$$

Once you know this line of equalities, the motivation for introducing distributions to solve PDEs becomes clear! The general goal is to find distributional solutions to a differential equation (which are hopefully easier to find) and turn them into actual, real-valued function solutions through convolution. If $\nabla$ is a differential operator, we say a distribution $F$ is a *fundamental solution* for $\nabla$ if $\nabla F=\delta$. Such a fundamental solution lets us find a genuine solution to the differential equation $\nabla u=g$, with $g\in C_c^\infty$, by simply taking $u=F\star g$, as we easily confirm:

$$\nabla(F\star g)=(\nabla F)\star g=\delta\star g=g$$
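A hedged numerical illustration, again in the hypothetical Python model. On $\mathbb{R}$, the function $\tfrac12|x|$ induces a fundamental solution of $D^2=\tfrac{d^2}{dx^2}$: integrating by parts on each half-line gives $\tfrac12\int|x|\phi''\,dx=\phi(0)$, so $(\tfrac12|x|)''=\delta$.

The sketch builds $u=F\star g$ for the bump $g$ and uses a second-order central difference (an approximation chosen for the sketch) to check that $u''\approx g$. The example operator and all helper names are illustrative assumptions.

```python
import math

R = 5.0  # assumption: test functions vanish outside [-R, R]


def bump(x):
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(f, a=-R, b=R, n=4000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def regular(f):
    return lambda phi: integrate(lambda x: f(x) * phi(x))


def convolve(F, g):
    return lambda p: F(lambda x: g(p - x))


def second_derivative(f, h=1e-3):
    """Central-difference approximation of f''."""
    return lambda x: (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Fundamental solution of D^2 on R: F is induced by |x|/2, and F'' = delta.
F = regular(lambda x: 0.5 * abs(x))
u = convolve(F, bump)  # candidate solution of u'' = bump

for x in (-0.5, 0.0, 0.4):
    print(x, second_derivative(u)(x), bump(x))
```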

Of course, this (crucially!) relies on the fact that $\nabla (F\star g)=(\nabla F)\star g$, and recently I realized I had completely forgotten how to prove this fact! Luckily, shortly afterwards Daniel O’Connor showed me how it works, so I want to write it down for the next time that I forget.

## Proving $\nabla(F\star g)=(\nabla F)\star g=F\star(\nabla g)$

To prove this, we start small and build up to the general case. The interesting part is actually this small beginning; the rest is just packaging.

Theorem 1: On $\mathbb{R}$, suppose $F\in \mathcal{D}$ and $g\in C_c^\infty$. Then $F\star g$ is differentiable and $(F\star g)^\prime=(F^\prime)\star g$.

Proof: Here’s Daniel’s argument. Let $x\in \mathbb{R}$ and $h\neq 0$, and consider the difference quotient $$\psi_h(x):=\frac{(F\star g)(x+h)-(F\star g)(x)}{h}.$$ If the limit $\lim_{h\to 0}\psi_h(x)$ exists, then $F\star g$ is differentiable at $x$. Using the definition of $F\star g$ and the linearity of $F$, we may evaluate this as $$\psi_h(x)=\frac{F\left(g(x+h-\cdot)\right)-F\left(g(x-\cdot)\right)}{h}=F\left(\frac{g(x+h-\cdot)-g(x-\cdot)}{h}\right).$$

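Using the continuity of $F$, we may take the limit inside, and so $$\lim_{h\to 0}\psi_h(x)=F\left(\lim_{h\to 0}\frac{g(x+h-\cdot)-g(x-\cdot)}{h}\right).$$ This step deserves a word of care, because continuity of $F$ refers to convergence in $C_c^\infty$, not just pointwise convergence.

- Support: if $g$ vanishes outside $[-r,r]$, then for $|h|\le 1$ every difference quotient vanishes outside $[x-r-1,x+r+1]$.
- Uniform convergence: all derivatives of $g$ are bounded, so the mean value theorem shows that the difference quotients and all their derivatives converge uniformly.

So the limit inside $F$ really is a limit in $C_c^\infty$.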

The quantity inside of $F$ attempts to assign to each $p\in \mathbb{R}$ the value $$p\mapsto \lim_{h\to 0}\frac{g(x+h-p)-g(x-p)}{h}=g^\prime(x-p)$$ so in our notation, this is the function $g^\prime(x-\cdot)$, which itself is in $C_c^\infty$ as $g$ was. Thus, $(F\star g)^\prime(x)$ exists, and $(F\star g)^\prime(x)=F(g^\prime(x-\cdot))$. But this new term is exactly the definition of $F$ convolved with $g^\prime$, when evaluated at $x$! Thus as functions, we have shown $$(F\star g)^\prime = F\star g^\prime$$
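A quick numerical illustration of Daniel's argument, in the same hypothetical Python model. Take $F$ induced by $|x|$, which is not differentiable at $0$. The sketch compares the difference quotients $\psi_h(x)$ of $F\star g$ with $(F\star g')(x)$, using the exact formula $g'(x)=-\tfrac{2x}{(1-x^2)^2}g(x)$ for the bump $g$. The gap should shrink roughly linearly in $h$, as expected of a one-sided difference quotient.

```python
import math

R = 5.0  # assumption: test functions vanish outside [-R, R]


def bump(x):
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def bump_prime(x):
    """Exact derivative of bump: bump(x) * (-2x / (1 - x^2)^2) on (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2)


def integrate(f, a=-R, b=R, n=4000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def regular(f):
    return lambda phi: integrate(lambda x: f(x) * phi(x))


def convolve(F, g):
    return lambda p: F(lambda x: g(p - x))


F = regular(abs)               # |x| is continuous but not differentiable at 0
Fg = convolve(F, bump)         # F * g
Fdg = convolve(F, bump_prime)  # F * g'

x = 0.3
for h in (1e-1, 1e-2, 1e-3):
    quotient = (Fg(x + h) - Fg(x)) / h  # the difference quotient psi_h(x)
    print(h, quotient, Fdg(x))
```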

This is half of what we want, but the rest is just a straightforward application of the definition of the distributional derivative. By definition, $F^\prime$ is the linear functional such that $F^\prime(\phi)=-F(\phi^\prime)$ for all $\phi\in C_c^\infty$, so computing $(F^\prime)\star g$, we see for $x\in\mathbb{R}$ $$(F^\prime\star g)(x)=F^\prime\left(g(x-\cdot)\right):=-F(g(x-\cdot)^\prime)$$

where $g(x-\cdot)^\prime$ is the function sending $p\mapsto \tfrac{d}{dp}g(x-p)$. Computing this derivative with the chain rule shows $g(x-\cdot)^\prime=-g^\prime(x-\cdot)$, and so $$-F(g(x-\cdot)^\prime)=-F(-g^\prime(x-\cdot))=F(g^\prime(x-\cdot))$$

Stringing all this together, we see $(F^\prime\star g)(x)=F(g^\prime(x-\cdot))$, and we recognize the right-hand side as the convolution $(F\star g^\prime)(x)$. As this equality holds for all $x\in\mathbb{R}$, we have equality between functions: $$F^\prime\star g=F\star g^\prime$$

Combining with our earlier result proves the theorem, as we have shown both $(F\star g)^\prime$ and $F^\prime\star g$ are equal to $F\star g^\prime$.
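The following concrete instance is added here as a worked example. Let $H$ be the Heaviside distribution $H(\phi)=\int_0^\infty\phi\,dx$, given by integrating against the indicator function of $[0,\infty)$. That function is not smooth, but integrating against it still defines a distribution. Its derivative is $H'(\phi)=-\int_0^\infty\phi'\,dx=\phi(0)$, so $H'=\delta$, and the three functions in Theorem 1 can be computed by hand:
$$(H\star g)(x)=\int_0^\infty g(x-p)\,dp=\int_{-\infty}^x g(s)\,ds,\qquad\text{so}\qquad (H\star g)'(x)=g(x),$$
$$(H'\star g)(x)=(\delta\star g)(x)=g(x),\qquad (H\star g')(x)=\int_{-\infty}^x g'(s)\,ds=g(x).$$
All three agree, as the theorem predicts.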

Corollary 2: Let $D^k$ be the $k^{th}$ derivative operator on $C_c^\infty(\mathbb{R})$. Then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, the convolution $F\star g$ is $k$ times differentiable and $D^k(F\star g)=(D^k F)\star g=F\star (D^k g)$

Proof: We can proceed inductively using Theorem 1, as $D^k=D\circ D^{k-1}$ is the $k$-fold composition of the first derivative operator.
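Spelled out, assuming the statement for $k-1$, each equality takes one more application of Theorem 1 (together with the equality $(F\star g)'=F\star g'$ established in its proof). Apply it once to the distribution $D^{k-1}F$ and once to the test function $D^{k-1}g\in C_c^\infty$:
$$D^k(F\star g)=D\big((D^{k-1}F)\star g\big)=(D^kF)\star g,\qquad D^k(F\star g)=D\big(F\star(D^{k-1}g)\big)=F\star(D^kg).$$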

Lemma 3: On $\mathbb{R}^n$, let $\partial_x$ denote the partial derivative with respect to the first coordinate. Then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, $\partial_x(F\star g)$ exists, and $\partial_x(F\star g)=(\partial_x F)\star g=F\star(\partial_x g)$.

Proof: The proof is exactly analogous to the one-dimensional case in Theorem 1, so we can proceed rather quickly. It suffices to check these equalities at an arbitrary fixed $p\in\mathbb{R}^n$, where $$\partial_x(F\star g)(p)=\lim_{h\to 0}\frac{(F\star g)(p+he_1)-(F\star g)(p)}{h}$$ and $e_1$ is the first standard basis vector. Evaluating the convolutions and using the linearity and continuity of $F$ shows this to be $F\left((\partial_x g)(p-\cdot)\right)$. As before, the difference quotients converge in $C_c^\infty$. This is the convolution of $F$ with $\partial_xg$ evaluated at $p$, so $\partial_x(F\star g)=F\star(\partial_x g)$. The second equality again follows by unwinding the definition of $\partial_xF$ to compute $(\partial_xF)\star g$ at $p$. The chain rule contributes a second minus sign that cancels the one in the definition, giving $(\partial_xF)\star g=F\star(\partial_x g)$.
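The sign bookkeeping in the second equality, written out as a short derivation. Here $\partial_x[g(p-\cdot)]$ denotes the first partial derivative of $q\mapsto g(p-q)$, which the chain rule identifies as $-(\partial_xg)(p-\cdot)$:
$$((\partial_xF)\star g)(p)=(\partial_xF)\big(g(p-\cdot)\big)=-F\big(\partial_x[g(p-\cdot)]\big)=-F\big(-(\partial_xg)(p-\cdot)\big)=(F\star\partial_xg)(p).$$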

Corollary 4: If $L=\partial_x^a\partial_y^b\partial_z^c\cdots$ is any monomial in the coordinate partial derivative operators on $\mathbb{R}^n$, then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, $$L(F\star g)=(LF)\star g=F\star(Lg)$$

Proof: As in Corollary 2, we inductively apply Lemma 3 (whose proof works verbatim for any coordinate direction) to each partial derivative operator that shows up in $L$.
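There is also a direct route to the relation between $(LF)\star g$ and $F\star(Lg)$ that skips the induction, sketched here as a cross-check. Write $|\alpha|=a+b+c+\cdots$. Iterating the first-order rule $(\partial_iF)(\phi)=-F(\partial_i\phi)$ one derivative at a time gives $(LF)(\phi)=(-1)^{|\alpha|}F(L\phi)$. The chain rule gives $L[g(x-\cdot)]=(-1)^{|\alpha|}(Lg)(x-\cdot)$. The two signs cancel:
$$((LF)\star g)(x)=(-1)^{|\alpha|}F\big(L[g(x-\cdot)]\big)=(-1)^{2|\alpha|}F\big((Lg)(x-\cdot)\big)=(F\star Lg)(x).$$
This shortcut does not show that $F\star g$ is differentiable at all, nor that $L(F\star g)=F\star(Lg)$. Those still rest on the limit argument of Lemma 3.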

To build upwards from this, it’s useful to stop for a second and factor out a little argument about convolution:

Lemma 5: If $F,\Phi$ are distributions and $g,\gamma$ are smooth compactly supported functions, then $(F+\Phi)\star g=F\star g+\Phi\star g$ and $F\star(g+\gamma)=F\star g+F\star\gamma$.

Proof: Let $x\in\mathbb{R}^n$. First consider $F\star(g+\gamma)$ evaluated at $x$. This is by definition $F((g+\gamma)(x-\cdot))$, that is, $F(g(x-\cdot)+\gamma(x-\cdot))$. Using the linearity of $F$, we see this to be $F(g(x-\cdot))+F(\gamma(x-\cdot))$, which is by definition $(F\star g)(x)+(F\star\gamma)(x)$. Thus, $$F\star(g+\gamma)=F\star g+F\star \gamma.$$

Next, consider $(F+\Phi)\star g$ evaluated at $x$. By the definition of convolution, $\left((F+\Phi)\star g\right)(x)=(F+\Phi)(g(x-\cdot))$. Using the definition of $+$ in $\mathcal{D}$, we distribute as $(F+\Phi)(g(x-\cdot))=F(g(x-\cdot))+\Phi(g(x-\cdot))$, where the last terms are by definition equal to $(F\star g)(x)$ and $(\Phi\star g)(x)$ respectively. Thus, $$(F+\Phi)\star g=F\star g+\Phi\star g.$$

Lemma 6: Let $L_1,L_2$ be differential operators on $\mathbb{R}^n$ such that $L_i(F\star g)=(L_iF)\star g=F\star(L_ig)$ for $i\in\{1,2\}$ and any $F\in\mathcal{D}$, $g\in C_c^\infty$. Then $L=L_1+L_2$ also satisfies $L(F\star g)=(LF)\star g=F\star(Lg)$ for all $F,g$.

Proof: The differential operator $L=L_1+L_2$ acts on functions by $L\phi=L_1\phi+L_2\phi$, and on distributions by $LF=L_1F+L_2F$. Thus if $F\in\mathcal{D}$ and $g\in C_c^\infty$, then $(L_1+L_2)(F\star g)=L_1(F\star g)+L_2(F\star g)$. For the first of the two claimed equalities, half of our hypothesis on the $L_i$ lets us rewrite this as $(L_1F)\star g+(L_2 F)\star g$. Lemma 5 then factors out the convolution, giving $(L_1+L_2)(F\star g)=(L_1F+L_2F)\star g$. Factoring out the $F$ gives what we wanted: $$(L_1+L_2)(F\star g)=((L_1+L_2)F)\star g$$

To get the second equality, we use the other half of our assumption on the $L_i$ to rewrite $L_1(F\star g)+L_2(F\star g)$ as $F\star(L_1g)+F\star(L_2g)$. We use Lemma 5 to factor out the distribution from this convolution, followed by the further factoring $L_1g+L_2g=(L_1+L_2)g$. All together, this gives what we wanted: $$(L_1+L_2)(F\star g)=F\star((L_1+L_2)g)$$

Lemma 7: Let $L$ be a differential operator on $\mathbb{R}^n$ such that $L(F\star g)=(LF)\star g=F\star(Lg)$ for any $F\in\mathcal{D}$ and $g\in C_c^\infty$. Then for any constant $\lambda\in\mathbb{R}$, the differential operator $K=\lambda L$ defined by $K\phi=\lambda L\phi$ also satisfies $K(F\star g)=(KF)\star g=F\star (Kg)$ for all $F,g$.

Proof: Consider $K(F\star g)=\lambda\, L(F\star g)$. Because $L$ satisfies our hypothesis, this equals both $\lambda\,((LF)\star g)$ and $\lambda\,(F\star(Lg))$. Taking the former and evaluating at $x\in\mathbb{R}^n$, we see $$\lambda\,((LF)\star g)(x)=\lambda\,(LF)(g(x-\cdot))=(\lambda LF)(g(x-\cdot))=((\lambda LF)\star g)(x),$$ which is the first half of what we want: $$K(F\star g)=(\lambda LF)\star g=(KF)\star g.$$

The other case is similar. Evaluating $\lambda\,(F\star(Lg))$ at $x$ yields $\lambda F((Lg)(x-\cdot))$. By linearity of $F$ we may pull the constant inside to get $F(\lambda(Lg)(x-\cdot))=(F\star(\lambda Lg))(x)$, so $$K(F\star g)=F\star(\lambda Lg)=F\star(Kg).$$

It really matters that $\lambda$ is a constant. For a nonconstant smooth $\psi$, the function $(\psi LF)\star g$ sends $x$ to $(LF)\big(\psi\cdot g(x-\cdot)\big)$, which weights the test function by $\psi(p)$ at each point $p$. By contrast, $\psi\cdot((LF)\star g)$ multiplies by the single number $\psi(x)$. These generally differ. On $\mathbb{R}$, take $K\phi=x\phi$ and $F=\delta_1$, so that $KF=x\delta_1=\delta_1$ because $(x\delta_1)(\phi)=\delta_1(x\phi)=\phi(1)$. Then the three functions $K(F\star g)$, $(KF)\star g$ and $F\star(Kg)$ send $x$ to $xg(x-1)$, $g(x-1)$ and $(x-1)g(x-1)$ respectively.

Finally, we need a description of the class of constant-coefficient linear differential operators on $\mathbb{R}^n$ which is amenable to our start-small-and-build-upwards approach:

Lemma 8: Any linear differential operator with constant coefficients on $\mathbb{R}^n$ is a polynomial in the partial derivative operators, with real coefficients.

All the hard work is done; now it’s just a matter of putting the pieces together to state the main result:

Theorem 9: Let $\nabla$ be any linear differential operator with constant coefficients on $\mathbb{R}^n$. Then $\nabla(F\star g)=(\nabla F)\star g=F\star(\nabla g)$ for all $F\in \mathcal{D}$, $g\in C_c^\infty$.

Proof: We write $\nabla$ as a polynomial in the partial derivatives, $$\nabla = \sum_{[\alpha]} \lambda_{[\alpha]} \partial^{[\alpha]}$$ where $[\alpha]=[a,b,c,\cdots]$ ranges over some finite set of multi-indices, $\partial^{[\alpha]}=\partial_x^a\partial_y^b\partial_z^c\cdots$, and each $\lambda_{[\alpha]}$ is a real constant. Each $\partial^{[\alpha]}$ satisfies the desired property by Corollary 4, so each term $\lambda_{[\alpha]}\partial^{[\alpha]}$ does by Lemma 7. Applying Lemma 6 finitely many times then shows that the sum $\nabla$ does as well.
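As a final hedged sanity check (a numerical illustration in the same hypothetical Python model, not a proof), take the constant-coefficient operator $L=D^2+3D+2$ on $\mathbb{R}$, an arbitrary example, and $F$ induced by $|x|$. The three sides of $L(F\star g)=(LF)\star g=F\star(Lg)$ are computed separately:

- $L(F\star g)$, by finite differences of the function $F\star g$;
- $(LF)\star g$, by letting $LF$ act through $(LF)(\phi)=F(\phi''-3\phi'+2\phi)$;
- $F\star(Lg)$, by convolving $F$ with $Lg$.

The same finite differences are used throughout, so the three values agree to rounding error. This mirrors the algebra of Lemmas 6 and 7 rather than independently testing the limit argument.

```python
import math

R = 5.0  # assumption: test functions vanish outside [-R, R]
A0, A1, A2 = 2.0, 3.0, 1.0  # hypothetical operator L = D^2 + 3D + 2


def bump(x):
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(f, a=-R, b=R, n=4000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def regular(f):
    return lambda phi: integrate(lambda x: f(x) * phi(x))


def convolve(F, g):
    return lambda p: F(lambda x: g(p - x))


def d1(f, h=1e-3):
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)


def d2(f, h=1e-3):
    return lambda x: (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def L_function(f):
    """L applied to an ordinary function, via finite differences."""
    return lambda x: A2 * d2(f)(x) + A1 * d1(f)(x) + A0 * f(x)


def L_distribution(F):
    """(LF)(phi) = F(phi'' - 3 phi' + 2 phi): each derivative moved across costs a sign."""
    return lambda phi: F(lambda x: A2 * d2(phi)(x) - A1 * d1(phi)(x) + A0 * phi(x))


F = regular(abs)
for x in (-0.4, 0.0, 0.6):
    lhs = L_function(convolve(F, bump))(x)      # L(F * g)
    mid = convolve(L_distribution(F), bump)(x)  # (LF) * g
    rhs = convolve(F, L_function(bump))(x)      # F * (Lg)
    print(x, lhs, mid, rhs)
```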