Why you can freely pass derivatives through convolutions in distribution theory.
The purpose of this note is to provide a proof of the following theorem:
Convolving with a Distribution
Let F be a distribution, g a smooth compactly supported function, and ∇ a constant coefficient linear differential operator.
∇(F⋆g)=(∇F)⋆g
I’ll review (extremely rapidly!) to fix notation.
We equip the vector space Cc∞ of smooth compactly supported real valued functions on Rn with
the following topology: a sequence ϕn converges to ϕ if the supports of the ϕn all lie in a single
compact set, and the ϕn and all their derivatives converge
to ϕ and all of its derivatives with respect to the norm ∣∣ϕ∣∣=∫Rn∣ϕ∣dvol. The
topological dual D of Cc∞ is the set of distributions on Rn,
that is, the continuous linear functionals Cc∞→R.
Every smooth function f∈Cc∞ naturally gives rise to a distribution F∈D via integration,
f↦F, where F(ϕ):=∫Rn fϕ dvol,
but not all distributions are of this form. Important examples are given by the delta distributions: for any p∈Rn we
define δp∈D by δp(ϕ):=ϕ(p), and we write δ for δ0.
The set of distributions is closed under differentiation, where the derivative of a distribution F is defined by its action
on a function ϕ in analogy to integration by parts. When n=1 this lets us define the first derivative F′ of
F by F′(ϕ):=−F(ϕ′). More generally, each partial derivative
operator ∂ on Cc∞ gives an operator ∂ on D by
∂F(ϕ):=−F(∂ϕ)
and higher order operators act by iterating this, so a kth order derivative picks up a sign of (−1)k.
Distributions can be multiplied by smooth functions: if ψ:Rn→R is smooth and F∈D, we define
ψF to be the distribution such that (ψF)(ϕ)=F(ψϕ) for all ϕ∈Cc∞.
Distributions can also be convolved with functions in Cc∞, but this operation now yields
smooth functions rather than distributions: if F∈D and
g∈Cc∞, their convolutional product F⋆g is defined below, where g(p−⋅) is the function x↦g(p−x).
F⋆g:p↦F(g(p−⋅))
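As a quick concrete model (my own illustration, not part of the formal development), these definitions can be mimicked in Python by representing a distribution as a bare functional on test functions; `delta` and `convolve` are illustrative names:

```python
import math

# Illustrative model: a distribution is a functional that eats a test
# function and returns a number. This is a sketch, not library code.

def delta(p):
    """The delta distribution at p: delta_p(phi) = phi(p)."""
    return lambda phi: phi(p)

def convolve(F, g):
    """(F * g)(x) = F(g(x - .)), following the definition above."""
    return lambda x: F(lambda y: g(x - y))

# Convolving delta_p with g just shifts g: (delta_p * g)(x) = g(x - p).
g = lambda t: math.exp(-t * t)   # stand-in for a compactly supported bump
shifted = convolve(delta(1.0), g)
print(shifted(3.0), g(2.0))      # equal: both are exp(-4)
```

Note that `convolve` returns an ordinary function of x, mirroring the fact that F⋆g is a function rather than a distribution.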
Having defined everything in the theorem, our goal is clear: we wish to show that differentiating the function F⋆g yields the same function that arises from convolving the distribution ∇F with g.
Why We Care
One use of this is to justify a rather clever approach to finding solutions to PDEs.
The idea is to upgrade (hopefully easier to find) distributional
solutions of a differential equation into actual, real-valued function solutions through convolution. More precisely, if ∇ is a differential
operator we say a distribution F is a fundamental solution for ∇ if ∇F=δ.
The delta distribution is important here because of its particular relationship to convolution.
For any ϕ∈Cc∞, we compute the value of δ⋆ϕ at p∈Rn as
(δ⋆ϕ)(p):=δ(ϕ(p−⋅))=ϕ(p−0)=ϕ(p)
Thus, δ⋆ϕ=ϕ, and convolution with δ realizes the identity operator on Cc∞.
Knowing this, a fundamental solution lets us find a real solution to the differential equation ∇u=g, as follows.
Since g=δ⋆g and δ=∇F we see that g=(∇F)⋆g. But, using the claimed theorem, this can be rewritten as
g=∇(F⋆g). But this is precisely the statement that the function u=F⋆g solves the equation ∇u=g, as desired! Thus, a solution to our PDE is simply the convolution of the fundamental solution with the right-hand side g.
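This recipe can be test-driven numerically. Taking the simplest case ∇ = d/dx on R, the Heaviside step H is a well-known fundamental solution (H′ = δ distributionally), and the sketch below (all grid choices arbitrary) checks that u = H⋆g satisfies u′ = g:

```python
import numpy as np

# Sketch for the operator D = d/dx on R: the Heaviside step H is a
# fundamental solution, since H' = delta, so u = H * g should solve u' = g.
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
g = np.exp(-x**2)            # smooth and (effectively) compactly supported

# (H * g)(x) = integral of g(t) for t from -infinity up to x
u = np.cumsum(g) * dx

# check u' ~ g via central differences
du = np.gradient(u, dx)
print(float(np.max(np.abs(du - g))))   # small (order dx)
```

The agreement degrades only at the resolution of the grid, as expected of a first order quadrature.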
Of course, this (crucially!) relies on the main theorem of this note and recently I realized I
had completely forgotten how to prove this fact!
Luckily, shortly after, Daniel O’Connor showed me how it works, and so
I want to write it down for the next time that I forget.
Proving the Theorem
To prove this, we start small and build up to the general case. The interesting part is actually this small beginning, however;
the rest is just packaging.
On R, suppose F∈D and g∈Cc∞. Then F⋆g is differentiable
and (F⋆g)′=(F′)⋆g.
Here’s Daniel’s argument:
Let x∈R, h≠0 and consider the difference quotient
ψh(x):=((F⋆g)(x+h)−(F⋆g)(x))/h
If the limit limh→0ψh(x) exists, then F⋆g is differentiable at x.
Using the definition of F⋆g and the linearity of F, we may evaluate this as
ψh(x)=(F(g(x+h−⋅))−F(g(x−⋅)))/h=F((g(x+h−⋅)−g(x−⋅))/h)
Using the continuity of F, we may take the limit inside, and so
limh→0ψh(x)=F(limh→0(g(x+h−⋅)−g(x−⋅))/h).
The quantity inside of F attempts to assign to each p∈R the value
p↦limh→0(g(x+h−p)−g(x−p))/h=g′(x−p)
so in our notation, this is the function g′(x−⋅), which itself is in Cc∞ as g was. (One checks that the difference quotients really do converge to it in the topology of Cc∞, which is what justifies passing the limit through F.)
Thus, (F⋆g)′(x) exists, and (F⋆g)′(x)=F(g′(x−⋅)).
But this new term is exactly the definition of F convolved with g′, when evaluated at x! Thus as functions, we have shown
(F⋆g)′=F⋆g′
This is half of what we want, but the rest is just a straightforward application of the definition of the distributional
derivative. By definition, F′ is the linear functional such that F′(ϕ)=−F(ϕ′) for all ϕ∈Cc∞,
so computing (F′)⋆g, we see for x∈R
(F′⋆g)(x)=F′(g(x−⋅)):=−F(g(x−⋅)′)
where g(x−⋅)′ is the function sending p↦(d/dp)g(x−p).
Computing this derivative with the chain rule shows g(x−⋅)′=−g′(x−⋅), and so
−F(g(x−⋅)′)=−F(−g′(x−⋅))=F(g′(x−⋅))
Stringing all this together, we see (F′⋆g)(x)=F(g′(x−⋅)), where we recognize this second
term as defining the convolution F⋆g′(x). As this equality holds for all x∈R we have equality
between functions:
F′⋆g=F⋆g′
Combining with our earlier result proves the theorem, as we have shown both (F⋆g)′ and F′⋆g are equal to F⋆g′.
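The theorem on R can be spot checked numerically with the concrete choice F = δ, using δ′(ϕ) = −ϕ′(0) with a finite difference standing in for ϕ′; all names and values below are illustrative:

```python
import math

# Numerical spot check with F = delta: all three expressions in the
# theorem should evaluate to approximately g'(x).
step = 1e-5
num_deriv = lambda phi: (lambda t: (phi(t + step) - phi(t - step)) / (2 * step))

delta       = lambda phi: phi(0.0)
delta_prime = lambda phi: -num_deriv(phi)(0.0)   # delta'(phi) = -phi'(0)

convolve = lambda F, g: (lambda x: F(lambda y: g(x - y)))

g  = lambda t: math.exp(-t * t)
dg = lambda t: -2.0 * t * math.exp(-t * t)

x = 0.4
lhs    = num_deriv(convolve(delta, g))(x)   # (F * g)'(x)
middle = convolve(delta_prime, g)(x)        # (F' * g)(x)
rhs    = convolve(delta, dg)(x)             # (F * g')(x)
print(lhs, middle, rhs)                     # all approximately g'(0.4)
```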
Let Dk be the kth derivative operator on Cc∞(R). Then for
any F∈D and g∈Cc∞, the convolution F⋆g is k times differentiable and
Dk(F⋆g)=(DkF)⋆g=F⋆(Dkg)
:::proof We can proceed inductively using Theorem 1, as Dk=D∘Dk−1 is the k-fold composition of the first derivative operator.
:::
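For instance, taking k = 2 and the concrete choice F = δa (so that δa⋆g = g(⋅−a)), the corollary predicts that the second derivative of δa⋆g is δa⋆g′′; a finite difference sketch with illustrative values:

```python
import math

# With F = delta_a we have (delta_a * g)(x) = g(x - a), so the corollary
# predicts its second derivative is g''(x - a).
a = 0.5
g   = lambda t: math.exp(-t * t)
d2g = lambda t: (4.0 * t * t - 2.0) * math.exp(-t * t)

conv = lambda x: g(x - a)              # (delta_a * g)(x)

h, x = 1e-4, 1.1
second_diff = (conv(x + h) - 2 * conv(x) + conv(x - h)) / (h * h)
print(abs(second_diff - d2g(x - a)))   # tiny
```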
On Rn, let ∂x denote the directional derivative with respect to the first coordinate.
Then for any F∈D and g∈Cc∞, ∂x(F⋆g) is smooth, and
∂x(F⋆g)=(∂xF)⋆g=F⋆(∂xg).
The proof is exactly analogous to the one dimensional case in Theorem 1, so we can proceed rather quickly.
It suffices to check this equality holds at an arbitrary fixed p∈Rn, where
∂x(F⋆g)(p)=limh→0((F⋆g)(p+he1)−(F⋆g)(p))/h
Evaluating the convolutions and using the linearity and continuity of F shows this to be
F(∂xg(p−⋅)), which is the convolution of F with ∂xg evaluated at p.
Thus,
∂x(F⋆g)=F⋆(∂xg).
The second equality again follows simply by unwinding the definition of ∂xF to compute
(∂xF)⋆g at p, resulting in
(∂xF)⋆g=F⋆(∂xg).
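Here is an analogous two dimensional spot check, again with the concrete choice F = δq (so δq⋆g = g(⋅−q)); all names and values are illustrative:

```python
import math

# 2D check of the lemma with F = delta_q on R^2, where
# (delta_q * g)(p) = g(p - q).
g   = lambda u, v: math.exp(-(u * u + v * v))
gdx = lambda u, v: -2.0 * u * math.exp(-(u * u + v * v))  # partial in first slot

q = (0.3, -0.2)
conv = lambda p: g(p[0] - q[0], p[1] - q[1])              # (delta_q * g)(p)

p, h = (1.0, 0.5), 1e-6
lhs = (conv((p[0] + h, p[1])) - conv((p[0] - h, p[1]))) / (2 * h)
rhs = gdx(p[0] - q[0], p[1] - q[1])                       # (delta_q * dx g)(p)
print(abs(lhs - rhs))   # tiny
```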
If L=∂xa∂yb∂zc⋯ is any monomial in the coordinate partial derivative operators on Rn,
then for any F∈D and g∈Cc∞,
L(F⋆g)=(LF)⋆g=F⋆(Lg)
As in corollary 2, we inductively apply Lemma 3 to each partial derivative operator which shows up in L.
To build upwards from this, it’s useful to stop for a second and factor out a little argument about convolution:
If F,Φ are distributions and g,γ smooth compactly supported functions, then
(F+Φ)⋆g=F⋆g+Φ⋆g and F⋆(g+γ)=F⋆g+F⋆γ.
Let x∈Rn.
First consider F⋆(g+γ) evaluated at x. This is by definition F((g+γ)(x−⋅)), that is,
F(g(x−⋅)+γ(x−⋅)). Using the linearity of F, we see this to be F(g(x−⋅))+F(γ(x−⋅)), which
is by definition (F⋆g)(x)+(F⋆γ)(x). Thus,
F⋆(g+γ)=F⋆g+F⋆γ.
Next, consider (F+Φ)⋆g evaluated at x. By the definition of convolution,
((F+Φ)⋆g)(x)=(F+Φ)(g(x−⋅)).
Using the definition of + in D, we distribute as
(F+Φ)(g(x−⋅))=F(g(x−⋅))+Φ(g(x−⋅)), where the last terms are each by definition equal
to (F⋆g)(x) and (Φ⋆g)(x) respectively. Thus,
(F+Φ)⋆g=F⋆g+Φ⋆g.
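Both identities can be checked mechanically in the functional model of distributions; the names below are illustrative, and the polynomial test functions merely stand in for genuine bump functions:

```python
# Bilinearity of convolution in the functional model: a distribution is a
# functional, and convolution follows the definition in the text.
delta    = lambda p: (lambda phi: phi(p))
add_dist = lambda F, G: (lambda phi: F(phi) + G(phi))
convolve = lambda F, g: (lambda x: F(lambda y: g(x - y)))

g     = lambda t: t * t
gamma = lambda t: 3.0 * t
g_plus_gamma = lambda t: g(t) + gamma(t)

F, Phi = delta(1.0), delta(-2.0)
x = 0.5

lhs1 = convolve(add_dist(F, Phi), g)(x)            # ((F + Phi) * g)(x)
rhs1 = convolve(F, g)(x) + convolve(Phi, g)(x)
lhs2 = convolve(F, g_plus_gamma)(x)                # (F * (g + gamma))(x)
rhs2 = convolve(F, g)(x) + convolve(F, gamma)(x)
print(lhs1, rhs1, lhs2, rhs2)
```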
Let L1,L2 be differential operators on Rn such that Li(F⋆g)=(LiF)⋆g=F⋆(Lig)
for i∈{1,2} and any F∈D, g∈Cc∞.
Then L=L1+L2 also satisfies L(F⋆g)=(LF)⋆g=F⋆Lg for all F,g.
The differential operator L=L1+L2 acts on functions by Lϕ=L1ϕ+L2ϕ. Thus if F∈D, g∈Cc∞,
(L1+L2)(F⋆g)=L1(F⋆g)+L2(F⋆g).
To get the first of the two claimed equalities, we can use half our hypothesis on the Li to re-write this as
(L1F)⋆g+(L2F)⋆g, and then use Lemma 5 to factor out the convolution giving
(L1+L2)(F⋆g)=(L1F+L2F)⋆g. Factoring out the F gives what we wanted:
(L1+L2)(F⋆g)=((L1+L2)F)⋆g
To get the second equality, we use the other half of our assumption on the Li to rewrite L1(F⋆g)+L2(F⋆g)
as F⋆(L1g)+F⋆(L2g). We use Lemma 5 to factor out the distribution from this convolution,
followed by the further factoring L1g+L2g=(L1+L2)g. All together, this gives what we wanted:
(L1+L2)(F⋆g)=F⋆((L1+L2)g)
Let L be a differential operator on Rn such that L(F⋆g)=(LF)⋆g=F⋆(Lg)
for any F∈D and g∈Cc∞.
Then if c∈R is any constant, the differential operator K=cL defined by Kϕ=cL(ϕ)
also satisfies K(F⋆g)=(KF)⋆g=F⋆(Kg) for all F,g.
Evaluating K(F⋆g)=c⋅L(F⋆g), we can use that L satisfies our hypothesis to conclude this is equal to
c⋅((LF)⋆g) and c⋅(F⋆(Lg)). Taking the former and evaluating at x∈Rn, we see it to equal
c⋅((LF)⋆g)(x)=c⋅(LF)(g(x−⋅))=(cLF)(g(x−⋅))
using the linearity of LF. But this is the definition of the convolution of the distribution cLF=KF
with the function g, evaluated at x! Thus, all together, this gives the first half of what we want:
K(F⋆g)=c⋅((LF)⋆g)=(cLF)⋆g=(KF)⋆g
The other case is similar, considering c⋅(F⋆(Lg)) evaluated at x. This yields c⋅F((Lg)(x−⋅)),
and by the linearity of F we may pull the constant inside to get F((cLg)(x−⋅)).
That is, c⋅(F⋆(Lg)) is the convolution of the function cLg=Kg with F, so
K(F⋆g)=c⋅(F⋆(Lg))=F⋆(cLg)=F⋆(Kg)
A word of warning: the constancy of c is essential here. For a nonconstant smooth ψ, the operator ψL does
not commute with convolution: already for L=∂x and F=δa on R, the function (ψL)(δa⋆g) sends x to ψ(x)g′(x−a),
while δa⋆((ψL)g) sends x to ψ(x−a)g′(x−a). This is why the theorem below is restricted to constant
coefficient operators.
Finally, we need a description of the class of linear differential operators on Rn which is amenable
to our start-small-and-build-upwards approach:
Useful fact: any constant coefficient linear differential operator on Rn is a multinomial in the partial derivative operators,
with coefficients in R.
All the hard work is done; now it’s just putting the pieces together to state the main result:
Let ∇ be any constant coefficient linear differential operator on Rn. Then
∇(F⋆g)=(∇F)⋆g=F⋆(∇g) for all F∈D, g∈Cc∞.
:::proof We write ∇ as a multinomial in the partial derivatives,
∇=∑[α]c[α]∂[α]
where [α]=[a,b,c,⋯] ranges over some finite subset of all multi-indices,
∂[α]=∂xa∂yb∂zc⋯, and each c[α] is a real constant. But as each ∂[α] satisfies
the desired property by Corollary 4,
we can apply Lemmas 6 and 7 finitely many times to conclude that ∇ does as well.
:::
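To close the loop with the motivating application, here is a numerical sketch for the constant coefficient operator d²/dx² on R, whose fundamental solution is the function ∣x∣/2 (a standard fact: its distributional second derivative is δ); the grid choices are arbitrary:

```python
import numpy as np

# On R, |x|/2 is a fundamental solution of d^2/dx^2, so u = F * g
# should solve u'' = g. Purely a numerical illustration.
x = np.linspace(-20, 20, 2001)
dx = x[1] - x[0]
g = np.exp(-x**2)

# u(x) = integral of |x - y|/2 * g(y) dy, approximated on the grid
u = np.array([np.sum(0.5 * np.abs(xi - x) * g) * dx for xi in x])

# second central difference of u, compared with g on the interior
upp = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
err = float(np.max(np.abs(upp - g[1:-1])))
print(err)   # tiny
```

Pleasantly, the discrete second difference of the kernel ∣x∣/2 is exactly a discrete delta on the grid, so the agreement here is limited only by floating point rounding.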