Convolution and Differentiation of Distributions

Why you can freely pass derivatives through convolutions in distribution theory.

The purpose of this note is to prove the following theorem.

Convolving with a Distribution

Let $F$ be a distribution, $g$ a smooth compactly supported function, and $\nabla$ a linear differential operator with constant coefficients. Then

$$\nabla (F\star g)=(\nabla F)\star g$$

I’ll review the setup (extremely rapidly!) to fix notation. We equip the vector space $C_c^\infty$ of smooth compactly supported real-valued functions on $\mathbb{R}^n$ with the following topology: a sequence $\phi_n$ converges to $\phi$ if the supports of the $\phi_n$ all lie in one compact set, and the $\phi_n$ and all their derivatives converge to $\phi$ and all of its derivatives with respect to the norm $\|\phi\|=\int_{\mathbb{R}^n}|\phi|\,d\mathrm{vol}$. The topological dual $\mathcal{D}$ of $C_c^\infty$ is the set of distributions on $\mathbb{R}^n$, that is, the continuous linear functionals $C_c^\infty\to \mathbb{R}$.

Every smooth function $f\in C_c^\infty$ naturally gives rise to a distribution $F\in\mathcal{D}$ via integration,

$$f\mapsto F\hspace{1cm} F(\phi)=\int_{\mathbb{R}^n}f\phi\, d\mathrm{vol},$$

but not all distributions are of this form. Important examples are given by the delta distributions: for any $p\in \mathbb{R}^n$ we define $\delta_p\in \mathcal{D}$ by $\delta_p(\phi):=\phi(p)$, and we write $\delta$ for $\delta_0$.

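The set of distributions is closed under differentiation, where the derivative of a distribution $F$ is defined by its action on a function $\phi$ in analogy to integration by parts. When $n=1$ this lets us define the first derivative $F^\prime$ of $F$ by $F^\prime(\phi):=-F(\phi^\prime)$. More generally, each coordinate partial derivative $\partial_i$ acts on $\mathcal{D}$ by $(\partial_i F)(\phi):=-F\left(\partial_i \phi\right)$, and a differential operator $\nabla$ built from the $\partial_i$ by composition, addition and scalar multiplication acts on $\mathcal{D}$ by applying these operations to $F$ in the same order. In particular each derivative contributes one sign, so on $\mathbb{R}$ the $k^{th}$ derivative satisfies

$$(D^kF)(\phi)=(-1)^kF\left(D^k \phi\right)$$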
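As a quick sanity check of this definition, here is a small worked example on $\mathbb{R}$. The distribution $H$ below (the Heaviside distribution, a name introduced here purely for illustration) is defined by $H(\phi)=\int_0^\infty\phi(x)\,dx$. Assuming nothing beyond the definition of the distributional derivative and the fundamental theorem of calculus, we get

$$H^\prime(\phi)=-H(\phi^\prime)=-\int_0^\infty \phi^\prime(x)\,dx=\phi(0)=\delta(\phi)$$

The contribution at infinity vanishes because $\phi$ has compact support, so $H^\prime=\delta$ even though $H$ does not come from a differentiable function.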

Distributions can be multiplied by smooth functions: if $\psi\colon\mathbb{R}^n\to\mathbb{R}$ is smooth and $F\in \mathcal{D}$, we define $\psi F$ to be the distribution such that $(\psi F)(\phi)=F(\psi\phi)$ for all $\phi\in C_c^\infty$. Distributions can also be convolved with functions in $C_c^\infty$, but this operation now yields smooth functions rather than distributions: if $F\in\mathcal{D}$ and $g\in C_c^\infty$, their convolution $F\star g$ is defined below, where $g(p-\cdot)$ is the function $x\mapsto g(p-x)$.

$$F\star g\colon\hspace{3mm} p\mapsto F\left(g(p-\cdot)\right)$$
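To see that this definition extends the familiar one, here are two short computations straight from the definition (a sketch; the point $a\in\mathbb{R}^n$ is a symbol chosen here for illustration). If $F$ comes from a function $f\in C_c^\infty$ via integration, then

$$(F\star g)(p)=F\left(g(p-\cdot)\right)=\int_{\mathbb{R}^n}f(x)\,g(p-x)\,d\mathrm{vol}(x)$$

which is the classical convolution of $f$ with $g$. For a delta distribution we instead get a translation:

$$(\delta_a\star g)(p)=\delta_a\left(g(p-\cdot)\right)=g(p-a)$$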

Having defined everything in the theorem, our goal is clear: we wish to show that differentiating the function $F\star g$ gives the same function as convolving the distribution $\nabla F$ with $g$.

Why We Care

One use of this is to justify a rather clever approach to finding solutions to PDEs. The goal is to convert (hopefully easier to find) distributional solutions of a differential equation into actual, real-valued function solutions through convolution. More precisely, if $\nabla$ is a differential operator we say a distribution $F$ is a fundamental solution for $\nabla$ if $\nabla F=\delta$.

The delta distribution is important here because of its particular relationship to convolution. For any $\phi\in C_c^\infty$, we compute the value of $\delta\star\phi$ at $p\in\mathbb{R}^n$ as

$$(\delta\star\phi)(p):=\delta\left(\phi(p-\cdot)\right)=\phi(p-0)=\phi(p)$$

Thus, $\delta\star\phi=\phi$, and convolution with $\delta$ realizes the identity operator on $C_c^\infty$.

Knowing this, a fundamental solution lets us find a real solution to the differential equation $\nabla u=g$, as follows. Since $g=\delta\star g$ and $\delta = \nabla F$ we see that $g=(\nabla F)\star g$. Using the claimed theorem, this can be rewritten as $g=\nabla(F\star g)$. But this is precisely the statement that the function $u=F\star g$ solves the equation $\nabla u = g$, as desired! Thus, a solution to our PDE is simply the convolution of the fundamental solution with the right-hand side $g$.
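As an illustrative sketch of this recipe in the simplest possible case, take $\nabla=\tfrac{d}{dx}$ on $\mathbb{R}$. The distribution $H(\phi)=\int_0^\infty\phi(t)\,dt$ (the Heaviside distribution, a name introduced here for illustration) satisfies $H^\prime(\phi)=-\int_0^\infty\phi^\prime(t)\,dt=\phi(0)$, so $H^\prime=\delta$ and $H$ is a fundamental solution. The recipe then produces

$$u(x)=(H\star g)(x)=H\left(g(x-\cdot)\right)=\int_0^\infty g(x-t)\,dt=\int_{-\infty}^{x}g(s)\,ds$$

and indeed $u^\prime=g$ by the fundamental theorem of calculus, so in this case the method recovers the familiar antiderivative.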

Of course, this (crucially!) relies on the main theorem of this note, and recently I realized I had completely forgotten how to prove it! Luckily, shortly afterwards Daniel O’Connor showed me how it works, and so I want to write it down for the next time that I forget.

Proving the Theorem

To prove this, we start small and build up to the general case. The interesting part is actually this small beginning; the rest is just packaging.

Theorem 1. On $\mathbb{R}$, suppose $F\in \mathcal{D}$ and $g\in C_c^\infty$. Then $F\star g$ is differentiable and $(F\star g)^\prime=(F^\prime)\star g$.

Here’s Daniel’s argument: Let $x\in \mathbb{R}$, $h\neq 0$ and consider the difference quotient

$$\psi_h(x):=\frac{(F\star g)(x+h)-(F\star g)(x)}{h}$$

If the limit $\lim_{h\to 0}\psi_h(x)$ exists, then $F\star g$ is differentiable at $x$. Using the definition of $F\star g$ and the linearity of $F$, we may evaluate this as

$$\begin{aligned}\psi_h(x)&=\frac{F\left(g(x+h-\cdot)\right)-F\left(g(x-\cdot)\right)}{h}\\&=F\left(\frac{g(x+h-\cdot)-g(x-\cdot)}{h}\right)\end{aligned}$$

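Using the continuity of $F$, we may take the limit inside, provided the difference quotients converge in $C_c^\infty$ (they do: $g$ is smooth, and for $|h|\le 1$ the quotients are all supported in one fixed compact set), and so

$$\lim_{h\to 0}\psi_h(x)=F\left(\lim_{h\to 0}\frac{g(x+h-\cdot)-g(x-\cdot)}{h}\right).$$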

The quantity inside of $F$ assigns to each $p\in \mathbb{R}$ the value

$$p\mapsto \lim_{h\to 0}\frac{g(x+h-p)-g(x-p)}{h}=g^\prime(x-p)$$

so in our notation, this is the function $g^\prime(x-\cdot)$, which itself is in $C_c^\infty$ as $g$ was. Thus, $(F\star g)^\prime(x)$ exists, and $(F\star g)^\prime(x)=F(g^\prime(x-\cdot))$. But this new term is exactly the definition of $F$ convolved with $g^\prime$, when evaluated at $x$! Thus as functions, we have shown

$$(F\star g)^\prime = F\star g^\prime$$

This is half of what we want, but the rest is just a straightforward application of the definition of the distributional derivative. By definition, $F^\prime$ is the linear functional such that $F^\prime(\phi)=-F(\phi^\prime)$ for all $\phi\in C_c^\infty$, so computing $(F^\prime)\star g$, we see for $x\in\mathbb{R}$

$$(F^\prime\star g)(x)=F^\prime\left(g(x-\cdot)\right):=-F(g(x-\cdot)^\prime)$$

where $g(x-\cdot)^\prime$ is the function sending $p\mapsto \tfrac{d}{dp}g(x-p)$. Computing this derivative with the chain rule shows $g(x-\cdot)^\prime=-g^\prime(x-\cdot)$, and so

$$-F(g(x-\cdot)^\prime)=-F(-g^\prime(x-\cdot))=F(g^\prime(x-\cdot))$$

Stringing all this together, we see $(F^\prime\star g)(x)=F(g^\prime(x-\cdot))$, where we recognize this second term as defining the convolution $(F\star g^\prime)(x)$. As this equality holds for all $x\in\mathbb{R}$ we have equality between functions:

$$F^\prime\star g=F\star g^\prime$$

Combining with our earlier result proves the theorem, as we have shown both $(F\star g)^\prime$ and $F^\prime\star g$ are equal to $F\star g^\prime$.
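For a concrete check of both equalities, here is a sketch with $F=\delta_a$ for a point $a\in\mathbb{R}$ (a choice made here for illustration). Since $(\delta_a\star g)(x)=g(x-a)$, differentiating the convolution directly gives $(\delta_a\star g)^\prime(x)=g^\prime(x-a)$. On the other side, using $\delta_a^\prime(\phi)=-\phi^\prime(a)$ and the chain rule,

$$(\delta_a^\prime\star g)(x)=-\frac{d}{dp}\Big[g(x-p)\Big]_{p=a}=g^\prime(x-a)=(\delta_a\star g^\prime)(x)$$

so all three expressions agree, as the argument above predicts.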

Corollary 2. Let $D^k$ be the $k^{th}$ derivative operator on $C_c^\infty(\mathbb{R})$. Then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, the convolution $F\star g$ is $k$ times differentiable and

$$D^k(F\star g)=(D^k F)\star g=F\star (D^k g)$$

:::proof We can proceed inductively using Theorem 1, as $D^k=D\circ D^{k-1}$ is the $k$-fold composition of the first derivative operator. :::

Lemma 3. On $\mathbb{R}^n$, let $\partial_x$ denote the directional derivative with respect to the first coordinate. Then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, $\partial_x(F\star g)$ is smooth, and $\partial_x(F\star g)=(\partial_x F)\star g=F\star(\partial_x g)$.

The proof is exactly analogous to the one dimensional case in Theorem 1, so we can proceed rather quickly. It suffices to check this equality holds at an arbitrary fixed $p\in\mathbb{R}^n$, where

$$\partial_x(F\star g)(p)=\lim_{h\to 0}\frac{(F\star g)(p+he_1)-(F\star g)(p)}{h}$$

Evaluating the convolutions and using the linearity and continuity of $F$ shows this to be $F\left((\partial_x g)(p-\cdot)\right)$, which is the convolution of $F$ with $\partial_x g$ evaluated at $p$. Thus, $\partial_x(F\star g)=F\star(\partial_x g)$. The second equality again follows simply by starting from the definition of $\partial_x F$ to compute $(\partial_x F)\star g$ at $p$, resulting in $(\partial_x F)\star g=F\star(\partial_x g)$.

Corollary 4. If $L=\partial_x^a\partial_y^b\partial_z^c\cdots$ is any monomial in the coordinate partial derivative operators on $\mathbb{R}^n$, then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$,

$$L(F\star g)=(LF)\star g=F\star(Lg)$$

As in Corollary 2, we inductively apply Lemma 3 to each partial derivative operator which shows up in $L$.

To build upwards from this, it's useful to stop for a second and factor out a little argument about convolution:

Lemma 5. If $F,\Phi$ are distributions and $g,\gamma$ are smooth compactly supported functions, then $(F+\Phi)\star g=F\star g+\Phi\star g$ and $F\star(g+\gamma)=F\star g+F\star\gamma$.

Let $x\in\mathbb{R}^n$. First consider $F\star(g+\gamma)$ evaluated at $x$. This is by definition $F((g+\gamma)(x-\cdot))$, that is, $F(g(x-\cdot)+\gamma(x-\cdot))$. Using the linearity of $F$, we see this to be $F(g(x-\cdot))+F(\gamma(x-\cdot))$, which is by definition $(F\star g)(x)+(F\star\gamma)(x)$. Thus, $F\star(g+\gamma)=F\star g+F\star \gamma$.

Next, consider $(F+\Phi)\star g$ evaluated at $x$. By the definition of convolution, $\left((F+\Phi)\star g\right)(x)=(F+\Phi)(g(x-\cdot))$. Using the definition of $+$ in $\mathcal{D}$, we distribute as $(F+\Phi)(g(x-\cdot))=F(g(x-\cdot))+\Phi(g(x-\cdot))$, where the last terms are by definition equal to $(F\star g)(x)$ and $(\Phi\star g)(x)$ respectively. Thus, $(F+\Phi)\star g=F\star g+\Phi\star g$.

Lemma 6. Let $L_1,L_2$ be differential operators on $\mathbb{R}^n$ such that $L_i(F\star g)=(L_iF)\star g=F\star(L_ig)$ for $i\in\{1,2\}$ and any $F\in\mathcal{D}$, $g\in C_c^\infty$. Then $L=L_1+L_2$ also satisfies $L(F\star g)=(LF)\star g=F\star (Lg)$ for all $F,g$.

The differential operator $L=L_1+L_2$ acts on functions by $L\phi=L_1\phi+L_2\phi$. Thus if $F\in\mathcal{D}$, $g\in C_c^\infty$, we have $(L_1+L_2)(F\star g)=L_1(F\star g)+L_2(F\star g)$.

To get the first of the two claimed equalities, we can use half our hypothesis on the $L_i$ to rewrite this as $(L_1F)\star g+(L_2 F)\star g$, and then use Lemma 5 to factor out the convolution, giving $(L_1+L_2)(F\star g)=(L_1F+L_2F)\star g$. Factoring out the $F$ gives what we wanted:

$$(L_1+L_2)(F\star g)=((L_1+L_2)F)\star g$$

To get the second equality, we use the other half of our assumption on the $L_i$ to rewrite $L_1(F\star g)+L_2(F\star g)$ as $F\star(L_1g)+F\star(L_2g)$. We use Lemma 5 to factor out the distribution from this convolution, followed by the further factoring $L_1g+L_2g=(L_1+L_2)g$. All together, this gives what we wanted:

$$(L_1+L_2)(F\star g)=F\star((L_1+L_2)g)$$

Lemma 7. Let $L$ be a differential operator on $\mathbb{R}^n$ such that $L(F\star g)=(LF)\star g=F\star(Lg)$ for any $F\in\mathcal{D}$ and $g\in C_c^\infty$. Then if $c\in\mathbb{R}$ is any constant, the differential operator $K=cL$ defined by $K\phi=c\,L(\phi)$ also satisfies $K(F\star g)=(KF)\star g=F\star (Kg)$ for all $F,g$.

Evaluating $K(F\star g)=c\cdot L(F\star g)$, we can use that $L$ satisfies our hypothesis to conclude this is equal to both $c\cdot ((LF)\star g)$ and $c\cdot(F\star(Lg))$. Taking the former and evaluating at $x\in\mathbb{R}^n$, we see it to equal

$$c\,((LF)\star g)(x)=c\,(LF)(g(x-\cdot))$$

which is the same as evaluating the distribution $cLF$ on $g(x-\cdot)$. But this is the definition of the convolution of $cLF$ with the function $g$, evaluated at $x$! This is the first half of what we want:

$$K(F\star g)=c\cdot ((LF)\star g)=(cLF)\star g=(KF)\star g$$

The other case is similar, considering $c\cdot(F\star(Lg))$ evaluated at $x$. This yields $c\,F((Lg)(x-\cdot))$, and by the linearity of $F$ we may pull the constant inside to get $F(c\,(Lg)(x-\cdot))$. That is, $c\cdot(F\star(Lg))$ sends $x$ to the result of convolving the function $cLg$ with $F$, so

$$K(F\star g)=c\cdot(F\star(Lg))=F\star(c L g)=F\star(Kg)$$

The coefficient really must be constant here. For a non-constant smooth $\psi\colon\mathbb{R}^n\to\mathbb{R}$, the function $x\mapsto \left((\psi(x)LF)\star g\right)(x)$ is not the same as $(\psi LF)\star g$: taking $L$ to be the identity and $F=\delta$, we have $\psi\cdot(\delta\star g)=\psi g$, while $\left((\psi\delta)\star g\right)(x)=\delta\left(\psi\, g(x-\cdot)\right)=\psi(0)g(x)$.

Finally, we need a description of the class of linear differential operators on $\mathbb{R}^n$ which is amenable to our start-small-and-build-upwards approach:

USEFUL FACT Any linear differential operator on $\mathbb{R}^n$ is a multinomial in the partial derivative operators, with coefficients in $C^\infty(\mathbb{R}^n)$. The constant-coefficient operators are those for which every coefficient is a real constant.

All the hard work is done; now it's just putting the pieces together to prove the main result:

Let $\nabla$ be any linear differential operator on $\mathbb{R}^n$ with constant coefficients. Then $\nabla(F\star g)=(\nabla F)\star g=F\star(\nabla g)$ for all $F\in \mathcal{D}$, $g\in C_c^\infty$.

:::proof We write $\nabla$ as a multinomial in the partial derivatives, $\nabla = \sum_{[\alpha]} c_{[\alpha]} \partial^{[\alpha]}$, where $[\alpha]=[a,b,c,\cdots]$ ranges over some finite subset of all multi-indices, $\partial^{[\alpha]}=\partial_x^a\partial_y^b\partial_z^c\cdots$, and for each index $c_{[\alpha]}$ is a real constant. But as each $\partial^{[\alpha]}$ satisfies the desired property by Corollary 4, we can apply Lemma 7 to each term (scaling by the constant $c_{[\alpha]}$) and Lemma 6 finitely many times (summing the terms) to conclude that $\nabla$ does as well. :::
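To illustrate how the pieces assemble, here is a sketch for one specific operator, the Laplacian $\Delta=\partial_x^2+\partial_y^2$ on $\mathbb{R}^2$ (an example chosen here for illustration). Each of $\partial_x^2$ and $\partial_y^2$ is a monomial in the partial derivatives, so it passes through the convolution, and the sum is then handled by additivity of convolution:

$$\Delta(F\star g)=\partial_x^2(F\star g)+\partial_y^2(F\star g)=(\partial_x^2F)\star g+(\partial_y^2F)\star g=(\Delta F)\star g$$

In particular, if $F$ is a fundamental solution, so that $\Delta F=\delta$, this reads $\Delta(F\star g)=\delta\star g=g$, which is exactly the situation described in the motivation above.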
