Why you can freely pass derivatives through convolutions in distribution theory.
The purpose of this note is to provide a proof of the following theorem:
Convolving with a Distribution
Let F be a distribution, g a smooth compactly supported function, and ∇ a constant coefficient linear differential operator.
∇(F⋆g)=(∇F)⋆g
I’ll review (extremely rapidly!) to fix notation.
We equip the vector space Cc∞ of smooth compactly supported real valued functions on Rn with
the following topology: a sequence ϕn converges to ϕ if the supports of the ϕn all lie in a single
compact set, and the ϕn and all their derivatives converge
to ϕ and all of its derivatives with respect to the norm ∣∣ϕ∣∣=∫Rn∣ϕ∣dvol. The
topological dual D of Cc∞ is the set of distributions on Rn,
that is, the continuous linear functionals Cc∞→R.
Every smooth function f∈Cc∞ naturally gives rise to a distribution F∈D via integration,
f↦F, where F(ϕ):=∫Rn fϕ dvol,
but not all distributions are of this form. Important examples are given by the delta distributions: for any p∈Rn we
define δp∈D by δp(ϕ):=ϕ(p), and we write δ for δ0.
The set of distributions is closed under differentiation, where the derivative of a distribution F is defined by its action
on a function ϕ in analogy to integration by parts. When n=1 this lets us define the first derivative F′ of
F by F′(ϕ):=−F(ϕ′). More generally, each partial derivative
operator ∂ on Cc∞ gives an operator ∂ on D by
∂F(ϕ):=−F(∂ϕ)
and higher order operators act by iterating this, so a kth order derivative picks up a sign of (−1)k.
Distributions can be multiplied by smooth functions: if ψ:Rn→R is smooth and F∈D, we define
ψF to be the distribution such that (ψF)(ϕ)=F(ψϕ) for all ϕ∈Cc∞.
Distributions can also be convolved with functions in Cc∞, but this operation now yields
smooth functions rather than distributions: if F∈D and
g∈Cc∞, their convolutional product F⋆g is defined below, where g(p−⋅) is the function x↦g(p−x).
F⋆g:p↦F(g(p−⋅))
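As a quick concrete model (my own illustration, not part of the formal development), these definitions can be mimicked in Python by representing a distribution as a bare functional on test functions; `delta` and `convolve` are illustrative names:

```python
import math

# Illustrative model: a distribution is a functional that eats a test
# function and returns a number. This is a sketch, not library code.

def delta(p):
    """The delta distribution at p: delta_p(phi) = phi(p)."""
    return lambda phi: phi(p)

def convolve(F, g):
    """(F * g)(x) = F(g(x - .)), following the definition above."""
    return lambda x: F(lambda y: g(x - y))

# Convolving delta_p with g just shifts g: (delta_p * g)(x) = g(x - p).
g = lambda t: math.exp(-t * t)   # stand-in for a compactly supported bump
shifted = convolve(delta(1.0), g)
print(shifted(3.0), g(2.0))      # equal: both are exp(-4)
```

Note that `convolve` returns an ordinary function of x, mirroring the fact that F⋆g is a function rather than a distribution.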
Having defined everything in the theorem, our goal is clear: we wish to show that differentiating the function F⋆g yields the same function that arises from convolving the distribution ∇F with g.
Why We Care
One use of this is to justify a rather clever approach to finding solutions to PDEs.
The idea is to upgrade (hopefully easier to find) distributional
solutions of a differential equation into actual, real-valued function solutions through convolution. More precisely, if ∇ is a differential
operator we say a distribution F is a fundamental solution for ∇ if ∇F=δ.
The delta distribution is important here because of its particular relationship to convolution.
For any ϕ∈Cc∞, we compute the value of δ⋆ϕ at p∈Rn as
(δ⋆ϕ)(p):=δ(ϕ(p−⋅))=ϕ(p−0)=ϕ(p)
Thus, δ⋆ϕ=ϕ, and convolution with δ realizes the identity operator on Cc∞.
Knowing this, a fundamental solution lets us find a real solution to the differential equation ∇u=g, as follows.
Since g=δ⋆g and δ=∇F we see that g=(∇F)⋆g. But, using the claimed theorem, this can be rewritten as
g=∇(F⋆g). But this is precisely the statement that the function u=F⋆g solves the equation ∇u=g, as desired! Thus, a solution to our PDE is simply the convolution of the fundamental solution with the right-hand side g.
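This recipe can be test-driven numerically. Taking the simplest case ∇ = d/dx on R, the Heaviside step H is a well-known fundamental solution (H′ = δ distributionally), and the sketch below (all grid choices arbitrary) checks that u = H⋆g satisfies u′ = g:

```python
import numpy as np

# Sketch for the operator D = d/dx on R: the Heaviside step H is a
# fundamental solution, since H' = delta, so u = H * g should solve u' = g.
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
g = np.exp(-x**2)            # smooth and (effectively) compactly supported

# (H * g)(x) = integral of g(t) for t from -infinity up to x
u = np.cumsum(g) * dx

# check u' ~ g via central differences
du = np.gradient(u, dx)
print(float(np.max(np.abs(du - g))))   # small (order dx)
```

The agreement degrades only at the resolution of the grid, as expected of a first order quadrature.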
Of course, this (crucially!) relies on the main theorem of this note and recently I realized I
had completely forgotten how to prove this fact!
Luckily, shortly after, Daniel O’Connor showed me how it works, and so
I want to write it down for the next time that I forget.
Proving the Theorem
To prove this, we start small and build up to the general case. The interesting part is actually this small beginning, however;
the rest is just packaging.
On R, suppose F∈D and g∈Cc∞. Then F⋆g is differentiable
and (F⋆g)′=(F′)⋆g.
Here’s Daniel’s argument:
Let x∈R, h≠0 and consider the difference quotient
ψh(x):=((F⋆g)(x+h)−(F⋆g)(x))/h
If the limit limh→0ψh(x) exists, then F⋆g is differentiable at x.
Using the definition of F⋆g and the linearity of F, we may evaluate this as
ψh(x)=(F(g(x+h−⋅))−F(g(x−⋅)))/h=F((g(x+h−⋅)−g(x−⋅))/h)
Using the continuity of F, we may take the limit inside, and so
limh→0ψh(x)=F(limh→0(g(x+h−⋅)−g(x−⋅))/h).
The quantity inside of F attempts to assign to each p∈R the value
p↦limh→0(g(x+h−p)−g(x−p))/h=g′(x−p)
so in our notation, this is the function g′(x−⋅), which itself is in Cc∞ as g was. (One checks that the difference quotients really do converge to it in the topology of Cc∞, which is what justifies passing the limit through F.)
Thus, (F⋆g)′(x) exists, and (F⋆g)′(x)=F(g′(x−⋅)).
But this new term is exactly the definition of F convolved with g′, when evaluated at x! Thus as functions, we have shown
(F⋆g)′=F⋆g′
This is half of what we want, but the rest is just a straightforward application of the definition of the distributional
derivative. By definition, F′ is the linear functional such that F′(ϕ)=−F(ϕ′) for all ϕ∈Cc∞,
so computing (F′)⋆g, we see for x∈R
(F′⋆g)(x)=F′(g(x−⋅)):=−F(g(x−⋅)′)
where g(x−⋅)′ is the function sending p↦(d/dp)g(x−p).
Computing this derivative with the chain rule shows g(x−⋅)′=−g′(x−⋅), and so
−F(g(x−⋅)′)=−F(−g′(x−⋅))=F(g′(x−⋅))
Stringing all this together, we see (F′⋆g)(x)=F(g′(x−⋅)), where we recognize this second
term as defining the convolution F⋆g′(x). As this equality holds for all x∈R we have equality
between functions:
F′⋆g=F⋆g′
Combining with our earlier result proves the theorem, as we have shown both (F⋆g)′ and F′⋆g are equal to F⋆g′.
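The theorem on R can be spot checked numerically with the concrete choice F = δ, using δ′(ϕ) = −ϕ′(0) with a finite difference standing in for ϕ′; all names and values below are illustrative:

```python
import math

# Numerical spot check with F = delta: all three expressions in the
# theorem should evaluate to approximately g'(x).
step = 1e-5
num_deriv = lambda phi: (lambda t: (phi(t + step) - phi(t - step)) / (2 * step))

delta       = lambda phi: phi(0.0)
delta_prime = lambda phi: -num_deriv(phi)(0.0)   # delta'(phi) = -phi'(0)

convolve = lambda F, g: (lambda x: F(lambda y: g(x - y)))

g  = lambda t: math.exp(-t * t)
dg = lambda t: -2.0 * t * math.exp(-t * t)

x = 0.4
lhs    = num_deriv(convolve(delta, g))(x)   # (F * g)'(x)
middle = convolve(delta_prime, g)(x)        # (F' * g)(x)
rhs    = convolve(delta, dg)(x)             # (F * g')(x)
print(lhs, middle, rhs)                     # all approximately g'(0.4)
```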
Let Dk be the kth derivative operator on Cc∞(R). Then for
any F∈D and g∈Cc∞, the convolution F⋆g is k times differentiable and
Dk(F⋆g)=(DkF)⋆g=F⋆(Dkg)
:::proof We can proceed inductively using Theorem 1, as Dk=D∘Dk−1 is the k-fold composition of the first derivative operator.
:::
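For instance, taking k = 2 and the concrete choice F = δa (so that δa⋆g = g(⋅−a)), the corollary predicts that the second derivative of δa⋆g is δa⋆g′′; a finite difference sketch with illustrative values:

```python
import math

# With F = delta_a we have (delta_a * g)(x) = g(x - a), so the corollary
# predicts its second derivative is g''(x - a).
a = 0.5
g   = lambda t: math.exp(-t * t)
d2g = lambda t: (4.0 * t * t - 2.0) * math.exp(-t * t)

conv = lambda x: g(x - a)              # (delta_a * g)(x)

h, x = 1e-4, 1.1
second_diff = (conv(x + h) - 2 * conv(x) + conv(x - h)) / (h * h)
print(abs(second_diff - d2g(x - a)))   # tiny
```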
On Rn, let ∂x denote the directional derivative with respect to the first coordinate.
Then for any F∈D and g∈Cc∞, ∂x(F⋆g) is smooth, and
∂x(F⋆g)=(∂xF)⋆g=F⋆(∂xg).
The proof is exactly analogous to the one dimensional case in Theorem 1, so we can proceed rather quickly.
It suffices to check this equality holds at an arbitrary fixed p∈Rn, where
∂x(F⋆g)(p)=limh→0((F⋆g)(p+he1)−(F⋆g)(p))/h
Evaluating the convolutions and using the linearity and continuity of F shows this to be
F(∂xg(p−⋅)), which is the convolution of F with ∂xg evaluated at p.
Thus,
∂x(F⋆g)=F⋆(∂xg).
The second equality again follows simply by unwinding the definition of ∂xF to compute
(∂xF)⋆g at p, resulting in
(∂xF)⋆g=F⋆(∂xg).
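Here is an analogous two dimensional spot check, again with the concrete choice F = δq (so δq⋆g = g(⋅−q)); all names and values are illustrative:

```python
import math

# 2D check of the lemma with F = delta_q on R^2, where
# (delta_q * g)(p) = g(p - q).
g   = lambda u, v: math.exp(-(u * u + v * v))
gdx = lambda u, v: -2.0 * u * math.exp(-(u * u + v * v))  # partial in first slot

q = (0.3, -0.2)
conv = lambda p: g(p[0] - q[0], p[1] - q[1])              # (delta_q * g)(p)

p, h = (1.0, 0.5), 1e-6
lhs = (conv((p[0] + h, p[1])) - conv((p[0] - h, p[1]))) / (2 * h)
rhs = gdx(p[0] - q[0], p[1] - q[1])                       # (delta_q * dx g)(p)
print(abs(lhs - rhs))   # tiny
```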
If L=∂xa∂yb∂zc⋯ is any monomial in the coordinate partial derivative operators on Rn,
then for any F∈D and g∈Cc∞,
L(F⋆g)=(LF)⋆g=F⋆(Lg)
As in corollary 2, we inductively apply Lemma 3 to each partial derivative operator which shows up in L.
To build upwards from this, it’s useful to stop for a second and factor out a little argument about convolution:
If F,Φ are distributions and g,γ smooth compactly supported functions, then
(F+Φ)⋆g=F⋆g+Φ⋆g and F⋆(g+γ)=F⋆g+F⋆γ.
Let x∈Rn.
First consider F⋆(g+γ) evaluated at x. This is by definition F((g+γ)(x−⋅)), that is,
F(g(x−⋅)+γ(x−⋅)). Using the linearity of F, we see this to be F(g(x−⋅))+F(γ(x−⋅)), which
is by definition (F⋆g)(x)+(F⋆γ)(x). Thus,
F⋆(g+γ)=F⋆g+F⋆γ.
Next, consider (F+Φ)⋆g evaluated at x. By the definition of convolution,
((F+Φ)⋆g)(x)=(F+Φ)(g(x−⋅)).
Using the definition of + in D, we distribute as
(F+Φ)(g(x−⋅))=F(g(x−⋅))+Φ(g(x−⋅)), where the last terms are each by definition equal
to (F⋆g)(x) and (Φ⋆g)(x) respectively. Thus,
(F+Φ)⋆g=F⋆g+Φ⋆g.
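Both identities can be checked mechanically in the functional model of distributions; the names below are illustrative, and the polynomial test functions merely stand in for genuine bump functions:

```python
# Bilinearity of convolution in the functional model: a distribution is a
# functional, and convolution follows the definition in the text.
delta    = lambda p: (lambda phi: phi(p))
add_dist = lambda F, G: (lambda phi: F(phi) + G(phi))
convolve = lambda F, g: (lambda x: F(lambda y: g(x - y)))

g     = lambda t: t * t
gamma = lambda t: 3.0 * t
g_plus_gamma = lambda t: g(t) + gamma(t)

F, Phi = delta(1.0), delta(-2.0)
x = 0.5

lhs1 = convolve(add_dist(F, Phi), g)(x)            # ((F + Phi) * g)(x)
rhs1 = convolve(F, g)(x) + convolve(Phi, g)(x)
lhs2 = convolve(F, g_plus_gamma)(x)                # (F * (g + gamma))(x)
rhs2 = convolve(F, g)(x) + convolve(F, gamma)(x)
print(lhs1, rhs1, lhs2, rhs2)
```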
Let L1,L2 be differential operators on Rn such that Li(F⋆g)=(LiF)⋆g=F⋆(Lig)
for i∈{1,2} and any F∈D, g∈Cc∞.
Then L=L1+L2 also satisfies L(F⋆g)=(LF)⋆g=F⋆Lg for all F,g.
The differential operator L=L1+L2 acts on functions by Lϕ=L1ϕ+L2ϕ. Thus if F∈D, g∈Cc∞,
(L1+L2)(F⋆g)=L1(F⋆g)+L2(F⋆g).
To get the first of the two claimed equalities, we can use half our hypothesis on the Li to re-write this as
(L1F)⋆g+(L2F)⋆g, and then use Lemma 5 to factor out the convolution giving
(L1+L2)(F⋆g)=(L1F+L2F)⋆g. Factoring out the F gives what we wanted:
(L1+L2)(F⋆g)=((L1+L2)F)⋆g
To get the second equality, we use the other half of our assumption on the Li to rewrite L1(F⋆g)+L2(F⋆g)
as F⋆(L1g)+F⋆(L2g). We use Lemma 5 to factor out the distribution from this convolution,
followed by the further factoring L1g+L2g=(L1+L2)g. All together, this gives what we wanted:
(L1+L2)(F⋆g)=F⋆((L1+L2)g)
Let L be a differential operator on Rn such that L(F⋆g)=(LF)⋆g=F⋆(Lg)
for any F∈D and g∈Cc∞.
Then if c∈R is any constant, the differential operator K=cL defined by Kϕ=cL(ϕ)
also satisfies K(F⋆g)=(KF)⋆g=F⋆(Kg) for all F,g.
Evaluating K(F⋆g)=c⋅L(F⋆g), we can use that L satisfies our hypothesis to conclude this is equal to
c⋅((LF)⋆g) and c⋅(F⋆(Lg)). Taking the former and evaluating at x∈Rn, we see it to equal
c⋅((LF)⋆g)(x)=c⋅(LF)(g(x−⋅))=(cLF)(g(x−⋅))
using the linearity of LF. But this is the definition of the convolution of the distribution cLF=KF
with the function g, evaluated at x! Thus, all together, this gives the first half of what we want:
K(F⋆g)=c⋅((LF)⋆g)=(cLF)⋆g=(KF)⋆g
The other case is similar, considering c⋅(F⋆(Lg)) evaluated at x. This yields c⋅F((Lg)(x−⋅)),
and by the linearity of F we may pull the constant inside to get F((cLg)(x−⋅)).
That is, c⋅(F⋆(Lg)) is the convolution of the function cLg=Kg with F, so
K(F⋆g)=c⋅(F⋆(Lg))=F⋆(cLg)=F⋆(Kg)
A word of warning: the constancy of c is essential here. For a nonconstant smooth ψ, the operator ψL does
not commute with convolution: already for L=∂x and F=δa on R, the function (ψL)(δa⋆g) sends x to ψ(x)g′(x−a),
while δa⋆((ψL)g) sends x to ψ(x−a)g′(x−a). This is why the theorem below is restricted to constant
coefficient operators.
Finally, we need a description of the class of linear differential operators on Rn which is amenable
to our start-small-and-build-upwards approach:
Useful fact: any constant coefficient linear differential operator on Rn is a multinomial in the partial derivative operators,
with coefficients in R.
All the hard work is done; now it’s just putting the pieces together to state the main result:
Let ∇ be any constant coefficient linear differential operator on Rn. Then
∇(F⋆g)=(∇F)⋆g=F⋆(∇g) for all F∈D, g∈Cc∞.
:::proof We write ∇ as a multinomial in the partial derivatives,
∇=∑[α]c[α]∂[α]
where [α]=[a,b,c,⋯] ranges over some finite subset of all multi-indices,
∂[α]=∂xa∂yb∂zc⋯, and each c[α] is a real constant. But as each ∂[α] satisfies
the desired property by Corollary 4,
we can apply Lemmas 6 and 7 finitely many times to conclude that ∇ does as well.
:::
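To close the loop with the motivating application, here is a numerical sketch for the constant coefficient operator d²/dx² on R, whose fundamental solution is the function ∣x∣/2 (a standard fact: its distributional second derivative is δ); the grid choices are arbitrary:

```python
import numpy as np

# On R, |x|/2 is a fundamental solution of d^2/dx^2, so u = F * g
# should solve u'' = g. Purely a numerical illustration.
x = np.linspace(-20, 20, 2001)
dx = x[1] - x[0]
g = np.exp(-x**2)

# u(x) = integral of |x - y|/2 * g(y) dy, approximated on the grid
u = np.array([np.sum(0.5 * np.abs(xi - x) * g) * dx for xi in x])

# second central difference of u, compared with g on the interior
upp = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
err = float(np.max(np.abs(upp - g[1:-1])))
print(err)   # tiny
```

Pleasantly, the discrete second difference of the kernel ∣x∣/2 is exactly a discrete delta on the grid, so the agreement here is limited only by floating point rounding.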