Embedding Circles
A case study in geometric dimension reduction
Notes from ongoing work with Fabian Lander.
In math, high-dimensional spaces are as accessible as low-dimensional ones, and so it’s quite common for things to end up naturally living in a high-dimensional home. But for humans they aren’t accessible at all — and even low-dimensional shapes that live in such worlds can be hard to understand.
So we naturally want to pluck such objects out of their high-dimensional homes and squish them down into the cramped three-dimensional world we live in. This isn’t always possible — many dimension-reduction techniques across data science aim only to capture a shadow of the real picture — but it’s natural to want to try.
For me this kind of thing comes up all the time, in particular right now in joint work with Aaron Abrams, Dave Bachmann, and Edmund Harriss on configuration spaces. The cases we care about there are spheres sitting in $\mathbb{R}^n$, and the geometric data we want to preserve includes both intrinsic and extrinsic information. Fabian Lander and I hope to write some software to accomplish this. But as always, I want to start simply, and really understand things. So these notes are about the simplest analog that has all the features: given a circle in $\mathbb{R}^n$, can we find a ‘best geometric approximating’ embedding into $\mathbb{R}^2$?
The geometry of a curve
A parametrized curve in $\mathbb{R}^n$ is a smooth immersion

$$\gamma \colon S^1 \to \mathbb{R}^n,$$

where $S^1$ is a fixed parameter circle. We do not require $\gamma$ to be injective: dropping injectivity costs us nothing (gradient flow can pass through self-intersections without incident), while dropping the immersion condition would cost us a lot, since the unit tangent, arclength derivative, and curvature data below all become singular at points where $\gamma'(t) = 0$.
Two parametrized curves $\gamma$ and $\gamma \circ \varphi$ related by an orientation-preserving diffeomorphism $\varphi$ of $S^1$ trace out the same path at different speeds; an unparametrized curve is the equivalence class $[\gamma]$. Within each class the arclength parametrization with $|\gamma'|$ constant is canonical up to the choice of basepoint and orientation.
The first fundamental form
Pulling the Euclidean inner product on $\mathbb{R}^n$ back along $\gamma$ gives a Riemannian metric on $S^1$,

$$g = \gamma^* \langle \cdot, \cdot \rangle = |\gamma'(t)|^2\, dt^2.$$

The function $v(t) = |\gamma'(t)|$ is the speed, the 1-form $ds = v\, dt$ is the arclength element, and

$$L(\gamma) = \int_{S^1} v(t)\, dt$$

is the total length. As a metric on the fixed parameter circle, the function $v$ is the geometric data attached to a parametrized curve.
For an unparametrized curve nothing is left of $g$ except $L$: any two metrics on $S^1$ with the same total length are isometric, so the only invariant of $[\gamma]$ visible to the first fundamental form is the single real number $L$. The intrinsic data of an unparametrized curve is one number; the intrinsic data of a parametrized curve is a function.
The second fundamental form
The intrinsic data sees $\gamma$ as an abstract Riemannian circle and is blind to how it sits inside $\mathbb{R}^n$. The extrinsic data captures the bending, and to define it cleanly we need to differentiate vector fields along $\gamma$.
A vector field along $\gamma$ is a smooth section of the pullback bundle $\gamma^* T\mathbb{R}^n$ — concretely, a smooth $\mathbb{R}^n$-valued function on $S^1$. The first such field is the velocity $\gamma'$, and normalizing gives the unit tangent

$$T = \frac{\gamma'}{|\gamma'|} = \frac{\gamma'}{v}.$$

The pullback of the flat Euclidean connection on $\mathbb{R}^n$ differentiates such a vector field componentwise; rescaling by $1/v$ converts $t$-derivatives into arclength derivatives,

$$\frac{D}{ds} = \frac{1}{v} \frac{d}{dt}.$$

The curvature vector is the arclength rate of change of the unit tangent,

$$\kappa = \frac{DT}{ds} = \frac{T'}{v}.$$

Differentiating $\langle T, T \rangle \equiv 1$ forces $\langle \kappa, T \rangle = 0$, so $\kappa$ takes values in the rank-$(n-1)$ normal bundle of $\gamma$. This is the second fundamental form of $\gamma$ in the standard sense: for an immersion of any source dimension, the second fundamental form is a symmetric normal-bundle-valued tensor; for a curve, the source has dimension one and $\kappa$ determines the whole tensor.
The vector field $\kappa$ along $\gamma$ contains all the information about how $\gamma$ bends in its ambient space. Together with $v$ it is enough to recover $\gamma$ up to translation: integrating $\gamma' = vT$ recovers $\gamma$ from $T$, and $T$ in turn is determined by $\kappa$ via $T' = v\kappa$ once a single value is fixed.
For an unparametrized curve the natural object is $\kappa$ as a function of arclength, well-defined up to the basepoint and orientation gauge.
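These definitions are easy to check numerically. The sketch below (NumPy; the helper names are our own, and the plane is identified with $\mathbb{C}$, as we do again in §5) computes $v$, $T$, and $\kappa$ for a curve sampled on a uniform parameter grid, differentiating spectrally, and confirms on a circle of radius $R$ that the speed is $R$ and the curvature vector has magnitude $1/R$ and points at the center.

```python
import numpy as np

def fourier_derivative(f):
    """d/dt of a periodic function sampled on a uniform grid over
    [0, 2*pi); exact for band-limited samples."""
    N = len(f)
    k = np.fft.fftfreq(N, d=1.0 / N)            # integer frequencies
    return np.fft.ifft(1j * k * np.fft.fft(f))

def curve_data(eta):
    """Speed v, unit tangent T, curvature vector kappa = T'/v of a
    sampled plane curve eta (complex samples)."""
    deta = fourier_derivative(eta)
    v = np.abs(deta)
    T = deta / v
    kappa = fourier_derivative(T) / v
    return v, T, kappa

# Circle of radius R: expect v = R and |kappa| = 1/R, pointing at the center.
N, R = 64, 2.0
t = 2 * np.pi * np.arange(N) / N
v, T, kappa = curve_data(R * np.exp(1j * t))
```

Spectral differentiation is exact here because the circle is a single Fourier mode; for a general smooth curve it is accurate to spectral precision.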
Complete invariants: Frenet–Serret
The curvature vector has its own orientation in $\mathbb{R}^n$ and is not invariant under rigid motion of the ambient space. For many purposes we want isometry-invariant extrinsic data — scalar functions built from $\kappa$ that are unchanged by translation and rotation of $\mathbb{R}^n$.
The simplest such scalar is $|\kappa|$. But this alone is not enough to determine $\gamma$ up to rigid motion in $\mathbb{R}^n$: a unit circle in a plane and an appropriately scaled circular helix are non-congruent space curves with the same constant $|\kappa|$. The classical resolution is the Frenet–Serret apparatus: take successive arclength derivatives of the unit tangent and Gram–Schmidt them into an orthonormal moving frame $(e_1, \dots, e_n)$ along $\gamma$, with $e_1 = T$. The structure equations of this frame,

$$\frac{D e_i}{ds} = -\kappa_{i-1}\, e_{i-1} + \kappa_i\, e_{i+1},$$

involve scalar functions $\kappa_1, \dots, \kappa_{n-1}$, the generalized curvatures. By construction $\kappa_1, \dots, \kappa_{n-2} > 0$ (Gram–Schmidt norms), and $\kappa_{n-1}$ is signed.
Fundamental theorem. A curve with $\kappa_1, \dots, \kappa_{n-2} > 0$ everywhere is determined up to rigid motion of $\mathbb{R}^n$ by the data $(v, \kappa_1, \dots, \kappa_{n-1})$ on $S^1$. For an unparametrized curve, the data $\kappa_1, \dots, \kappa_{n-1}$ as functions of arclength determines $[\gamma]$ up to rigid motion.
For $n = 2$ there is just $\kappa_1$, the signed scalar curvature; for $n = 3$, $\kappa_1$ is the curvature and $\kappa_2$ is the torsion.
So the extrinsic geometry of a curve admits two natural packagings: the curvature vector $\kappa$, which together with $v$ captures the bending up to translation, and the generalized curvatures $\kappa_1, \dots, \kappa_{n-1}$, which capture it up to full rigid motion at the cost of a more elaborate construction.
The space of curves
The geometric story of the previous section was about individual curves. To do optimization — to look for a curve with prescribed geometric data, by varying it continuously — we need to step up a level and treat the collection of all curves as a space we can do geometry on.
Two function spaces sit in the picture, both open subsets of $C^\infty(S^1, \mathbb{R}^n)$:

$$\mathrm{Emb}(S^1, \mathbb{R}^n) \subset \mathrm{Imm}(S^1, \mathbb{R}^n) \subset C^\infty(S^1, \mathbb{R}^n),$$

the space of embedded curves (immersions that are also injective) inside the space of immersed curves. Both inclusions are open: injectivity and the immersion condition are both stable under small perturbations. As open subsets of $C^\infty(S^1, \mathbb{R}^n)$, both inherit the structure of an infinite-dimensional Fréchet manifold, with tangent space at any $\gamma$ canonically identified with $C^\infty(S^1, \mathbb{R}^n)$ itself — vector fields along $\gamma$. We will work in $\mathrm{Imm}(S^1, \mathbb{R}^n)$.
A Fréchet space is a complete topological vector space whose topology is given by a countable family of seminorms; the standard example, and the only one we need, is $C^\infty(S^1, \mathbb{R}^n)$ with seminorms $\|f\|_k = \sup_t |f^{(k)}(t)|$ for $k \ge 0$. We need the Fréchet rather than the Banach setting because no single norm captures all derivatives of a smooth function — controlling smoothness requires the whole tower of norms at once. A Fréchet manifold is locally modeled on Fréchet spaces in the usual way (charts, smooth transition maps); the one fact we will use is that an open subset of a Fréchet space is itself a Fréchet manifold, with tangent space at every point canonically the ambient Fréchet space.
The orientation-preserving reparametrization group $\mathrm{Diff}^+(S^1)$ acts on both spaces by precomposition, $(\varphi, \gamma) \mapsto \gamma \circ \varphi$, and the orbits are precisely the unparametrized curves. The action is free on $\mathrm{Emb}$: an embedded curve is injective, so $\gamma \circ \varphi = \gamma$ forces $\varphi = \mathrm{id}$. On $\mathrm{Imm}$ it is almost free — multiply-covered curves have nontrivial stabilizer. For example, $\gamma(t) = e^{2it}$ wraps the unit circle twice, and the half-rotation $t \mapsto t + \pi$ sends $\gamma$ to itself. The stabilizer of a $k$-fold cover is the cyclic group generated by $t \mapsto t + 2\pi/k$.
Consequently $\mathrm{Emb} \to \mathrm{Emb}/\mathrm{Diff}^+(S^1)$ is a principal $\mathrm{Diff}^+(S^1)$-bundle, with the base a smooth (Fréchet) manifold of unparametrized embedded curves. For immersions, $\mathrm{Imm}/\mathrm{Diff}^+(S^1)$ is an infinite-dimensional orbifold, with orbifold points at the multiply-covered curves.
Upstairs or downstairs? The base is what we geometrically care about — each point an honest unparametrized curve. But the total space is a function space, and that is what gives us Fourier expansion, gradient flow, autodiff, and pointwise comparison of two curves. We work in $\mathrm{Imm}(S^1, \mathbb{R}^n)$ and carry the reparametrization gauge along as the price.
Tangent vectors. A tangent vector at $\gamma$ is the derivative of a one-parameter family of curves $\gamma_\epsilon$ passing through $\gamma$ at $\epsilon = 0$,

$$u = \frac{\partial}{\partial \epsilon}\Big|_{\epsilon = 0} \gamma_\epsilon.$$

This is a vector field along $\gamma$ — at each parameter value $t$, a vector $u(t) \in \mathbb{R}^n$ recording the infinitesimal displacement of the curve there. So the tangent space is $C^\infty(S^1, \mathbb{R}^n)$, the same Fréchet space the manifold is modeled on.
Inside it there is a natural splitting into tangential and normal parts,

$$u = u^\top + u^\perp,$$

where $u^\top$ is parallel to the curve at each point and $u^\perp$ is perpendicular. Tangential variations are infinitesimal reparametrizations — they slide the parametrization along the existing curve without changing its image — and so generate the gauge action of $\mathrm{Diff}^+(S^1)$ on $\mathrm{Imm}$. Normal variations are honest geometric deformations that move the curve in $\mathbb{R}^n$.
Riemannian metrics
A Riemannian metric on $\mathrm{Imm}(S^1, \mathbb{R}^n)$ is, as on any manifold, a smoothly-varying choice of inner product on each tangent space. Concretely it assigns to each $\gamma$ a positive-definite symmetric bilinear form

$$G_\gamma \colon T_\gamma \mathrm{Imm} \times T_\gamma \mathrm{Imm} \to \mathbb{R}.$$
There are several natural choices; we discuss two.
The $L^2(dt)$-metric
The most direct construction is to integrate the Euclidean inner product against the parameter measure,

$$\langle u, w \rangle_{L^2(dt)} = \int_{S^1} \langle u(t), w(t) \rangle\, dt.$$

The integrand depends on $u$ and $w$ but the integration weight does not depend on $\gamma$ — every tangent space carries the same inner product. So $\mathrm{Imm}$ with this metric is just an open subset of the Hilbert-like space $C^\infty(S^1, \mathbb{R}^n)$ with its $L^2$ inner product, and is in particular flat: the Levi-Civita connection is the trivial one, geodesics are straight lines in $C^\infty(S^1, \mathbb{R}^n)$, and tangent vectors at different points can be identified.
The $L^2(dt)$-metric is not reparametrization-invariant. Under $\gamma \mapsto \gamma \circ \varphi$, a tangent vector $u$ at $\gamma$ corresponds to $u \circ \varphi$ at $\gamma \circ \varphi$, and

$$\int_{S^1} \langle u \circ \varphi, w \circ \varphi \rangle\, dt \ne \int_{S^1} \langle u, w \rangle\, dt$$

in general. So this metric is honestly a metric on the space of parametrized curves, but does not descend to the quotient — it would assign different geometry to two reparametrizations of the same unparametrized curve.
The $L^2(ds)$-metric
A reparametrization-invariant alternative integrates against the arclength element,

$$\langle u, w \rangle_{L^2(ds)} = \int_{S^1} \langle u(t), w(t) \rangle\, v(t)\, dt.$$

The integration weight now depends on $\gamma$ through $v$, so $\mathrm{Imm}$ is no longer flat in this metric. Under $\gamma \mapsto \gamma \circ \varphi$, the arclength element also transforms, and the change-of-variables computation gives $\langle u \circ \varphi, w \circ \varphi \rangle_{\gamma \circ \varphi} = \langle u, w \rangle_\gamma$ — the metric is equivariant, and descends to a Riemannian metric on the quotient (away from orbifold points). This is the right choice if we want to do geometry on unparametrized curves: it sees only what’s intrinsic to the orbit.
Which one we use
Both metrics are honest Riemannian structures on $\mathrm{Imm}$, but they have different characters. The $L^2(ds)$-metric is the geometric choice — it lifts a metric from the orbit space and treats reparametrizations as the gauge they are. The $L^2(dt)$-metric is the function-space choice — it treats $\mathrm{Imm}$ as an open subset of a Hilbert space and benefits from being flat.
For our problem, the loss we’ll write down in §3 is itself parametrization-dependent: it pins down a parametrization correspondence between the target $\gamma$ and the candidate $\eta$. So we are not actually solving for an unparametrized curve; we are solving for a parametrized one, and the flat $L^2(dt)$-metric is the natural companion.
Derivatives
A smooth function $E \colon \mathrm{Imm}(S^1, \mathbb{R}^n) \to \mathbb{R}$ has, at each $\gamma$, a differential $dE_\gamma$ — a linear functional on the tangent space measuring the rate of change of $E$ along an infinitesimal variation. We work this out abstractly in this section, then specialize to gradients (which require a metric) and compute one example all the way through.
The differential
Fix $\gamma$ and $u \in T_\gamma \mathrm{Imm}$. The differential of $E$ at $\gamma$ is the linear map

$$dE_\gamma(u) = \frac{d}{d\epsilon}\Big|_{\epsilon = 0} E(\gamma_\epsilon), \qquad \gamma_0 = \gamma, \quad \partial_\epsilon\big|_0 \gamma_\epsilon = u.$$

The right-hand side does not depend on which family realizing $u$ we use; any $\gamma_\epsilon$ with $\gamma_0 = \gamma$ and $\partial_\epsilon|_0 \gamma_\epsilon = u$ gives the same value. The differential is defined without reference to any inner product — it is the rate of change of $E$ along an infinitesimal variation, full stop.
The same construction applied to vector- or tensor-valued functions on $\mathrm{Imm}$ produces vector- or tensor-valued 1-forms, and we use $d$ for all of these. So for example the speed $v$ has a differential $dv_\gamma$, computed by taking $\partial_\epsilon|_0$ of $v(\gamma_\epsilon)$.
Example: total length. The total length functional $L \colon \mathrm{Imm} \to \mathbb{R}$, $L(\gamma) = \int_{S^1} |\gamma'|\, dt$, is a smooth function on $\mathrm{Imm}$. To compute its differential at $\gamma$, evaluate on the family $\gamma_\epsilon = \gamma + \epsilon u$:

$$dL_\gamma(u) = \frac{d}{d\epsilon}\Big|_{\epsilon = 0} \int_{S^1} |\gamma' + \epsilon u'|\, dt.$$

The integration variable $t$ and the variation parameter $\epsilon$ are independent, so $\frac{d}{d\epsilon}$ pulls inside the integral as $\partial_\epsilon|_0$ on the integrand:

$$dL_\gamma(u) = \int_{S^1} \partial_\epsilon\Big|_0 |\gamma' + \epsilon u'|\, dt,$$

where the integrand is the pointwise differential of the norm. To compute it, square both sides of $w(\epsilon) = |\gamma' + \epsilon u'|$ and differentiate at $\epsilon = 0$: the LHS gives $2v\, \dot w(0)$, the RHS gives $2\langle \gamma', u' \rangle$, and so

$$\partial_\epsilon\Big|_0 |\gamma' + \epsilon u'| = \frac{\langle \gamma', u' \rangle}{v} = \langle T, u' \rangle.$$

Substituting back,

$$dL_\gamma(u) = \int_{S^1} \langle T, u' \rangle\, dt.$$

This is the differential of the length functional, evaluated on any tangent vector $u$.
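The formula $dL_\gamma(u) = \int_{S^1} \langle T, u' \rangle\, dt$ can be checked against a finite difference. A sketch under our numerical conventions (uniform grid, spectral differentiation, plane identified with $\mathbb{C}$; helper names ours):

```python
import numpy as np

def d(f):
    """Spectral d/dt on a uniform grid over [0, 2*pi)."""
    k = np.fft.fftfreq(len(f), d=1.0 / len(f))
    return np.fft.ifft(1j * k * np.fft.fft(f))

def length(eta):
    """L = integral of |eta'| dt by the (periodic) trapezoidal rule."""
    return 2 * np.pi * np.mean(np.abs(d(eta)))

N = 64
t = 2 * np.pi * np.arange(N) / N
eta = np.exp(1j * t) + 0.3 * np.exp(-2j * t)      # an immersed test curve
u = 0.7 * np.exp(1j * t) - 0.2j * np.exp(3j * t)  # a tangent vector

# dL(u) = ∫ <T, u'> dt, with <a, b> = Re(a * conj(b)) under R^2 = C
T = d(eta) / np.abs(d(eta))
dL = 2 * np.pi * np.mean(np.real(np.conj(T) * d(u)))

# central finite difference of eps -> L(eta + eps * u)
eps = 1e-6
dL_fd = (length(eta + eps * u) - length(eta - eps * u)) / (2 * eps)
```

Because the discrete formula is the exact derivative of the discrete length, the two numbers agree to finite-differencing accuracy.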
Gradients
A differential is a 1-form on $\mathrm{Imm}$ — a linear functional on the tangent space — and to turn it into a vector field, a tangent vector at each point pointing in the direction of steepest ascent, we need a metric. Given the $L^2(dt)$-metric, the gradient $\nabla E$ is the unique tangent vector at each $\gamma$ satisfying

$$\langle \nabla E_\gamma, u \rangle_{L^2(dt)} = dE_\gamma(u) \quad \text{for all } u.$$

The differential is metric-independent; the gradient is not. We work throughout with the $L^2(dt)$-metric, so “the gradient” means the $L^2(dt)$-gradient unambiguously.
Example: the gradient of total length. Continuing the example, under the $L^2(dt)$-metric the gradient is determined by $\int_{S^1} \langle \nabla L_\gamma, u \rangle\, dt = \int_{S^1} \langle T, u' \rangle\, dt$. Integration by parts on the parameter circle (no boundary terms, since $S^1$ is closed) gives

$$dL_\gamma(u) = -\int_{S^1} \langle T', u \rangle\, dt,$$

and reading off,

$$\nabla L_\gamma = -T' = -v\kappa.$$

The negative gradient is the curvature vector scaled by speed; the descending flow $\dot\gamma = -\nabla L = v\kappa$ shrinks the curve along its inward normal at a rate proportional to local speed and curvature. This is the parametrization-dependent version of curve-shortening flow: the canonical reparametrization-invariant CSF is $\dot\gamma = \kappa$, arising from the $L^2(ds)$-metric instead. The factor of $v$ here is the price of working in $L^2(dt)$.
Optimization
We are given a fixed $\gamma \in \mathrm{Imm}(S^1, \mathbb{R}^n)$ and want to find a planar curve $\eta \in \mathrm{Imm}(S^1, \mathbb{R}^2)$ whose geometric data — first and second fundamental form — matches that of $\gamma$ as closely as possible. To formulate this as a variational problem on $\mathrm{Imm}(S^1, \mathbb{R}^2)$ we need a loss functional that quantifies mismatch, and is small precisely when the data agree.
Geometric data on a curve is data on $S^1$, and most natural pieces of geometric data are themselves functions on $S^1$: the speed $v$ is a real-valued function, the curvature vector $\kappa$ is an $\mathbb{R}^n$-valued function, and so on. Comparing two curves’ geometric data therefore reduces to measuring distance between functions on $S^1$.
The natural distance on real-valued functions is the $L^2$ distance,

$$d(f, g) = \Big( \int_{S^1} (f - g)^2\, dt \Big)^{1/2},$$

and on $\mathbb{R}^d$-valued functions the same formula applies, with the inner product norm replacing the absolute value,

$$d(f, g) = \Big( \int_{S^1} |f - g|^2\, dt \Big)^{1/2}.$$

Given any function-valued geometric quantity $Q$ on curves, comparing $Q(\eta)$ to $Q(\gamma)$ via this distance gives a candidate loss. In practice we work with the squared distance,

$$E_Q(\eta) = d\big(Q(\eta), Q(\gamma)\big)^2 = \int_{S^1} \big| Q(\eta) - Q(\gamma) \big|^2\, dt,$$

because the square root is non-differentiable at zero — exactly the point we want to converge to. Squaring removes that singularity without changing the location of the minimum: $E_Q \ge 0$, with equality precisely when $Q(\eta) = Q(\gamma)$.
Geometric loss functions
The natural pieces of geometric data on a curve are its first and second fundamental forms, and we get a separate loss for each.
The intrinsic data is the metric, given pointwise by the speed $v$, a real-valued function on $S^1$. The corresponding loss is the squared $L^2$ distance between speeds,

$$E_{\mathrm{metric}}(\eta) = \int_{S^1} \big( v_\eta - v_\gamma \big)^2\, dt.$$

The extrinsic data is the curvature vector $\kappa$, but here we hit a wrinkle: $\kappa_\eta$ takes values in $\mathbb{R}^2$ while $\kappa_\gamma$ takes values in $\mathbb{R}^n$, and the two cannot be directly compared. The natural scalar to compare instead is the magnitude $|\kappa|$, which lives in $\mathbb{R}_{\ge 0}$ regardless of the ambient dimension — and which we recognize from §1 as the first generalized curvature $\kappa_1$ in the Frenet–Serret frame, the total bendiness of the curve at each point. In the planar case alone we’d have access to the signed scalar curvature, but since $|\kappa_\gamma|$ is intrinsically unsigned (a magnitude in $\mathbb{R}^n$), the comparison forces us to magnitudes on both sides; mirror-image solutions are then equivalent under the loss, which we accept as a feature of working ambient-blind. The corresponding loss is

$$E_{\mathrm{bend}}(\eta) = \int_{S^1} \big( |\kappa_\eta| - |\kappa_\gamma| \big)^2\, dt.$$

Each of these is a smooth function on $\mathrm{Imm}(S^1, \mathbb{R}^2)$ that we can use as a loss in its own right. To match both kinds of data simultaneously, we take a weighted sum,

$$E = \alpha\, E_{\mathrm{metric}} + \beta\, E_{\mathrm{bend}},$$

with $\alpha, \beta \ge 0$ trading off how strongly we insist on each kind of match.
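As a sketch of how these losses might look in code (names and conventions ours: curves sampled as `(N, d)` arrays, spectral derivatives), here are $E_{\mathrm{metric}}$ and $E_{\mathrm{bend}}$ evaluated for a target circle sitting in $\mathbb{R}^3$ against the same circle laid flat in $\mathbb{R}^2$ — both losses vanish, while an ellipse candidate scores poorly on both:

```python
import numpy as np

def d(f):
    """Componentwise spectral d/dt along axis 0 (grid on [0, 2*pi))."""
    N = f.shape[0]
    k = np.fft.fftfreq(N, d=1.0 / N)[:, None]
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f, axis=0), axis=0))

def speed_and_bend(c):
    """Speed v and curvature magnitude |kappa| of an (N, dim) sampled curve."""
    dc = d(c)
    v = np.linalg.norm(dc, axis=1)
    kappa = d(dc / v[:, None]) / v[:, None]    # kappa = T'/v
    return v, np.linalg.norm(kappa, axis=1)

def losses(eta, gamma):
    """E_metric = ∫(v_eta - v_gamma)^2 dt,  E_bend = ∫(|k_eta| - |k_gamma|)^2 dt."""
    v_e, k_e = speed_and_bend(eta)
    v_g, k_g = speed_and_bend(gamma)
    w = 2 * np.pi / eta.shape[0]
    return w * np.sum((v_e - v_g) ** 2), w * np.sum((k_e - k_g) ** 2)

N = 64
t = 2 * np.pi * np.arange(N) / N
gamma = np.stack([np.cos(t), np.sin(t), np.zeros(N)], axis=1)  # circle in R^3
eta = np.stack([np.cos(t), np.sin(t)], axis=1)                 # same circle in R^2
E_metric, E_bend = losses(eta, gamma)

ellipse = np.stack([2 * np.cos(t), np.sin(t)], axis=1)         # a bad candidate
Em_ell, Eb_ell = losses(ellipse, gamma)
```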
Gradient flows
With a loss $E$ on $\mathrm{Imm}(S^1, \mathbb{R}^2)$ in hand, an optimization strategy is a recipe for moving through $\mathrm{Imm}(S^1, \mathbb{R}^2)$ in a way that decreases $E$. Each such strategy gives an ODE on $\mathrm{Imm}(S^1, \mathbb{R}^2)$ (or on its tangent bundle, for second-order schemes). Below we describe the two most natural ones — gradient flow and gradient flow with momentum — and then specialize to the metric-matching case where the gradient is short to compute by hand.
Gradient flow
On any Riemannian manifold $(M, G)$ with smooth $E \colon M \to \mathbb{R}$, the gradient flow of $E$ is the first-order ODE

$$\dot x = -\nabla E(x).$$

The energy decreases monotonically along solutions:

$$\frac{d}{dt} E(x(t)) = dE(\dot x) = -\|\nabla E\|^2 \le 0,$$

with equality exactly when $\nabla E = 0$, i.e., at critical points of $E$. Critical points are the equilibria of the flow.
A common refinement is gradient descent with momentum, the second-order ODE

$$\frac{D \dot x}{dt} = -\nabla E(x) - \mu\, \dot x,$$

where $\frac{D}{dt}$ is the covariant derivative along the time-parametrized curve associated to the Levi-Civita connection of $G$, and $\mu > 0$ is a damping coefficient. Geometrically, covariant acceleration equals force minus damping. The damping term dissipates energy and drives trajectories toward critical points of $E$.
In our setting $M = \mathrm{Imm}(S^1, \mathbb{R}^2)$ and $G$ is the $L^2(dt)$-metric. The space is flat with respect to it — an open subset of the Hilbert space — so the covariant derivative is the ordinary time derivative, and the two flows read

$$\dot\eta = -\nabla E(\eta), \qquad \ddot\eta = -\nabla E(\eta) - \mu\, \dot\eta.$$

This is one practical advantage of working in the $L^2(dt)$-metric — the momentum equation has no covariant-derivative subtleties to compute around.
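Since both flows live in a flat space, they discretize in time with no geometric bookkeeping. A minimal sketch (our own helper names; forward Euler for the gradient flow, semi-implicit Euler for the momentum flow), exercised on the toy quadratic energy $E(x) = |x|^2/2$, whose flows should converge to the critical point at the origin:

```python
import numpy as np

def gradient_flow(x, grad, h, steps):
    """x' = -grad E(x), forward Euler."""
    for _ in range(steps):
        x = x - h * grad(x)
    return x

def momentum_flow(x, grad, h, steps, mu=1.0):
    """x'' = -grad E(x) - mu x', semi-implicit Euler on (x, p)."""
    p = np.zeros_like(x)
    for _ in range(steps):
        p = p + h * (-grad(x) - mu * p)
        x = x + h * p
    return x

grad = lambda x: x              # E(x) = |x|^2 / 2, minimum at 0
x0 = np.array([1.0, -2.0])
xg = gradient_flow(x0, grad, 0.1, 200)
xm = momentum_flow(x0, grad, 0.1, 600, mu=1.0)
```

The same two drivers apply verbatim once `x` is a vector of polygon vertices (§4) or Fourier coefficients (§5) and `grad` is the corresponding loss gradient.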
Specializing to the metric
Let’s work this out concretely for the case that we want to match the intrinsic metric. The loss is then $E(\eta) = \int_{S^1} (v - \bar v)^2\, dt$, writing $v = v_\eta$ for the candidate’s speed and $\bar v = v_\gamma$ for the target’s, and to run either optimization strategy we need the gradient.
Differentiating at $\eta$ — the integration variable and the variation parameter are independent, so $\partial_\epsilon$ pulls inside the integral —

$$dE_\eta(u) = \int_{S^1} 2\,(v - \bar v)\; dv_\eta(u)\, dt.$$

Substituting $dv_\eta(u) = \langle T, u' \rangle$ from §2 and writing the integrand using the $\mathbb{R}^2$-valued function $f = 2(v - \bar v)\, T$,

$$dE_\eta(u) = \int_{S^1} \langle f, u' \rangle\, dt.$$

Integration by parts on $S^1$ has no boundary terms, so

$$dE_\eta(u) = -\int_{S^1} \langle f', u \rangle\, dt,$$

and reading off the integrand against $u$,

$$\nabla E_\eta = -f' = -2\,\big[ (v - \bar v)\, T \big]'.$$

The gradient flow is then

$$\dot\eta = -\nabla E_\eta = 2\,\big[ (v - \bar v)\, T \big]',$$

and expanding the derivative using $T' = v\kappa$,

$$\dot\eta = 2\,(v - \bar v)'\, T + 2\, v\, (v - \bar v)\, \kappa.$$

Two pieces: a tangential flow proportional to the $t$-derivative of the speed mismatch, redistributing arclength along the curve; and a normal flow proportional to the speed mismatch itself, in the direction of the curvature vector — the curve bends inward where it is too long and outward where it is too short.
To see the analytic shape of this equation, substitute $T = \eta'/v$ and $\kappa = T'/v$. After collecting terms,

$$\dot\eta = \frac{2\,(v - \bar v)}{v}\, \eta'' + 2\Big( \frac{v - \bar v}{v} \Big)'\, \eta', \qquad v = |\eta'|.$$

Now the gradient flow is exposed for what it is: a PDE in two independent variables, time and parameter $t$. The dot is $\partial_\tau$, applied to the time-dependent curve $\eta(\tau, \cdot)$ as a vector-valued function of $t$; the primes are $\partial_t$, applied to $\eta$ at fixed time. The momentum flow adds a $\ddot\eta$ term to the left-hand side, making it a wave-like equation rather than a heat-like one.
These are infinite-dimensional partial differential equations on $S^1$, and we have no hope of solving them in closed form even in this metric-only special case — and the situation is strictly worse with the curvature term, where the gradient is fourth-order in $\eta$. To make progress we replace $\mathrm{Imm}(S^1, \mathbb{R}^2)$ by a finite-dimensional approximation and solve the corresponding finite-dimensional flow there. Two natural approximations follow: a polygonal mesh in §4, and a Fourier truncation in §5.
Discretization
To compute anything we need to replace the infinite-dimensional configuration space with a finite-dimensional one. The most direct way is to sample the parameter circle: pick $N$ points evenly spaced in $S^1$, and represent a curve by the polygon through their images.
The space of polygons
Fix $N$ and the parameter values $t_i = 2\pi i / N$ for $i = 1, \dots, N$ (indices mod $N$). The space of $N$-gons in $\mathbb{R}^2$ is the open subset

$$\mathcal{P}_N = \big\{ (p_1, \dots, p_N) \in (\mathbb{R}^2)^N : p_{i+1} \ne p_i \text{ for all } i \big\},$$

the discrete analog of the immersion condition (consecutive vertices distinct, so each edge has nonzero length). $\mathcal{P}_N$ is a finite-dimensional manifold of dimension $2N$, with tangent space at any $P$ equal to $(\mathbb{R}^2)^N$: a tangent vector is a tuple $(u_1, \dots, u_N)$ of variations of each vertex.
A polygon is not a smooth curve. Viewed as a parametrized map — say, the piecewise-linear interpolation of its vertices — its derivative is piecewise constant with jumps at vertices, so the polygon is not $C^\infty$ and hence not in $\mathrm{Imm}(S^1, \mathbb{R}^2)$. The space $\mathcal{P}_N$ is therefore not a submanifold of $\mathrm{Imm}(S^1, \mathbb{R}^2)$. It is a separate finite-dimensional manifold, related to $\mathrm{Imm}$ via the sampling map below — not contained in it.
The sampling map is the smooth surjection

$$\Sigma \colon \mathrm{Imm}(S^1, \mathbb{R}^2) \to \mathcal{P}_N, \qquad \Sigma(\eta) = \big( \eta(t_1), \dots, \eta(t_N) \big).$$

Its differential at $\eta$ sends a tangent vector $u$ to the tuple of evaluations $\big( u(t_1), \dots, u(t_N) \big)$ — discrete tangent vectors are simply samples of continuous ones.
Discrete first and second fundamental forms
Every geometric quantity we will need on a polygon is a smooth function of the vertex positions $p_i$. The discrete analogs of the geometric data from §1 — edge length, unit tangent, and curvature vector — are

$$\ell_i = |p_{i+1} - p_i|, \qquad T_i = \frac{p_{i+1} - p_i}{\ell_i}, \qquad \kappa_i = \frac{T_i - T_{i-1}}{(\ell_{i-1} + \ell_i)/2}.$$

The tuple $(\ell_i)$ is the discrete first fundamental form, encoding the polygon’s intrinsic metric one edge at a time. $\kappa_i$ is the discrete curvature vector, the finite-difference analog of $\kappa = T'/v$ centered at vertex $i$, and is the discrete extrinsic data; the denominator $(\ell_{i-1} + \ell_i)/2$ is the discrete analog of the arclength increment at the vertex, appearing here because curvature is rate-of-change-of-tangent per unit arclength. These are the building blocks of the geometric loss functions we discretize next.
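In code the discrete data is a few lines of NumPy (helper names ours). A useful sanity check: on a regular $N$-gon of circumradius $R$, every edge has length $2R\sin(\pi/N)$, and the discrete curvature vector has magnitude exactly $1/R$, because $|T_i - T_{i-1}| = 2\sin(\pi/N)$ cancels the edge length in the denominator.

```python
import numpy as np

def polygon_data(P):
    """Edge lengths, unit tangents, and vertex curvature vectors of a
    closed polygon P, an array of shape (N, 2); indices mod N."""
    e = np.roll(P, -1, axis=0) - P            # edge i: p_{i+1} - p_i
    ell = np.linalg.norm(e, axis=1)           # discrete first fundamental form
    T = e / ell[:, None]                      # unit tangent of edge i
    dT = T - np.roll(T, 1, axis=0)            # T_i - T_{i-1} at vertex i
    ds = 0.5 * (ell + np.roll(ell, 1))        # (ell_{i-1} + ell_i) / 2
    kappa = dT / ds[:, None]                  # discrete curvature vector
    return ell, T, kappa

# Regular N-gon of circumradius R.
N, R = 12, 3.0
t = 2 * np.pi * np.arange(N) / N
P = R * np.stack([np.cos(t), np.sin(t)], axis=1)
ell, T, kappa = polygon_data(P)
```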
Discrete metric, loss, and gradient
With sample points evenly spaced in $S^1$, the parameter spacing is $\Delta t = 2\pi / N$, and the discrete analog of $\int_{S^1} (\cdot)\, dt$ is the Riemann sum $\Delta t \sum_i (\cdot)$. So the continuous $L^2(dt)$-metric becomes, under sampling,

$$\langle u, w \rangle = \frac{2\pi}{N} \sum_{i=1}^N \langle u_i, w_i \rangle,$$

the standard Euclidean inner product on $(\mathbb{R}^2)^N$ up to overall scale. As in the continuous case, this metric is independent of the base point $P$, so $\mathcal{P}_N$ is flat in it.
The real degrees of freedom of $\mathcal{P}_N$ are the vertex positions $p_i$. Any function on $\mathcal{P}_N$ — including any energy we want to optimize — is ultimately a function of those positions, and any gradient is a derivative with respect to them. The discrete geometric data introduced in §4.2 (lengths $\ell_i$, tangents $T_i$, curvature vectors $\kappa_i$) are intermediate quantities, useful for writing geometric energies in a way that matches the continuous picture, but the differentiation always reduces to the chain rule applied to the vertex coordinates.
This guides us at every step. Rather than discretize the continuous gradient term-by-term — which would force decisions about how to evaluate edge-based quantities like $T_i$ at vertices, and would need a separate convergence analysis — we discretize the energy and then differentiate it directly.
For our problem the discrete loss is

$$E(P) = \alpha \sum_{i=1}^N \big( \ell_i - \bar\ell_i \big)^2 + \beta \sum_{i=1}^N \big( |\kappa_i| - \bar k_i \big)^2,$$

where $\ell_i, \kappa_i$ are computed from the candidate $P$ and $\bar\ell_i, \bar k_i$ from the target $\gamma$. This matches the continuous loss in §3 up to an overall factor from the $\Delta t$ scaling, which we absorb into the weights $\alpha, \beta$. Both $\ell_i$ and $\kappa_i$ are smooth functions of vertex positions on the open subset where edges have nonzero length, so $E$ is smooth on $\mathcal{P}_N$.
The differential and the gradient are now finite-dimensional objects: $\nabla E$ is determined by the standard inner product via $\langle \nabla E, u \rangle = dE_P(u)$, and computing it amounts to taking partial derivatives with respect to vertex coordinates. The gradient flow ODE

$$\dot p_i = -\nabla_{p_i} E(P), \qquad i = 1, \dots, N,$$

is an ordinary system of $2N$ coupled ODEs in the vertex coordinates.
Specializing to springs
Look just at the loss function for the first fundamental form, meaning that we are trying to get the intrinsic metric to match our parametrized curve $\gamma$:

$$E(P) = \sum_{i=1}^N \big( \ell_i - \bar\ell_i \big)^2.$$

This is quadratic in the edge lengths of our polygonal approximation, minus the corresponding “true” edge lengths $\bar\ell_i$, coming from $\gamma$. Such quadratic energy functions are exactly the spring potentials of classical physics, so this represents a cycle of springs with rest lengths $\bar\ell_i$ and spring constant $2$. Thus, the simplest finite-dimensional version of the matching problem is a spring system: build a polygon whose edges are springs with prescribed equilibrium lengths inherited from the target curve, and let it relax.
To compute the gradient flow, start with $E(P) = \sum_i (\ell_i - \bar\ell_i)^2$. Differentiating with respect to a vertex variation $u = (u_1, \dots, u_N)$, using $d\ell_i(u) = \langle T_i, u_{i+1} - u_i \rangle$,

$$dE_P(u) = \sum_i 2\,(\ell_i - \bar\ell_i)\, \big\langle T_i,\; u_{i+1} - u_i \big\rangle.$$

Reorganize the sum so each $u_i$ collects its coefficient,

$$dE_P(u) = \sum_i \big\langle\, 2\,(\ell_{i-1} - \bar\ell_{i-1})\, T_{i-1} - 2\,(\ell_i - \bar\ell_i)\, T_i,\; u_i \,\big\rangle.$$

Reading this off against $u_i$ (absorbing the metric’s overall factor of $2\pi/N$ into the time scale),

$$\nabla_{p_i} E = 2\,(\ell_{i-1} - \bar\ell_{i-1})\, T_{i-1} - 2\,(\ell_i - \bar\ell_i)\, T_i.$$

The gradient flow is then

$$\dot p_i = 2\,(\ell_i - \bar\ell_i)\, T_i - 2\,(\ell_{i-1} - \bar\ell_{i-1})\, T_{i-1},$$

which up to the overall constant is exactly Newton’s law for a chain of Hookean springs: each vertex is pulled by two forces, one from each adjacent edge, with a stretched edge ($\ell_i > \bar\ell_i$) pulling its endpoints together along the tangent and a compressed edge pushing them apart.
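A sketch of the spring gradient in code (names ours), checked two ways: against a finite difference of the energy in a chosen direction, and by confirming that a small descent step decreases the energy.

```python
import numpy as np

def spring_energy(P, rest):
    e = np.roll(P, -1, axis=0) - P
    return np.sum((np.linalg.norm(e, axis=1) - rest) ** 2)

def spring_grad(P, rest):
    """grad_{p_i} E = 2(l_{i-1} - rest_{i-1}) T_{i-1} - 2(l_i - rest_i) T_i."""
    e = np.roll(P, -1, axis=0) - P
    ell = np.linalg.norm(e, axis=1)
    T = e / ell[:, None]
    f = 2 * (ell - rest)[:, None] * T          # per-edge force along T_i
    return np.roll(f, 1, axis=0) - f

N = 8
t = 2 * np.pi * np.arange(N) / N
P = np.stack([1.3 * np.cos(t), 0.8 * np.sin(t)], axis=1)  # squashed octagon
rest = np.ones(N)                                         # target edge lengths

# finite-difference check of the gradient in a chosen direction
U = np.stack([np.sin(2 * t), np.cos(t)], axis=1)
eps = 1e-6
fd = (spring_energy(P + eps * U, rest) - spring_energy(P - eps * U, rest)) / (2 * eps)
dE = np.sum(spring_grad(P, rest) * U)
```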
Specializing to the bending energy
We still need to do the analogous computation for $E_{\mathrm{bend}}$: write the bending loss as a function of vertex positions, compute its gradient, and write down the resulting flow. TODO. Each $\kappa_i$ depends on $(p_{i-1}, p_i, p_{i+1})$, so the gradient at vertex $i$ has contributions from $\kappa_{i-1}, \kappa_i, \kappa_{i+1}$ — three-term sparsity instead of the bidiagonal sparsity of the spring case.
Finite-dimensionalization by Fourier
Discretization replaced $\mathrm{Imm}(S^1, \mathbb{R}^2)$ by a separate finite-dimensional manifold whose elements were polygons — not in $\mathrm{Imm}(S^1, \mathbb{R}^2)$ at all. A different approach is to keep working with smooth curves, but restrict to those of bounded frequency. This gives a finite-dimensional submanifold of $\mathrm{Imm}(S^1, \mathbb{R}^2)$.
The Fourier submanifold
Identify $\mathbb{R}^2 \cong \mathbb{C}$ via $(x, y) \mapsto x + iy$, so a planar curve becomes a $\mathbb{C}$-valued function on $S^1$ with Fourier expansion

$$\eta(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{ikt}, \qquad c_k \in \mathbb{C}.$$

There is no reality condition on the $c_k$ because $\eta$ takes values in $\mathbb{C}$, not $\mathbb{R}$.
Fix $K \ge 1$ and a translation gauge $c_0 = 0$ (centering the curve at the origin; the loss is translation-invariant, so $c_0$ is a flat direction we lose nothing by fixing). The Fourier truncation at level $K$ is

$$\mathcal{F}_K = \Big\{\, \eta(t) = \sum_{0 < |k| \le K} c_k\, e^{ikt} \,\Big\} \cap \mathrm{Imm}(S^1, \mathbb{R}^2).$$

The coefficient tuple parametrizes $\mathcal{F}_K$, so $\mathcal{F}_K$ is a $2K$-complex-dimensional ($4K$-real-dimensional) manifold. We will use these coefficients as our coordinates throughout: a point of $\mathcal{F}_K$ is the tuple

$$c = (c_{-K}, \dots, c_{-1}, c_1, \dots, c_K),$$

which we abbreviate $(c_k)$ when the index range is clear, and every quantity we compute — the curve $\eta$, its derivatives, the loss, its gradient — will be expressed as a function of $c$.
Unlike $\mathcal{P}_N$, every element of $\mathcal{F}_K$ is genuinely a smooth curve, so the inclusion

$$\mathcal{F}_K \hookrightarrow \mathrm{Imm}(S^1, \mathbb{R}^2)$$

is honest. It is an embedding wherever $\eta'(t) \ne 0$ for all $t$ — the immersion condition.
The tangent space at any $\eta \in \mathcal{F}_K$ consists of variations within $\mathcal{F}_K$, i.e., Fourier polynomials of the same form,

$$u(t) = \sum_{0 < |k| \le K} a_k\, e^{ikt}, \qquad a_k \in \mathbb{C}.$$
Geometric data in Fourier coordinates
Differentiation in $t$ is diagonal in Fourier coordinates: each mode $e^{ikt}$ is an eigenfunction of $\frac{d}{dt}$ with eigenvalue $ik$. So

$$\eta'(t) = \sum_{0 < |k| \le K} ik\, c_k\, e^{ikt}, \qquad \eta''(t) = \sum_{0 < |k| \le K} -k^2\, c_k\, e^{ikt}.$$

In our coordinates, $\frac{d}{dt}$ is therefore the diagonal linear map on $\mathbb{C}^{2K}$

$$c_k \longmapsto ik\, c_k,$$

and $\frac{d^2}{dt^2}$ is the diagonal map $c_k \mapsto -k^2 c_k$. So $\eta$, $\eta'$, and $\eta''$ are all linear functions of the coordinate vector $c$.
The geometric data we ultimately want — the speed $v$ and the curvature vector $\kappa$ — are not linear or even polynomial in the $c_k$, because they involve square roots and quotients. For instance,

$$v(t)^2 = |\eta'(t)|^2 = \Big|\, \sum_{0 < |k| \le K} ik\, c_k\, e^{ikt} \,\Big|^2$$

is a Fourier polynomial in $t$ (a finite trigonometric sum) and a polynomial of degree 2 in the $c_k$ when expanded — but $v$ itself, the square root, is neither. Curvature is worse: $\kappa = T'/v$ has $v$-dependent denominators throughout.
Metric, loss, and gradient
Given a curve $\eta \in \mathcal{F}_K$ and two tangent vectors — themselves Fourier polynomials,

$$u(t) = \sum_{0 < |k| \le K} a_k\, e^{ikt}, \qquad w(t) = \sum_{0 < |k| \le K} b_k\, e^{ikt}$$

— we want to evaluate the $L^2(dt)$-metric

$$\langle u, w \rangle = \int_{S^1} \langle u(t), w(t) \rangle_{\mathbb{R}^2}\, dt.$$

Identifying $\mathbb{R}^2 \cong \mathbb{C}$, the inner product becomes $\langle u, w \rangle_{\mathbb{R}^2} = \operatorname{Re}\big( u\, \bar w \big)$. Substituting the Fourier expansions and expanding,

$$\langle u, w \rangle = \int_0^{2\pi} \operatorname{Re} \sum_{k, l} a_k\, \bar b_l\, e^{i(k - l)t}\, dt.$$

Integrating over $t$ and using the orthogonality relation $\int_0^{2\pi} e^{i(k - l)t}\, dt = 2\pi\, \delta_{kl}$, only the diagonal terms survive:

$$\langle u, w \rangle = 2\pi \sum_{0 < |k| \le K} \operatorname{Re}\big( a_k\, \bar b_k \big).$$

Two things to read off this formula. First, the right-hand side does not depend on $\eta$ at all — the metric is the same on every tangent space, so $\mathcal{F}_K$ is flat in $L^2(dt)$, just as $\mathrm{Imm}$ was. Second, up to the overall factor $2\pi$, this is the standard Euclidean inner product on $\mathbb{C}^{2K} \cong \mathbb{R}^{4K}$: writing $a_k = x_k + i y_k$ and $b_k = x_k' + i y_k'$ in real coordinates, $\operatorname{Re}(a_k \bar b_k) = x_k x_k' + y_k y_k'$, and the sum becomes the dot product on $\mathbb{R}^{4K}$. Fourier coordinates put a curve in $\mathcal{F}_K$ on the same footing as a point in Euclidean space.
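The orthogonality computation is easy to confirm numerically: evaluate two Fourier polynomials on a grid and compare the quadrature value of $\int \operatorname{Re}(u \bar w)\, dt$ with $2\pi \sum_k \operatorname{Re}(a_k \bar b_k)$. (Sketch; representing coefficient tuples as dictionaries is our own convenience.)

```python
import numpy as np

def eval_poly(coeffs, t):
    """Evaluate sum_k c_k e^{ikt}; coeffs maps k -> c_k."""
    return sum(c * np.exp(1j * k * t) for k, c in coeffs.items())

a = {-2: 0.3 + 0.1j, 1: 1.0 - 0.5j, 3: 0.2j}
b = {-2: 1.0j, 1: 0.4 + 0j, 2: -0.7 + 0j}

# quadrature side: ∫ Re(u conj(w)) dt on a fine uniform grid
t = 2 * np.pi * np.arange(256) / 256
quad = 2 * np.pi * np.mean(np.real(eval_poly(a, t) * np.conj(eval_poly(b, t))))

# coefficient side: 2π Σ_k Re(a_k conj(b_k)), over shared modes only
coef = 2 * np.pi * sum((a[k] * np.conj(b[k])).real for k in set(a) & set(b))
```

The grid sum is exact here, not just approximate, because the product of two degree-3 polynomials is band-limited far below 256 modes — the observation the next subsection runs on.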
The loss $E$ from §3 restricts to a smooth function on $\mathcal{F}_K$, expressible as a function of the coordinates $c$. Writing $c_k = x_k + i y_k$, its differential at $c$ is the linear map $dE_c \colon \mathbb{R}^{4K} \to \mathbb{R}$,

$$dE_c(a) = \sum_{0 < |k| \le K} \Big( \frac{\partial E}{\partial x_k}\, \operatorname{Re}(a_k) + \frac{\partial E}{\partial y_k}\, \operatorname{Im}(a_k) \Big),$$

and the gradient is determined by $\langle \nabla E, a \rangle = dE_c(a)$. Because the metric is Euclidean up to the factor $2\pi$, the gradient is the ordinary partial-derivative vector, divided by $2\pi$:

$$\nabla E = \frac{1}{2\pi} \Big( \frac{\partial E}{\partial x_k} + i\, \frac{\partial E}{\partial y_k} \Big)_{0 < |k| \le K}.$$

The gradient flow is then an ordinary system of $4K$ coupled real ODEs in the coefficients,

$$\dot c_k = -\frac{1}{2\pi} \Big( \frac{\partial E}{\partial x_k} + i\, \frac{\partial E}{\partial y_k} \Big),$$

which we can run as soon as we know how to compute the partials of $E$ with respect to the real and imaginary parts of $c_k$.
Evaluating the loss numerically
Any loss we work with on $\mathcal{F}_K$ has the form $E = \int_{S^1} F(t)\, dt$, where the integrand $F$ is built pointwise from $\eta$ and its derivatives at $t$. For instance, the metric loss has $F = (v - \bar v)^2$, and the bending loss has $F = (|\kappa| - \bar k)^2$.
We approximate the integral by the trapezoidal rule on $M$ uniform sample points $t_j = 2\pi j / M$:

$$\int_{S^1} F(t)\, dt \approx \frac{2\pi}{M} \sum_{j=0}^{M-1} F(t_j).$$

The trapezoidal rule is exceptionally accurate for smooth periodic integrands — its error decays faster than any polynomial in $1/M$, so a modest $M$ gives a very accurate integral. The reason is Fourier-analytic: the trapezoidal rule with $M$ uniform points integrates each basis function $e^{ikt}$ exactly for $|k| < M$, so the only source of error comes from Fourier modes with $|k| \ge M$, and these decay rapidly for smooth $F$.
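The spectral accuracy shows up in a tiny experiment (the smooth, periodic, not-band-limited test integrand $e^{\cos t}$ is our choice):

```python
import numpy as np

def trap(F, M):
    """(2*pi/M) * sum F(t_j) on the uniform grid t_j = 2*pi*j/M."""
    t = 2 * np.pi * np.arange(M) / M
    return 2 * np.pi * np.mean(F(t))

F = lambda t: np.exp(np.cos(t))     # smooth and periodic, not band-limited
ref = trap(F, 512)                  # effectively exact reference
errs = [abs(trap(F, M) - ref) for M in (4, 8, 16, 32)]
```

Doubling $M$ does not merely halve the error — it wipes out orders of magnitude, because the Fourier coefficients of $e^{\cos t}$ decay factorially.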
We choose $M$ so that no aliasing occurs in $\eta$, $\eta'$, $\eta''$ — these are Fourier polynomials of degree $K$, and $M \ge 2K + 1$ uniform samples suffice to determine each of them exactly. In practice we take $M$ a bit larger, to better resolve the nonlinear combinations the integrand involves.
The procedure for evaluating $E$ at a given point $c \in \mathcal{F}_K$ is then:
- Compute the coefficient vectors $(c_k)$, $(ik\, c_k)$, $(-k^2 c_k)$ for $\eta$, $\eta'$, $\eta''$.
- Evaluate $\eta(t_j) = \sum_k c_k\, e^{ikt_j}$, and similarly for $\eta'$, $\eta''$, at the sample points.
- Compute the integrand $F(t_j)$ from these values — this depends on the specific loss.
- Sum: $E \approx \frac{2\pi}{M} \sum_j F(t_j)$.
The gradient is computed by chain rule (or automatic differentiation) through the same pipeline.
Specializing to the metric
For the metric-only case ($\beta = 0$), the loss is

$$E(c) = \int_{S^1} \big( v(t) - \bar v(t) \big)^2\, dt,$$

with $v = |\eta'|$.
Following the procedure above, evaluation goes:
- Compute $\eta'(t_j) = \sum_k ik\, c_k\, e^{ikt_j}$ at each sample point.
- Compute $v_j = |\eta'(t_j)|$.
- Subtract the precomputed target and square: $F_j = (v_j - \bar v_j)^2$.
- Sum: $E \approx \frac{2\pi}{M} \sum_j F_j$.
The gradient flow is the ODE in $c$

$$\dot c = -\nabla E(c),$$

which we integrate with any standard ODE method (forward Euler, RK4, etc.).
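Putting the pipeline together, here is a minimal end-to-end sketch (all names ours; gradients by central finite differences, standing in for autodiff): $K = 2$, target speed $\bar v \equiv 1$ (the unit circle’s), initial curve an ellipse. Gradient descent should drive the metric loss toward zero.

```python
import numpy as np

K, M = 2, 32
ks = np.array([-2, -1, 1, 2])
t = 2 * np.pi * np.arange(M) / M
modes = np.exp(1j * np.outer(t, ks))          # (M, 2K) evaluation matrix

def loss(x, vbar=1.0):
    """Metric loss E = ∫ (v - vbar)^2 dt; x holds Re c_k then Im c_k."""
    c = x[:4] + 1j * x[4:]
    v = np.abs(modes @ (1j * ks * c))         # speed at the sample points
    return 2 * np.pi * np.mean((v - vbar) ** 2)

def grad(x, eps=1e-6):
    """Central finite differences (autodiff would do the same job)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (loss(x + e) - loss(x - e)) / (2 * eps)
    return g

x = np.zeros(8)
x[2] = 1.0                     # c_1 = 1: the unit circle ...
x[1] = 0.3                     # ... plus c_{-1} = 0.3: squashed to an ellipse
E0 = loss(x)
for _ in range(600):           # plain gradient descent
    x = x - 0.02 * grad(x)
E1 = loss(x)
```

The step size and iteration count here are hand-tuned for this toy instance, not recommendations.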
A polynomial alternative
There is one reformulation of the metric loss worth flagging, because it eliminates the quadrature error entirely. The square root is the only thing keeping $v$ from being a polynomial in the $c_k$. If we replace $E$ with

$$\tilde E(c) = \int_{S^1} \big( v^2 - \bar v^2 \big)^2\, dt,$$

then since $v$ and $\bar v$ are nonnegative, $v^2 = \bar v^2$ if and only if $v = \bar v$, so the minimum sets agree. And $v^2$ is a polynomial of degree 2 in $c$, so the integrand is a polynomial of degree 4 in the coefficients, and the integral can be computed in closed form by Parseval — no quadrature, no chain rule through pointwise nonlinearities.
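The degree count also gives an exact quadrature: the integrand of $\tilde E$ is a trigonometric polynomial of degree at most $4K$, so the $M$-point trapezoidal rule is already exact — not approximate — once $M \ge 4K + 1$. A quick check (random coefficients; names ours):

```python
import numpy as np

K = 3
ks = np.concatenate([np.arange(-K, 0), np.arange(1, K + 1)])
rng = np.random.default_rng(1)
c = rng.normal(size=2 * K) + 1j * rng.normal(size=2 * K)

def E_tilde(M, vbar2=1.0):
    """∫ (|eta'|^2 - vbar^2)^2 dt by the M-point trapezoidal rule."""
    t = 2 * np.pi * np.arange(M) / M
    deta = np.exp(1j * np.outer(t, ks)) @ (1j * ks * c)
    return 2 * np.pi * np.mean((np.abs(deta) ** 2 - vbar2) ** 2)

coarse = E_tilde(4 * K + 1)     # minimal exact rule for a degree-4K integrand
fine = E_tilde(1024)            # heavily oversampled reference
```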
This is a different loss from $E$, with different gradient flow dynamics, but the same minima. Whether to prefer it over $E$ depends on the application.
Specializing to the bending energy
We still need to do the analogous computation for the bending loss in Fourier coordinates: write the integrand in terms of , evaluate it at the sample points via the procedure above, and compute the gradient through the chain rule. TODO. Unlike the metric case, there is no clean polynomial alternative — has -dependent denominators that no analog of removes — so quadrature is the only path here.