Digital Mathematical Notebook

Topological Data Analysis

Topological Data Analysis studies how the shape of data changes across scales. This page begins with persistence and comparison questions, then develops the simplicial and homological foundations that make those ideas precise.

Seven lectures from January 2025, organized here as one continuous page with formulas, diagrams, and lightweight interactive explanations.

Review markers appear only where a formula or label is too ambiguous in the scan to transcribe with confidence.

Lecture 1 β€’ Motivation before foundations

Why the shape of data matters

The starting question is not β€œwhat is a simplex?” but β€œwhich topological feature appears and disappears as a threshold moves?”

KeyIdea / IntuitionBox

Persistence before algebra

Begin by watching a filtering function change and asking which connected components survive for a long time. Algebra enters only after that geometric picture is clear.

DefinitionBox

Filtering viewpoint

Given a measuring function $$\varphi:E\to[0,1]$$ or more generally $$\varphi:E\to[a,b]\subseteq\mathbb{R},$$ we study the topology of its sublevel sets $$\{x\in E:\varphi(x)\le t\}$$ as the threshold \(t\) varies.

RemarkBox

Why this matters

A single threshold can be misleading. Persistence keeps the full story of when features are born and when they die, so short-lived fluctuations can be separated from robust structure.

FigureWithCaption

t_2 t_1 x varphi(x)
A filtering function acts as the lens through which topology is read: moving the horizontal threshold changes which regions of the graph have entered the sublevel set.

Persistence diagrams appear immediately

As the scalar threshold rises, new connected components can appear and existing ones can merge or become filled in. Each event is recorded by a birth-death pair. Long-lived classes sit far from the diagonal; short-lived classes stay close to it.

DefinitionBox

Birth from a scalar function

For the filtration \(E_t=\{x\in E:\varphi(x)\le t\}\), a class is born at the first threshold \(b\) where it appears in the topology of \(E_t\). In the opening pages this is first visualized for connected components.

DefinitionBox

Death from a scalar function

A class dies at the threshold \(d\) where it merges into an older class or becomes a boundary. The interval \((b,d)\) is the lifetime later encoded as a point \((b,d)\) in the diagram.

ExampleBox

Reading a component trace

Tracking connected components through threshold values \(t=0\), \(t=0.2\), and beyond leads to a persistence diagram whose horizontal axis is birth time and whose vertical axis is death time.

Natural pseudodistance

Once an admissible group of transformations is fixed, the comparison question is: how close can two measuring functions become after the best alignment?

$$ d_G(\varphi,\psi)=\min_{g\in G}\lVert \varphi-\psi\circ g\rVert_\infty $$

The first GENEO intuition

Before the full definition arrives, the essential idea is already present: operators should respect the chosen symmetry action and should not increase distances. That is the seed of the GENEO perspective developed later.

Interactive

Filtration / Persistence Explainer

t0

Lecture 2 β€’ The geometric foundations

Geometric simplicial complexes

After the persistence motivation, geometry supplies the language needed to formalize those topological changes.

DefinitionBox

Affine combination and affine hull

An affine combination of vectors \(u_0,\dots,u_k\in\mathbb{R}^n\) has the form $$\sum_{i=0}^k \lambda_i u_i\qquad\text{with}\qquad \sum_{i=0}^k \lambda_i = 1.$$ The set of all such affine combinations is the affine hull.

DefinitionBox

Affine independence

Points \(u_0,\dots,u_k\) are affinely independent when the representation of an affine combination is unique. This hypothesis matters because points need not be in general position.

DefinitionBox

Convex set

A set \(S\subseteq\mathbb{R}^n\) is convex if for every \(x_1,x_2\in S\) and every \(t\in[0,1]\) we also have $$ (1-t)x_1 + tx_2\in S. $$

DefinitionBox

Compact set

In \(\mathbb{R}^n\), compact sets are closed and bounded; equivalently, every open cover admits a finite subcover.

DefinitionBox

Convex combination

A convex combination is an affine combination with non-negative coefficients. The convex hull is built from all such combinations.

TheoremBox

k-simplex

The convex hull of \(k+1\) affinely independent points is a \(k\)-simplex. The empty set plays the role of a \((-1)\)-simplex, which becomes relevant later for reduced homology.

DefinitionBox

Geometric simplicial complex

A collection \(K\) of simplices in \(\mathbb{R}^n\) is a geometric simplicial complex when every face of every simplex lies in \(K\) and the intersection of two simplices is either empty or a common face.

ExampleBox

Body, subcomplexes, and skeletons

The body is $$ |K|=\bigcup_{\sigma\in K}\sigma. $$ A subcomplex \(L\subseteq K\) is closed under faces, while the \(j\)-skeleton keeps only the simplices of dimension at most \(j\).

Interactive

Simplicial Complex Builder

Lecture 3 β€’ From combinatorics back to geometry

Abstract simplicial complexes, geometric realization, and point clouds

The next step is to separate combinatorics from embedding. An abstract simplicial complex keeps only the face relation; geometry can be recovered later through realization.

DefinitionBox

Abstract simplicial complex

A finite collection \(\mathcal{A}\) of finite subsets of a vertex set is an abstract simplicial complex when every non-empty subset of a simplex in \(\mathcal{A}\) also belongs to \(\mathcal{A}\).

RemarkBox

What gets forgotten

Passing from a geometric complex to an abstract one forgets metric information and keeps only incidence. No distance remains in the abstract complex itself.

DefinitionBox

Isomorphism of abstract complexes

A bijection on vertices induces an isomorphism when it preserves simplices and their inverses do the same. This is the combinatorial notion of β€œsame shape”.

FigureWithCaption

abstract complex β†’ geometric realization
Geometric realization reconnects an abstract complex to an actual simplicial object in Euclidean space.

TheoremBox

Geometric realization theorem

Every finite abstract simplicial complex of dimension \(d\) can be realized as a geometric simplicial complex in \(\mathbb{R}^{2d+1}\). Planar obstructions such as \(K_5\) and \(K_{3,3}\) show why low-dimensional realizations can fail.

Nerve of a cover

If \(\mathcal{F}=\{U_i\}_{i\in I}\) is a finite family of sets, its nerve records which finite intersections are non-empty.

$$ \operatorname{Nrv}(\mathcal{F})=\{J\subseteq I:\bigcap_{j\in J}U_j\neq\varnothing\} $$

This provides the bridge from point clouds to combinatorial models: cover the points by balls and look at the nerve.

DefinitionBox

Cech complex

For a point cloud \(S\subseteq\mathbb{R}^n\) and radius \(\alpha\), the Cech complex is the nerve of the ball cover \(\{B(x,\alpha)\}_{x\in S}\).

DefinitionBox

Vietoris-Rips complex

The Vietoris-Rips complex keeps a simplex whenever all of its vertices are pairwise within the chosen threshold. It is the computationally lighter companion to Cech.

TheoremBox

Vietoris-Rips lemma

The exact inclusions are $$ \check{C}(S,\alpha)\subseteq VR(S,\alpha)\subseteq \check{C}(S,\sqrt{2}\alpha). $$ The first inclusion is easy to visualize: common intersection implies pairwise intersection. The second is the key approximation step: if all vertices are pairwise \(\alpha\)-close, then one can place a larger common ball and recover a Cech simplex at scale \(\sqrt{2}\alpha\).

ExampleBox

Collapsibility

An elementary collapse removes a simplex together with one free coface. A complex is collapsible if a sequence of elementary collapses reduces it to a point.

Interactive

Nerve / Cech / Vietoris-Rips Explainer

0.72

Lecture 4 β€’ Algebraic topology enters

Chain complexes, boundary maps, homology groups, and Betti numbers

Once simplicial complexes are available, algebraic topology counts holes through chain groups, boundaries, cycles, and the quotient that identifies genuine holes.

DefinitionBox

Chain complex

A chain complex is a sequence of vector spaces \(V_p\) together with maps \(\partial_p:V_p\to V_{p-1}\) such that $$\partial_p\circ\partial_{p+1}=0.$$

DefinitionBox

Chain groups

The \(p\)-chains \(C_p\) are formal linear combinations of \(p\)-simplices, $$ C_p(K;\mathbb{K})=\left\{\sum_i a_i\sigma_i : a_i\in\mathbb{K},\ \sigma_i\text{ a }p\text{-simplex}\right\}. $$ The first computations use \(\mathbb{Z}_2\), so adding a simplex twice cancels it.

RemarkBox

Formal linear combinations are bookkeeping devices

A chain is not a new geometric simplex, but a formal sum that lets us add, compare, and cancel simplices algebraically.

FormulaBlock

Boundary operator

$$ \partial_p\langle u_0,\dots,u_p\rangle = \sum_{i=0}^p (-1)^i \langle u_0,\dots,\widehat{u_i},\dots,u_p\rangle $$

The alternating sign is the key detail. Check it first on triangles, then extend it to the general proof that \(\partial^2=0\).

ProofToggle β€’ Why \(\partial^2=0\)

Expand \(\partial_{p-1}(\partial_p\sigma)\) and group the resulting \((p-2)\)-faces into cancelling pairs. Each face appears twice with opposite signs because omitting \(u_i\) and then \(u_j\) is the same face as omitting \(u_j\) and then \(u_i\), but with opposite parity.

DefinitionBox

Quotient space

For a subspace \(W\subseteq V\), the quotient \(V/W\) consists of equivalence classes \([v]\) where \(v\sim v'\) if \(v-v'\in W\). Homology is built exactly this way: cycles modulo boundaries.

DefinitionBox

Cycles

$$ Z_p=\ker \partial_p $$ A \(p\)-cycle has empty boundary.

DefinitionBox

Boundaries

$$ B_p=\operatorname{im}\partial_{p+1} $$ A \(p\)-boundary already bounds a higher-dimensional chain.

TheoremBox

Homology group

$$ H_p(K)=Z_p/B_p=\ker\partial_p\big/\operatorname{im}\partial_{p+1}. $$

ExampleBox

A cycle is not automatically a hole

Contrast the boundary of an empty triangle with the boundary of a filled triangle. In the first case the loop gives a non-trivial \(H_1\) class; in the second it becomes a boundary and disappears in homology.

Interactive

Homology Intuition Visual

DefinitionBox

Betti numbers

Define $$ \beta_p = \dim H_p $$ and relate \(\beta_p\) to the ranks of the boundary maps through simple dimension bookkeeping.

RemarkBox

Reduced versus classical simplicial homology

Reduced homology shifts the degree-zero picture by adjoining the empty simplex and an augmentation map in degree \(0\). In practice this replaces \(\beta_0\) with \(\widetilde{\beta}_0=\beta_0-1\), removing the always-present connected component and making formulas cleaner.

TheoremBox

Euler-Poincare

Define the Euler characteristic by simplex counts, $$ \chi(K)=\sum_{i\ge 0}(-1)^i n_i, $$ and proves the matching homological formula $$ \chi(K)=\sum_{i\ge 0}(-1)^i \beta_i. $$

ProofToggle β€’ Euler-Poincare sketch

The proof substitutes the rank-nullity identities for the boundary maps into the alternating sum of simplex counts. After the \(b_i\) terms telescope, the surviving contribution is precisely the alternating sum of Betti numbers.

TheoremBox

Elementary collapse preserves homology

If \(L\) is obtained from \(K\) by an elementary collapse, then \(H_p(L)\cong H_p(K)\) for every \(p\in\mathbb{Z}\).

Lecture 5 β€’ Symmetries and filtering functions

Groups, group actions, filtering functions, and natural pseudodistance

Comparison becomes more systematic here: first basic group language, then actions on spaces and function spaces, and finally the pseudodistance induced by admissible transformations.

DefinitionBox

Group

A set \(G\) with a binary operation is a group when it has an identity, every element has an inverse, and the operation is associative. The abelian case is especially useful.

DefinitionBox

Pseudometric

A pseudometric satisfies all the metric axioms except that \(d(x,y)=0\) need not imply \(x=y\). This flexibility is useful once symmetries identify different objects.

DefinitionBox

Group action

A left action is a map \(G\times X\to X\) satisfying \(e\cdot x=x\) and \(g_2\cdot(g_1\cdot x)=(g_2g_1)\cdot x\).

KeyIdea / IntuitionBox

Perception pair

Work not only with data, but also with the observer allowed to compare it. Concretely, use a function space \(\Phi\) together with a symmetry group \(G\) acting on it by composition: this pair organizes what differences count and what differences are ignored.

Directional filtering functions on simplicial complexes

Start with a function on the vertices and extend it to a simplex by taking the maximum value on its vertices. This turns a discrete assignment into a bona fide filtering function on the whole complex.

$$ \varphi'(\sigma)=\max\{\varphi'(u_0),\dots,\varphi'(u_m)\}\qquad\text{for }\sigma=\langle u_0,\dots,u_m\rangle. $$

Metrics induced by a function space

Several sup-type distances are useful: one on vertices, one on the symmetry group, and the ambient sup norm on the function space \(\Phi\). These metrics organize the comparison theorems.

FormulaBlock

Vertex pseudometric \(D_V\)

$$ D_V(v_1,v_2)=\sup_{\varphi\in\Phi}\lvert \varphi(v_1)-\varphi(v_2)\rvert. $$

This compares vertices only through the values seen by the admissible filtering functions.

FormulaBlock

Group pseudometric \(D_G\)

$$ D_G(g_1,g_2)=\sup_{\varphi\in\Phi}\lVert \varphi\circ g_1-\varphi\circ g_2\rVert_\infty. $$

Two symmetries are close when every admissible observer sees them as producing similar pullbacks.

FormulaBlock

Function-space metric \(D_\Phi\)

$$ D_\Phi(\varphi,\psi)=\lVert\varphi-\psi\rVert_\infty. $$

In the simplicial examples this is just the maximum difference of function values on the vertices.

TheoremBox

Natural pseudodistance revisited

For \(\varphi,\psi\in\Phi\), the natural pseudodistance measures the smallest sup-norm mismatch after composing with an admissible symmetry: $$ d_G(\varphi,\psi)=\min_{g\in G}\lVert \varphi-\psi\circ g\rVert_\infty. $$ Enlarging the group can only make this quantity smaller.

RemarkBox

Why it is a pseudodistance

Distinct functions may become indistinguishable after optimal alignment by \(G\). That is why the quantity satisfies the triangle-type properties of a metric but can vanish on genuinely different representatives of the same observed shape.

Lecture 6 β€’ Formal persistent homology

Persistent modules, persistent homology groups, persistent Betti numbers, and diagrams

Persistent homology formalizes the earlier intuition with inclusions, induced maps, and invariants indexed by pairs of threshold values.

DefinitionBox

Persistent module

A persistent module is a collection of vector spaces \(\{V_t\}_{t\in\mathbb{R}}\) with maps \(i_t^s:V_s\to V_t\) for \(s\le t\), satisfying identity and functoriality: $$ i_t^s\circ i_s^r = i_t^r,\qquad i_t^t=\mathrm{id}_{V_t}. $$

DefinitionBox

Persistent homology group

If \(K^u=\{r\in K:\varphi(r)\le u\}\subseteq K^v\), the induced map on homology yields $$ H_p^\varphi(u,v)=\operatorname{im}\, i_*^{u,v}. $$ It records the \(p\)-classes already present at \(u\) that are still alive at \(v\).

DefinitionBox

Persistent Betti number function / rank invariant

Define the rank invariant by $$ \beta_p^\varphi(u,v)=\dim H_p^\varphi(u,v). $$ It is right-continuous in both variables, non-increasing in \(v\), and non-decreasing in \(u\).

DefinitionBox

Cornerpoint multiplicity

Around a corner \((u,v)\), extract multiplicity from a small rectangle by inclusion-exclusion:

$$ \mu_p^\varphi(u,v)=\beta_p^\varphi(u+\varepsilon,v-\varepsilon)-\beta_p^\varphi(u-\varepsilon,v-\varepsilon)-\beta_p^\varphi(u+\varepsilon,v+\varepsilon)+\beta_p^\varphi(u-\varepsilon,v+\varepsilon), $$

for \(\varepsilon\) small enough that the rectangle isolates the jump. Positive multiplicity means a genuine cornerpoint is present.

RemarkBox

Discontinuities and the diagram derived from \(\beta\)

The function \(\beta_p^\varphi\) has discontinuities that propagate along horizontal and vertical directions rather than diagonally, and cornerpoints occur where those jumps meet. The persistence diagram is then reconstructed from the positive-multiplicity cornerpoints together with points at infinity for classes that never die.

ExampleBox

What the diagram keeps

The persistence diagram forgets the specific cycle representative and keeps only the lifetime of the class. In this way the rank invariant \(\beta_p^\varphi\) becomes a multiset of birth-death points.

Lecture 7 β€’ Stability and equivariant operators

Matching distance, bottleneck and Wasserstein metrics, stability, and GENEOs

The final lecture turns persistence diagrams into objects that can themselves be compared, then folds the whole story back into the GENEO framework.

Matching to the diagonal

Every point in one persistence diagram may either match a point in the other diagram or be sent to the diagonal. Matching to the diagonal is the price of deleting a short-lived class.

A bijection is therefore scored by its worst or total transport cost, depending on whether one wants the bottleneck or Wasserstein viewpoint. The diagonal acts as a reservoir for features of low persistence.

FormulaBlock

Wasserstein distance

$$ W_q(X,Y)=\inf_{\eta:X\to Y}\left[\sum_{x\in X}\lVert x-\eta(x)\rVert_\infty^q\right]^{1/q}. $$

FormulaBlock

Bottleneck distance

$$ W_\infty(X,Y)=\inf_{\eta:X\to Y}\sup_{x\in X}\lVert x-\eta(x)\rVert_\infty. $$

TheoremBox

Stability idea

The matching distance between persistence diagrams is bounded by the sup-norm discrepancy after optimal alignment, $$ d_B(\mathrm{Dgm}(\varphi),\mathrm{Dgm}(\psi)) \le d_G(\varphi,\psi). $$ Small perturbations in the data therefore create only small perturbations in the diagram.

Interactive

Diagram Distance Intuition

GENEOs

The full operator pipeline starts with a perception pair \((\Phi,G)\), another pair \((\Psi,H)\), and a homomorphism \(T:G\to H\) transporting symmetries from the source observer to the target one.

DefinitionBox

GEO

A pair \((F,T)\) is a group equivariant operator when \(T:G\to H\) is a group homomorphism and $$ F(\varphi\circ g)=F(\varphi)\circ T(g)\qquad\text{for all }\varphi\in\Phi,\ g\in G. $$

DefinitionBox

GENEO

A map \(F:\Phi\to\Psi\) is a group equivariant non-expansive operator when:

  • for every \(\varphi\in\Phi\) and \(g\in G\), $$F(\varphi\circ g)=F(\varphi)\circ T(g);$$
  • for every \(\varphi_1,\varphi_2\in\Phi\), $$D_\Psi(F(\varphi_1),F(\varphi_2))\le D_\Phi(\varphi_1,\varphi_2).$$

TheoremBox

Compactness and convexity of operator families

Equip a family of admissible GENEOs with the sup metric $$ D_{\mathrm{GENEO}}(F_1,F_2)=\sup_{\varphi\in\Phi} D_\Psi(F_1(\varphi),F_2(\varphi)). $$ Under compactness assumptions on the spaces involved, that family is itself compact; and if \((F_i,T)\) share the same equivariance map \(T\), then combinations $$ \sum_i a_i F_i \qquad\text{with}\qquad \sum_i |a_i|\le 1 $$ remain GENEOs. This is the convexity statement.

TheoremBox

Induced matching-distance stability

For a collection \(\mathcal{F}\) of GENEOs, define $$ D_{\mathcal{F},\Phi}(\varphi_1,\varphi_2)=\sup_{F\in\mathcal{F}}\lVert F(\varphi_1)-F(\varphi_2)\rVert_\infty $$ and, in degree \(k\), $$ D_{\mathrm{match}}^{\mathcal{F},k}(\varphi_1,\varphi_2)=\sup_{F\in\mathcal{F}} d_{\mathrm{match}}\bigl(\mathrm{Dgm}_k(F(\varphi_1)),\mathrm{Dgm}_k(F(\varphi_2))\bigr). $$ The resulting stability inequality is $$ D_{\mathrm{match}}^{\mathcal{F},k}\le D_{\mathcal{F},\Phi}. $$

Interactive

GENEO Intuition Section

0

Source PDF