Digital Mathematical Notebook
Topological Data Analysis
Topological Data Analysis studies how the shape of data changes across scales. This page begins with persistence and comparison questions, then develops the simplicial and homological foundations that make those ideas precise.
Review markers appear only where a formula or label is too ambiguous in the scan to transcribe with confidence.
Why the shape of data matters
The starting question is not βwhat is a simplex?β but βwhich topological feature appears and disappears as a threshold moves?β
KeyIdea / IntuitionBox
Persistence before algebraBegin by watching a filtering function change and asking which connected components survive for a long time. Algebra enters only after that geometric picture is clear.
DefinitionBox
Filtering viewpointGiven a measuring function $$\varphi:E\to[0,1]$$ or more generally $$\varphi:E\to[a,b]\subseteq\mathbb{R},$$ we study the topology of its sublevel sets $$\{x\in E:\varphi(x)\le t\}$$ as the threshold \(t\) varies.
RemarkBox
Why this mattersA single threshold can be misleading. Persistence keeps the full story of when features are born and when they die, so short-lived fluctuations can be separated from robust structure.
FigureWithCaption
Persistence diagrams appear immediately
As the scalar threshold rises, new connected components can appear and existing ones can merge or become filled in. Each event is recorded by a birth-death pair. Long-lived classes sit far from the diagonal; short-lived classes stay close to it.
DefinitionBox
Birth from a scalar functionFor the filtration \(E_t=\{x\in E:\varphi(x)\le t\}\), a class is born at the first threshold \(b\) where it appears in the topology of \(E_t\). In the opening pages this is first visualized for connected components.
DefinitionBox
Death from a scalar functionA class dies at the threshold \(d\) where it merges into an older class or becomes a boundary. The interval \((b,d)\) is the lifetime later encoded as a point \((b,d)\) in the diagram.
ExampleBox
Reading a component traceTracking connected components through threshold values \(t=0\), \(t=0.2\), and beyond leads to a persistence diagram whose horizontal axis is birth time and whose vertical axis is death time.
Natural pseudodistance
Once an admissible group of transformations is fixed, the comparison question is: how close can two measuring functions become after the best alignment?
$$ d_G(\varphi,\psi)=\min_{g\in G}\lVert \varphi-\psi\circ g\rVert_\infty $$
The first GENEO intuition
Before the full definition arrives, the essential idea is already present: operators should respect the chosen symmetry action and should not increase distances. That is the seed of the GENEO perspective developed later.
Geometric simplicial complexes
After the persistence motivation, geometry supplies the language needed to formalize those topological changes.
DefinitionBox
Affine combination and affine hullAn affine combination of vectors \(u_0,\dots,u_k\in\mathbb{R}^n\) has the form $$\sum_{i=0}^k \lambda_i u_i\qquad\text{with}\qquad \sum_{i=0}^k \lambda_i = 1.$$ The set of all such affine combinations is the affine hull.
DefinitionBox
Affine independencePoints \(u_0,\dots,u_k\) are affinely independent when the representation of an affine combination is unique. This hypothesis matters because points need not be in general position.
DefinitionBox
Convex setA set \(S\subseteq\mathbb{R}^n\) is convex if for every \(x_1,x_2\in S\) and every \(t\in[0,1]\) we also have $$ (1-t)x_1 + tx_2\in S. $$
DefinitionBox
Compact setIn \(\mathbb{R}^n\), compact sets are closed and bounded; equivalently, every open cover admits a finite subcover.
DefinitionBox
Convex combinationA convex combination is an affine combination with non-negative coefficients. The convex hull is built from all such combinations.
TheoremBox
k-simplexThe convex hull of \(k+1\) affinely independent points is a \(k\)-simplex. The empty set plays the role of a \((-1)\)-simplex, which becomes relevant later for reduced homology.
DefinitionBox
Geometric simplicial complexA collection \(K\) of simplices in \(\mathbb{R}^n\) is a geometric simplicial complex when every face of every simplex lies in \(K\) and the intersection of two simplices is either empty or a common face.
ExampleBox
Body, subcomplexes, and skeletonsThe body is $$ |K|=\bigcup_{\sigma\in K}\sigma. $$ A subcomplex \(L\subseteq K\) is closed under faces, while the \(j\)-skeleton keeps only the simplices of dimension at most \(j\).
Abstract simplicial complexes, geometric realization, and point clouds
The next step is to separate combinatorics from embedding. An abstract simplicial complex keeps only the face relation; geometry can be recovered later through realization.
DefinitionBox
Abstract simplicial complexA finite collection \(\mathcal{A}\) of finite subsets of a vertex set is an abstract simplicial complex when every non-empty subset of a simplex in \(\mathcal{A}\) also belongs to \(\mathcal{A}\).
RemarkBox
What gets forgottenPassing from a geometric complex to an abstract one forgets metric information and keeps only incidence. No distance remains in the abstract complex itself.
DefinitionBox
Isomorphism of abstract complexesA bijection on vertices induces an isomorphism when it preserves simplices and their inverses do the same. This is the combinatorial notion of βsame shapeβ.
FigureWithCaption
TheoremBox
Geometric realization theoremEvery finite abstract simplicial complex of dimension \(d\) can be realized as a geometric simplicial complex in \(\mathbb{R}^{2d+1}\). Planar obstructions such as \(K_5\) and \(K_{3,3}\) show why low-dimensional realizations can fail.
Nerve of a cover
If \(\mathcal{F}=\{U_i\}_{i\in I}\) is a finite family of sets, its nerve records which finite intersections are non-empty.
$$ \operatorname{Nrv}(\mathcal{F})=\{J\subseteq I:\bigcap_{j\in J}U_j\neq\varnothing\} $$
This provides the bridge from point clouds to combinatorial models: cover the points by balls and look at the nerve.
DefinitionBox
Cech complexFor a point cloud \(S\subseteq\mathbb{R}^n\) and radius \(\alpha\), the Cech complex is the nerve of the ball cover \(\{B(x,\alpha)\}_{x\in S}\).
DefinitionBox
Vietoris-Rips complexThe Vietoris-Rips complex keeps a simplex whenever all of its vertices are pairwise within the chosen threshold. It is the computationally lighter companion to Cech.
TheoremBox
Vietoris-Rips lemmaThe exact inclusions are $$ \check{C}(S,\alpha)\subseteq VR(S,\alpha)\subseteq \check{C}(S,\sqrt{2}\alpha). $$ The first inclusion is easy to visualize: common intersection implies pairwise intersection. The second is the key approximation step: if all vertices are pairwise \(\alpha\)-close, then one can place a larger common ball and recover a Cech simplex at scale \(\sqrt{2}\alpha\).
ExampleBox
CollapsibilityAn elementary collapse removes a simplex together with one free coface. A complex is collapsible if a sequence of elementary collapses reduces it to a point.
Chain complexes, boundary maps, homology groups, and Betti numbers
Once simplicial complexes are available, algebraic topology counts holes through chain groups, boundaries, cycles, and the quotient that identifies genuine holes.
DefinitionBox
Chain complexA chain complex is a sequence of vector spaces \(V_p\) together with maps \(\partial_p:V_p\to V_{p-1}\) such that $$\partial_p\circ\partial_{p+1}=0.$$
DefinitionBox
Chain groupsThe \(p\)-chains \(C_p\) are formal linear combinations of \(p\)-simplices, $$ C_p(K;\mathbb{K})=\left\{\sum_i a_i\sigma_i : a_i\in\mathbb{K},\ \sigma_i\text{ a }p\text{-simplex}\right\}. $$ The first computations use \(\mathbb{Z}_2\), so adding a simplex twice cancels it.
RemarkBox
Formal linear combinations are bookkeeping devicesA chain is not a new geometric simplex, but a formal sum that lets us add, compare, and cancel simplices algebraically.
FormulaBlock
Boundary operator$$ \partial_p\langle u_0,\dots,u_p\rangle = \sum_{i=0}^p (-1)^i \langle u_0,\dots,\widehat{u_i},\dots,u_p\rangle $$
The alternating sign is the key detail. Check it first on triangles, then extend it to the general proof that \(\partial^2=0\).
ProofToggle β’ Why \(\partial^2=0\)
Expand \(\partial_{p-1}(\partial_p\sigma)\) and group the resulting \((p-2)\)-faces into cancelling pairs. Each face appears twice with opposite signs because omitting \(u_i\) and then \(u_j\) is the same face as omitting \(u_j\) and then \(u_i\), but with opposite parity.
DefinitionBox
Quotient spaceFor a subspace \(W\subseteq V\), the quotient \(V/W\) consists of equivalence classes \([v]\) where \(v\sim v'\) if \(v-v'\in W\). Homology is built exactly this way: cycles modulo boundaries.
DefinitionBox
Cycles$$ Z_p=\ker \partial_p $$ A \(p\)-cycle has empty boundary.
DefinitionBox
Boundaries$$ B_p=\operatorname{im}\partial_{p+1} $$ A \(p\)-boundary already bounds a higher-dimensional chain.
TheoremBox
Homology group$$ H_p(K)=Z_p/B_p=\ker\partial_p\big/\operatorname{im}\partial_{p+1}. $$
ExampleBox
A cycle is not automatically a holeContrast the boundary of an empty triangle with the boundary of a filled triangle. In the first case the loop gives a non-trivial \(H_1\) class; in the second it becomes a boundary and disappears in homology.
DefinitionBox
Betti numbersDefine $$ \beta_p = \dim H_p $$ and relate \(\beta_p\) to the ranks of the boundary maps through simple dimension bookkeeping.
RemarkBox
Reduced versus classical simplicial homologyReduced homology shifts the degree-zero picture by adjoining the empty simplex and an augmentation map in degree \(0\). In practice this replaces \(\beta_0\) with \(\widetilde{\beta}_0=\beta_0-1\), removing the always-present connected component and making formulas cleaner.
TheoremBox
Euler-PoincareDefine the Euler characteristic by simplex counts, $$ \chi(K)=\sum_{i\ge 0}(-1)^i n_i, $$ and proves the matching homological formula $$ \chi(K)=\sum_{i\ge 0}(-1)^i \beta_i. $$
ProofToggle β’ Euler-Poincare sketch
The proof substitutes the rank-nullity identities for the boundary maps into the alternating sum of simplex counts. After the \(b_i\) terms telescope, the surviving contribution is precisely the alternating sum of Betti numbers.
TheoremBox
Elementary collapse preserves homologyIf \(L\) is obtained from \(K\) by an elementary collapse, then \(H_p(L)\cong H_p(K)\) for every \(p\in\mathbb{Z}\).
Groups, group actions, filtering functions, and natural pseudodistance
Comparison becomes more systematic here: first basic group language, then actions on spaces and function spaces, and finally the pseudodistance induced by admissible transformations.
DefinitionBox
GroupA set \(G\) with a binary operation is a group when it has an identity, every element has an inverse, and the operation is associative. The abelian case is especially useful.
DefinitionBox
PseudometricA pseudometric satisfies all the metric axioms except that \(d(x,y)=0\) need not imply \(x=y\). This flexibility is useful once symmetries identify different objects.
DefinitionBox
Group actionA left action is a map \(G\times X\to X\) satisfying \(e\cdot x=x\) and \(g_2\cdot(g_1\cdot x)=(g_2g_1)\cdot x\).
KeyIdea / IntuitionBox
Perception pairWork not only with data, but also with the observer allowed to compare it. Concretely, use a function space \(\Phi\) together with a symmetry group \(G\) acting on it by composition: this pair organizes what differences count and what differences are ignored.
Directional filtering functions on simplicial complexes
Start with a function on the vertices and extend it to a simplex by taking the maximum value on its vertices. This turns a discrete assignment into a bona fide filtering function on the whole complex.
$$ \varphi'(\sigma)=\max\{\varphi'(u_0),\dots,\varphi'(u_m)\}\qquad\text{for }\sigma=\langle u_0,\dots,u_m\rangle. $$
Metrics induced by a function space
Several sup-type distances are useful: one on vertices, one on the symmetry group, and the ambient sup norm on the function space \(\Phi\). These metrics organize the comparison theorems.
FormulaBlock
Vertex pseudometric \(D_V\)$$ D_V(v_1,v_2)=\sup_{\varphi\in\Phi}\lvert \varphi(v_1)-\varphi(v_2)\rvert. $$
This compares vertices only through the values seen by the admissible filtering functions.
FormulaBlock
Group pseudometric \(D_G\)$$ D_G(g_1,g_2)=\sup_{\varphi\in\Phi}\lVert \varphi\circ g_1-\varphi\circ g_2\rVert_\infty. $$
Two symmetries are close when every admissible observer sees them as producing similar pullbacks.
FormulaBlock
Function-space metric \(D_\Phi\)$$ D_\Phi(\varphi,\psi)=\lVert\varphi-\psi\rVert_\infty. $$
In the simplicial examples this is just the maximum difference of function values on the vertices.
TheoremBox
Natural pseudodistance revisitedFor \(\varphi,\psi\in\Phi\), the natural pseudodistance measures the smallest sup-norm mismatch after composing with an admissible symmetry: $$ d_G(\varphi,\psi)=\min_{g\in G}\lVert \varphi-\psi\circ g\rVert_\infty. $$ Enlarging the group can only make this quantity smaller.
RemarkBox
Why it is a pseudodistanceDistinct functions may become indistinguishable after optimal alignment by \(G\). That is why the quantity satisfies the triangle-type properties of a metric but can vanish on genuinely different representatives of the same observed shape.
Persistent modules, persistent homology groups, persistent Betti numbers, and diagrams
Persistent homology formalizes the earlier intuition with inclusions, induced maps, and invariants indexed by pairs of threshold values.
DefinitionBox
Persistent moduleA persistent module is a collection of vector spaces \(\{V_t\}_{t\in\mathbb{R}}\) with maps \(i_t^s:V_s\to V_t\) for \(s\le t\), satisfying identity and functoriality: $$ i_t^s\circ i_s^r = i_t^r,\qquad i_t^t=\mathrm{id}_{V_t}. $$
DefinitionBox
Persistent homology groupIf \(K^u=\{r\in K:\varphi(r)\le u\}\subseteq K^v\), the induced map on homology yields $$ H_p^\varphi(u,v)=\operatorname{im}\, i_*^{u,v}. $$ It records the \(p\)-classes already present at \(u\) that are still alive at \(v\).
DefinitionBox
Persistent Betti number function / rank invariantDefine the rank invariant by $$ \beta_p^\varphi(u,v)=\dim H_p^\varphi(u,v). $$ It is right-continuous in both variables, non-increasing in \(v\), and non-decreasing in \(u\).
DefinitionBox
Cornerpoint multiplicityAround a corner \((u,v)\), extract multiplicity from a small rectangle by inclusion-exclusion:
$$ \mu_p^\varphi(u,v)=\beta_p^\varphi(u+\varepsilon,v-\varepsilon)-\beta_p^\varphi(u-\varepsilon,v-\varepsilon)-\beta_p^\varphi(u+\varepsilon,v+\varepsilon)+\beta_p^\varphi(u-\varepsilon,v+\varepsilon), $$
for \(\varepsilon\) small enough that the rectangle isolates the jump. Positive multiplicity means a genuine cornerpoint is present.
RemarkBox
Discontinuities and the diagram derived from \(\beta\)The function \(\beta_p^\varphi\) has discontinuities that propagate along horizontal and vertical directions rather than diagonally, and cornerpoints occur where those jumps meet. The persistence diagram is then reconstructed from the positive-multiplicity cornerpoints together with points at infinity for classes that never die.
ExampleBox
What the diagram keepsThe persistence diagram forgets the specific cycle representative and keeps only the lifetime of the class. In this way the rank invariant \(\beta_p^\varphi\) becomes a multiset of birth-death points.
Matching distance, bottleneck and Wasserstein metrics, stability, and GENEOs
The final lecture turns persistence diagrams into objects that can themselves be compared, then folds the whole story back into the GENEO framework.
Matching to the diagonal
Every point in one persistence diagram may either match a point in the other diagram or be sent to the diagonal. Matching to the diagonal is the price of deleting a short-lived class.
A bijection is therefore scored by its worst or total transport cost, depending on whether one wants the bottleneck or Wasserstein viewpoint. The diagonal acts as a reservoir for features of low persistence.
FormulaBlock
Wasserstein distance$$ W_q(X,Y)=\inf_{\eta:X\to Y}\left[\sum_{x\in X}\lVert x-\eta(x)\rVert_\infty^q\right]^{1/q}. $$
FormulaBlock
Bottleneck distance$$ W_\infty(X,Y)=\inf_{\eta:X\to Y}\sup_{x\in X}\lVert x-\eta(x)\rVert_\infty. $$
TheoremBox
Stability ideaThe matching distance between persistence diagrams is bounded by the sup-norm discrepancy after optimal alignment, $$ d_B(\mathrm{Dgm}(\varphi),\mathrm{Dgm}(\psi)) \le d_G(\varphi,\psi). $$ Small perturbations in the data therefore create only small perturbations in the diagram.
GENEOs
The full operator pipeline starts with a perception pair \((\Phi,G)\), another pair \((\Psi,H)\), and a homomorphism \(T:G\to H\) transporting symmetries from the source observer to the target one.
DefinitionBox
GEOA pair \((F,T)\) is a group equivariant operator when \(T:G\to H\) is a group homomorphism and $$ F(\varphi\circ g)=F(\varphi)\circ T(g)\qquad\text{for all }\varphi\in\Phi,\ g\in G. $$
DefinitionBox
GENEOA map \(F:\Phi\to\Psi\) is a group equivariant non-expansive operator when:
- for every \(\varphi\in\Phi\) and \(g\in G\), $$F(\varphi\circ g)=F(\varphi)\circ T(g);$$
- for every \(\varphi_1,\varphi_2\in\Phi\), $$D_\Psi(F(\varphi_1),F(\varphi_2))\le D_\Phi(\varphi_1,\varphi_2).$$
TheoremBox
Compactness and convexity of operator familiesEquip a family of admissible GENEOs with the sup metric $$ D_{\mathrm{GENEO}}(F_1,F_2)=\sup_{\varphi\in\Phi} D_\Psi(F_1(\varphi),F_2(\varphi)). $$ Under compactness assumptions on the spaces involved, that family is itself compact; and if \((F_i,T)\) share the same equivariance map \(T\), then combinations $$ \sum_i a_i F_i \qquad\text{with}\qquad \sum_i |a_i|\le 1 $$ remain GENEOs. This is the convexity statement.
TheoremBox
Induced matching-distance stabilityFor a collection \(\mathcal{F}\) of GENEOs, define $$ D_{\mathcal{F},\Phi}(\varphi_1,\varphi_2)=\sup_{F\in\mathcal{F}}\lVert F(\varphi_1)-F(\varphi_2)\rVert_\infty $$ and, in degree \(k\), $$ D_{\mathrm{match}}^{\mathcal{F},k}(\varphi_1,\varphi_2)=\sup_{F\in\mathcal{F}} d_{\mathrm{match}}\bigl(\mathrm{Dgm}_k(F(\varphi_1)),\mathrm{Dgm}_k(F(\varphi_2))\bigr). $$ The resulting stability inequality is $$ D_{\mathrm{match}}^{\mathcal{F},k}\le D_{\mathcal{F},\Phi}. $$