Digital Mathematical Notebook
Geometric Deep Learning
Geometric Deep Learning studies how to build neural networks that respect the structure and symmetries of the domain on which the data live. The subject is not only about graphs. It is a unifying way to think about grids, sets, groups, graphs, manifolds, meshes, point clouds, and local coordinate systems.
Why this matters
Images live on translation-structured grids, so locality and weight sharing matter.
Molecules couple graph structure (atoms and bonds) with three-dimensional Euclidean geometry, so structure-aware models must respect both relational and spatial constraints.
Meshes and manifolds require intrinsic neighborhoods rather than flat-image assumptions.
Symmetry and conservation structure can be more important than raw model size.
Attention and graph structure become geometric only when the right inductive bias is specified.
The big idea
The central question is not “which architecture is most fashionable?” but “what inductive bias should the model have for this domain?”
KeyIdeaBox
Structure first: Data are often not plain vectors in an anonymous Euclidean box. They may come with order, neighborhoods, adjacency, coordinates, symmetries, local charts, or conservation laws. Geometric Deep Learning studies architectures that treat that structure as first-class information.
ExampleBox
Grids: Images and videos live on regular lattices, where translation locality is meaningful.
ExampleBox
Sets and graphs: Order may be irrelevant, but relations between elements matter strongly.
ExampleBox
Manifolds and meshes: Intrinsic neighborhoods can differ from raw ambient Euclidean proximity.
From classical deep learning to geometric deep learning
Standard architectures already use domain structure. Geometric Deep Learning makes that design principle explicit and general.
DefinitionBox
MLPs: Dense layers treat coordinates almost symmetrically. They impose relatively weak structural bias beyond the choice of input representation itself.
DefinitionBox
CNNs: Convolution assumes a regular grid, local neighborhoods, and translation sharing. That is already a geometric prior.
DefinitionBox
Sequence models: RNNs and transformers for text usually treat data as ordered one-dimensional structures, often enriched by positional encodings or relative biases.
RemarkBox
Generalization by respecting the domain: A model that already respects the right symmetries can learn with fewer samples, share parameters more effectively, and avoid spending capacity relearning a structure that was known in advance.
Symmetry, invariance, and equivariance
These notions should be stated carefully. They are the backbone of the whole page.
DefinitionBox
Group and group action: A group \(G\) is a set of transformations with composition, an identity, and inverses. A group action on a space \(X\) is a family of maps \(T_g : X \to X\) satisfying \(T_e = \mathrm{id}\) and \(T_{gh} = T_g \circ T_h\).
FormulaBlock
Invariance and equivariance
$$ F(T_g x) = F(x) \qquad \text{(invariance)} $$
$$ F(T_g x) = S_g F(x) \qquad \text{(equivariance)} $$
Invariance discards the transformation. Equivariance tracks it through a corresponding action \(S_g\) on the output space.
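ExampleBox
Code sketch, invariance versus equivariance: a minimal numpy check of the two definitions, using cyclic shifts of a 1-D signal as the group action. The maps shift, f_invariant, and f_equivariant are illustrative choices, not from any particular library.
```python
import numpy as np

def shift(x, g):
    """Group action T_g: cyclically shift the signal by g positions."""
    return np.roll(x, g)

def f_invariant(x):
    """Global mean pooling: discards where things are."""
    return x.mean()

def f_equivariant(x):
    """Local smoothing: a 3-tap circular moving average."""
    return (np.roll(x, -1) + x + np.roll(x, 1)) / 3.0

x = np.random.randn(8)
g = 3

# Invariance: F(T_g x) == F(x).
assert np.isclose(f_invariant(shift(x, g)), f_invariant(x))

# Equivariance: F(T_g x) == S_g F(x), with S_g the same shift on outputs.
assert np.allclose(f_equivariant(shift(x, g)), shift(f_equivariant(x), g))
```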
ExampleBox
Examples: Classification is often invariant: translating or rotating a cat image should not change the class. Dense prediction is often equivariant: translating the input image should translate the feature map or segmentation mask in the same way.
RemarkBox
Why the distinction matters: Invariance and equivariance are not the same. Equivariance preserves structured information about how the input moved. Invariance removes it.
Grids and CNNs as the first geometric deep learning example
CNNs are best understood as a special case: convolution exploits translation structure on regular grids through locality and weight sharing.
FormulaBlock
Discrete convolution on a grid
$$ (k * x)(u) = \sum_{v \in \mathbb{Z}^2} k(v)\,x(u-v). $$
The same filter \(k\) is evaluated at every spatial location. That is what creates parameter sharing and translation equivariance.
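ExampleBox
Code sketch, translation equivariance of convolution: a direct, unoptimized numpy implementation of the sum above on a periodic grid, followed by a numerical check that filtering commutes with shifting. Circular boundary conditions are an assumption made so the check holds exactly.
```python
import numpy as np

def conv2d_periodic(k, x):
    """(k * x)(u) = sum_v k(v) x(u - v), with circular boundaries."""
    out = np.zeros_like(x, dtype=float)
    for du in range(k.shape[0]):
        for dv in range(k.shape[1]):
            # np.roll(x, (du, dv)) evaluates x at u - (du, dv).
            out += k[du, dv] * np.roll(x, shift=(du, dv), axis=(0, 1))
    return out

x = np.random.randn(6, 6)
k = np.random.randn(3, 3)

# Translating the input translates the output by the same amount.
shifted_then_filtered = conv2d_periodic(k, np.roll(x, (2, 1), axis=(0, 1)))
filtered_then_shifted = np.roll(conv2d_periodic(k, x), (2, 1), axis=(0, 1))
assert np.allclose(shifted_then_filtered, filtered_then_shifted)
```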
KeyIdeaBox
Locality: Each output depends only on a small spatial neighborhood. This is a bias toward local compositional structure, not a generic property of all neural nets.
RemarkBox
CNNs are not the whole story: CNNs solve the grid case elegantly, but the same idea must be reformulated when the domain is a set, a graph, a manifold, or a space with only local coordinates.
Groups and group-equivariant networks
Moving from translations to more general groups extends the convolutional idea beyond plain images.
DefinitionBox
Translation equivariance is a special case: Standard CNNs are equivariant to the translation group acting on a grid. Group-equivariant networks replace that one symmetry with a larger family such as rotations, reflections, or other structured transformations.
ExampleBox
Rotations and discrete symmetries: If the task is insensitive to image rotations or to the orientation of a molecule up to a known group action, enforcing equivariance can reduce sample complexity and make predictions more stable.
RemarkBox
Parameter sharing through symmetry: Equivariance is not only a philosophical statement. It determines how filters, kernels, or feature fields can be shared across transformed copies of the same local pattern.
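ExampleBox
Code sketch, invariance by symmetrization: one simple route to invariance under the 4-fold rotation group C4 is to average an arbitrary backbone over the orbit of the input. This is orbit averaging, not a full group-equivariant architecture with shared rotated filters; the backbone below is a hypothetical placeholder.
```python
import numpy as np

def backbone(x):
    """An arbitrary, deliberately non-invariant feature extractor."""
    return np.tanh(x * np.arange(x.size).reshape(x.shape)).sum()

def c4_invariant(x):
    """Average the backbone over all four 90-degree rotations."""
    return np.mean([backbone(np.rot90(x, g)) for g in range(4)])

x = np.random.randn(5, 5)
# Rotating the input does not change the orbit, hence not the output.
for g in range(4):
    assert np.isclose(c4_invariant(np.rot90(x, g)), c4_invariant(x))
```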
Sets and permutation symmetry
A set has elements but no preferred ordering. That symmetry must be reflected in the architecture.
DefinitionBox
Permutation invariance and equivariance: A set-level output should usually be permutation invariant. An elementwise feature map should be permutation equivariant, meaning that permuting the input elements only permutes the outputs.
TheoremBox
Deep Sets form
$$ f(\{x_1,\dots,x_n\}) = \rho\!\left(\sum_{i=1}^n \phi(x_i)\right). $$
Sum, mean, or related symmetric aggregations produce invariant set summaries. The important point is the symmetry of the pooling operation, not only the nonlinearity around it.
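ExampleBox
Code sketch, Deep Sets: a minimal numpy version of the form above, with random placeholder weights standing in for learned parameters, plus a check that permuting the set elements leaves the output unchanged.
```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(4, 8))   # per-element encoder phi
W_rho = rng.normal(size=(8, 1))   # set-level readout rho

def deep_set(X):
    """X has one row per set element; row order must not matter."""
    phi = np.tanh(X @ W_phi)      # elementwise (permutation-equivariant)
    pooled = phi.sum(axis=0)      # symmetric pooling (the key step)
    return np.tanh(pooled @ W_rho)

X = rng.normal(size=(5, 4))
perm = rng.permutation(5)
assert np.allclose(deep_set(X), deep_set(X[perm]))  # invariance check
```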
Graphs and message passing
Graphs make relations explicit, but “graph structure” and “geometric structure” are not identical concepts.
DefinitionBox
What information a graph may carry: A graph can include adjacency, node features, edge features, coordinates, edge weights, or additional geometry. Some tasks need only connectivity; others need metric or geometric data beyond adjacency.
FormulaBlock
Message passing
$$ m_v^{(\ell+1)} = \mathrm{AGG}^{(\ell)}\Big(\{\psi^{(\ell)}(h_v^{(\ell)}, h_u^{(\ell)}, e_{uv}) : u \in \mathcal{N}(v)\}\Big), $$
$$ h_v^{(\ell+1)} = \phi^{(\ell)}(h_v^{(\ell)}, m_v^{(\ell+1)}). $$
Each layer exchanges information only across graph edges. Deeper stacks enlarge the effective neighborhood.
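ExampleBox
Code sketch, one message-passing layer: a direct numpy transcription of the equations above with sum aggregation, edge features \(e_{uv}\) omitted for brevity, and random placeholder weights standing in for learned \(\psi\) and \(\phi\).
```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W_psi = rng.normal(size=(2 * d, d))  # message function psi(h_v, h_u)
W_phi = rng.normal(size=(2 * d, d))  # update function phi(h_v, m_v)

def mp_layer(H, edges):
    """H: (n, d) node features; edges: directed pairs (u, v)."""
    n = H.shape[0]
    M = np.zeros((n, d))
    for u, v in edges:               # message u -> v, summed per node
        M[v] += np.tanh(np.concatenate([H[v], H[u]]) @ W_psi)
    return np.tanh(np.concatenate([H, M], axis=1) @ W_phi)

H = rng.normal(size=(3, d))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # an undirected path 0-1-2
H1 = mp_layer(H, edges)                   # node 2 now "sees" node 1;
H2 = mp_layer(H1, edges)                  # after two layers, also node 0.
```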
RemarkBox
Neighborhood aggregation is not the whole theory: Message passing is the dominant graph paradigm, but it is one family inside a broader design space that also includes spectral constructions, positional features, attention variants, and higher-order models.
Spectral vs spatial graph methods
These viewpoints are related, but they are not interchangeable descriptions of exactly the same object.
FormulaBlock
Spectral viewpoint
$$ g_\theta(L)x = U\, g_\theta(\Lambda)\, U^\top x, $$
where \(L = U\Lambda U^\top\) is a graph Laplacian eigendecomposition. Filters are defined through graph operators and their spectrum.
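ExampleBox
Code sketch, spectral filtering: diagonalize a small Laplacian, scale each eigencomponent by \(g_\theta(\lambda)\), and transform back. The heat-kernel-style low-pass response is an illustrative choice, not a prescribed filter.
```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # a small example graph
L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian

lam, U = np.linalg.eigh(L)                  # L = U diag(lam) U^T
g_theta = np.exp(-0.5 * lam)                # low-pass spectral response

x = np.random.randn(4)                      # a signal on the nodes
x_filtered = U @ (g_theta * (U.T @ x))      # U g(Lambda) U^T x
```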
DefinitionBox
Spatial viewpoint: Spatial methods describe computation directly through neighborhoods and aggregation rules on the graph. Message passing is the dominant example.
RemarkBox
Why spectral methods arise naturally: Graph Laplacians are canonical operators for diffusion, smoothness, and frequency notions on graphs, so spectral models inherit a strong operator-theoretic interpretation.
RemarkBox
Tradeoffs: Spectral constructions connect cleanly to graph operators but can depend more strongly on the graph domain. Spatial methods are often easier to implement and scale, but their behavior depends heavily on the chosen aggregation mechanism.
Expressivity and limitations of GNNs
Expressivity claims need caveats. They depend on the architecture class and on what information is provided to the model.
TheoremBox
Standard message passing and 1-WL: Under the usual neighborhood-aggregation setup, many message-passing GNNs are at most as discriminative as the 1-dimensional Weisfeiler-Lehman graph isomorphism test. With sufficiently expressive injective multiset aggregation, one can match 1-WL, but not in general surpass it without additional structure or higher-order mechanisms.
Details
Why this is a limitation statement, not a universal impossibility result: The statement concerns a standard class of message-passing architectures with shared local aggregation. It does not mean all graph-based models are limited to 1-WL in every setting. Positional encodings, subgraph features, higher-order tensors, or geometric coordinates can change the expressive regime.
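ExampleBox
Code sketch, 1-WL color refinement: a compact implementation of the test, applied to the classic failure pair of two disjoint triangles versus one 6-cycle. Both graphs are 2-regular, so refinement never separates them even though they are not isomorphic.
```python
from collections import Counter

def wl_colors(adj, rounds=4):
    """adj: {node: neighbors}. Returns the multiset of final colors."""
    colors = {v: 0 for v in adj}                      # uniform start
    for _ in range(rounds):
        # New color = hash of own color plus sorted neighbor colors.
        colors = {v: hash((colors[v],
                           tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Identical color histograms, yet the graphs are not isomorphic.
assert wl_colors(two_triangles) == wl_colors(six_cycle)
```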
RemarkBox
Coordinates do not magically solve everything: Adding coordinates can help distinguish structures, but coordinates themselves may break permutation, rotation, or gauge symmetries unless they are used carefully.
Oversmoothing and oversquashing
These are related only in the loose sense that both can hurt deep graph models. Conceptually they are different phenomena.
DefinitionBox
Oversmoothing: Repeated local mixing can make node embeddings increasingly similar, eventually erasing local distinctions that the task needs.
DefinitionBox
Oversquashing: Information from many distant nodes may have to pass through a narrow graph bottleneck, so too many signals are compressed into too little communication capacity.
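ExampleBox
Code sketch, oversmoothing in miniature: repeated degree-normalized neighbor averaging drives all node embeddings toward a common value on a connected, non-bipartite graph. Oversquashing is a distinct effect and is not shown here.
```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)   # mean over neighbors

H = np.random.randn(4, 3)
for _ in range(50):                    # 50 rounds of pure averaging
    H = P @ H

spread = H.max(axis=0) - H.min(axis=0)
print(spread)                          # near zero: embeddings collapsed
```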
Manifolds, meshes, and point clouds
These objects are often grouped together, but they are not the same thing.
DefinitionBox
Manifold: A manifold is an intrinsic space that locally looks Euclidean, even if it sits inside a larger ambient space in a curved way.
DefinitionBox
Mesh: A mesh is a discrete combinatorial representation of a geometric object, often approximating a surface by vertices, edges, and faces.
DefinitionBox
Point cloud: A point cloud is a finite sample of points, usually with ambient coordinates, but not necessarily an explicit intrinsic connectivity or surface parameterization.
RemarkBox
Intrinsic versus extrinsic: Intrinsic geometry depends on the object itself, such as geodesic distance on a surface. Extrinsic geometry depends on how that object sits inside a larger ambient space.
RemarkBox
Why this matters for model design: If the task is driven by intrinsic geometry, a model built only from ambient Euclidean distances may mix points that are nearby in space but far apart along the surface.
Geodesics and intrinsic geometry
On curved domains, the meaningful neighborhood notion may be geodesic rather than Euclidean.
FormulaBlock
Geodesic distance
$$ d_{\mathcal{M}}(p,q) = \inf_{\gamma : p \rightsquigarrow q} \operatorname{length}(\gamma). $$
The shortest path is constrained to remain on the domain \(\mathcal{M}\). This differs from the unconstrained straight-line distance in ambient space.
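ExampleBox
Code sketch, geodesic versus ambient distance: sample a unit circle, connect consecutive samples, and compare the straight-line distance with a shortest path constrained to the curve. The sampling density and graph construction are illustrative choices; only the standard library and numpy are used.
```python
import heapq
import numpy as np

n = 200
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def neighbors(i):
    """Each sample connects to its two neighbors along the curve."""
    for j in ((i - 1) % n, (i + 1) % n):
        yield j, np.linalg.norm(pts[i] - pts[j])

def dijkstra(src, dst):
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == dst:
            return d
        if d > dist.get(i, np.inf):
            continue
        for j, w in neighbors(i):
            if d + w < dist.get(j, np.inf):
                dist[j] = d + w
                heapq.heappush(heap, (d + w, j))

i, j = 0, n // 2                            # antipodal points
print(np.linalg.norm(pts[i] - pts[j]))      # ambient chord: ~2.0
print(dijkstra(i, j))                       # along the circle: ~pi
```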
Gauges and local coordinates
On manifolds and curved surfaces, there may be no globally consistent coordinate frame. Local coordinates are useful, but they introduce an ambiguity that the architecture should respect.
DefinitionBox
Gauge intuition: A local frame chooses basis directions in a tangent space. Another valid observer may choose a different local basis related by a group transformation. The underlying geometry did not change, only its coordinates did.
FormulaBlock
Schematic frame change
$$ f_p \longmapsto \rho(g_p)\,f_p, $$
where \(g_p\) changes the local frame at \(p\) and \(\rho\) is a representation on the feature space. A gauge-equivariant layer should commute with this change.
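ExampleBox
Code sketch, a toy gauge check: in a single tangent plane, a frame change acts on a 2-D feature by a rotation \(\rho(g)\), and a layer that touches only the rotation-invariant norm commutes with that change. This is a one-point illustration, not a full gauge-equivariant network.
```python
import numpy as np

def rho(angle):
    """Representation of a frame change: a 2-D rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def layer(v):
    """Rescale by a nonlinearity of the norm; direction is untouched."""
    r = np.linalg.norm(v)  # assumed nonzero for this illustration
    return np.tanh(r) / r * v

v = np.random.randn(2)
g = 0.7                                  # an arbitrary frame change
# Commutation: transforming then processing equals the reverse order.
assert np.allclose(rho(g) @ layer(v), layer(rho(g) @ v))
```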
Transformers through the geometric lens
Transformers fit the GDL story, but only after the domain structure is stated carefully.
FormulaBlock
Self-attention
$$ h_i' = \sum_j \alpha_{ij}\,W_V h_j, \qquad \alpha_{ij} = \operatorname{softmax}_j(q_i^\top k_j + b_{ij}). $$
Attention is a flexible relational mechanism. The geometric content enters through the structure of the index set, masks, relative positional information, or other symmetry-aware biases \(b_{ij}\).
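ExampleBox
Code sketch, permutation equivariance of bare self-attention: a numpy version of the formula above with no positional term \(b_{ij}\) and random placeholder weights, plus a check that reordering the tokens simply reorders the outputs.
```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(H):
    Q, K, V = H @ W_Q, H @ W_K, H @ W_V
    alpha = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # no b_ij term
    return alpha @ V

H = rng.normal(size=(5, d))
perm = rng.permutation(5)
# Permuting the tokens permutes the outputs in the same way.
assert np.allclose(attention(H)[perm], attention(H[perm]))
```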
RemarkBox
Without positional information: Self-attention over a collection of tokens is permutation equivariant as a set operator. Sequence order is not built in automatically. It is usually added through positional encodings or relative biases.
RemarkBox
When transformers are geometric: On graphs, manifolds, or molecular systems, attention becomes geometric when neighborhoods, coordinates, pairwise features, or equivariant constraints encode the right domain structure. A transformer is not automatically geometric just because it is powerful.
Applications
The value of Geometric Deep Learning appears when the domain really carries structure that the model should not ignore.
ExampleBox
Molecules and chemistry: Atomic graphs, bond types, and spatial coordinates interact. Symmetry-aware relational models are central here.
ExampleBox
Protein structure: Local neighborhoods, residue graphs, and three-dimensional geometry all matter for prediction.
ExampleBox
3D vision and graphics: Point clouds, meshes, and surfaces require intrinsic or equivariant handling beyond plain image convolutions.
ExampleBox
Recommender and relational data: Graphs capture interaction structure, but the choice of graph, features, and invariances remains a modeling decision.
ExampleBox
Scientific machine learning: Physical symmetries and conserved quantities can be encoded directly into the architecture.
ExampleBox
Language and structured sequences: Attention, order, and relational constraints can all be viewed through the same structural-bias question, provided the geometry is stated explicitly.
Practical diagnostics and design choices
The best architecture depends on the domain, the target symmetry, and the notion of locality that the task actually cares about.
RemarkBox
What is the domain? Grid, set, graph, manifold, mesh, or something hybrid? The answer already narrows the model class.
RemarkBox
Which symmetries matter? Translation, rotation, permutation, gauge, or none? Respect the ones the task truly has.
RemarkBox
What notion of locality matters? Euclidean, graph-hop, geodesic, or global? This affects the whole architecture.
RemarkBox
Do you need invariance or equivariance? Classification often wants invariance; dense or structured prediction often needs equivariance.
RemarkBox
Is adjacency enough? Some graph tasks need only connectivity. Others need coordinates, edge geometry, or operator-based notions of smoothness.
RemarkBox
How deep can information travel? Watch for oversmoothing, oversquashing, and the mismatch between task range and communication topology.
Common misconceptions
Several common claims become misleading once the geometry is stated precisely.
RemarkBox
“Geometric Deep Learning just means GNNs.” No. Graphs are central, but the framework also covers grids, sets, groups, manifolds, gauges, and more.
RemarkBox
“Equivariance and invariance are the same.” No. Invariance forgets the transformation. Equivariance propagates it into the output space.
RemarkBox
“Any graph model is automatically geometric.” No. A graph architecture can still ignore the relevant symmetries or geometry of the task.
RemarkBox
“Transformers are always geometric deep learning.” No. They become geometric only when structure, symmetry, or relational inductive bias is specified carefully.
RemarkBox
“Oversmoothing and oversquashing are the same issue.” No. One concerns homogenized features; the other concerns topological information bottlenecks.
RemarkBox
“Coordinates automatically give geometry.” No. Coordinates can be extrinsic, arbitrary, or frame-dependent. They need the right symmetry handling.
Takeaways and further reading
Geometric Deep Learning is best understood as a disciplined way to choose model bias from domain structure and symmetry.
KeyIdeaBox
Summary: Classical architectures are already geometric in special cases. CNNs encode translation equivariance on grids, Deep Sets encode permutation symmetry, message passing encodes relational locality on graphs, and manifold or gauge-aware models encode richer intrinsic structure. The practical question is always the same: which symmetries and neighborhoods should the model respect?
FurtherReadingBox
How to read further: A good path is: broad GDL overview, then symmetry/equivariance, then graph methods and their limits, then intrinsic geometry on manifolds and gauge-dependent domains.
Geometric Deep Learning and Grids, Groups, Graphs, Geodesics, and Gauges.
Group Equivariant Convolutional Networks and Gauge Equivariant CNNs.
Deep Sets and Set Transformer.
Neural Message Passing for Quantum Chemistry, GCN, and How Powerful are Graph Neural Networks?
Geodesic CNNs for an early intrinsic-surface perspective.
WL-style expressivity, oversquashing bottlenecks, and oversmoothing / PairNorm.