Digital Mathematical Notebook
Geometric Deep Learning
Geometric Deep Learning studies how to build neural networks that respect the structure and symmetries of the domain on which the data live. The subject is not only about graphs. It is a unifying way to think about grids, sets, groups, graphs, manifolds, meshes, point clouds, and local coordinate systems.
Why this matters
Images live on translation-structured grids, so locality and weight sharing matter.
Molecules couple graph structure (atoms and bonds) with three-dimensional Euclidean geometry, so structure-aware models must respect both relational and spatial constraints.
Meshes and manifolds require intrinsic neighborhoods rather than flat-image assumptions.
Symmetry and conservation structure can be more important than raw model size.
Attention and graph structure become geometric only when the right inductive bias is specified.
The big idea
The central question is not “which architecture is most fashionable?” but “what inductive bias should the model have for this domain?”
KeyIdeaBox
Structure first: Data are often not plain vectors in an anonymous Euclidean box. They may come with order, neighborhoods, adjacency, coordinates, symmetries, local charts, or conservation laws. Geometric Deep Learning studies architectures that treat that structure as first-class information.
ExampleBox
Grids: Images and videos live on regular lattices, where translation locality is meaningful.
ExampleBox
Sets and graphs: Order may be irrelevant, but relations between elements matter strongly.
ExampleBox
Manifolds and meshes: Intrinsic neighborhoods can differ from raw ambient Euclidean proximity.
From classical deep learning to geometric deep learning
Standard architectures already use domain structure. Geometric Deep Learning makes that design principle explicit and general.
DefinitionBox
MLPs: Dense layers treat coordinates almost symmetrically. They impose relatively weak structural bias beyond the choice of input representation itself.
DefinitionBox
CNNs: Convolution assumes a regular grid, local neighborhoods, and translation sharing. That is already a geometric prior.
DefinitionBox
Sequence models: RNNs and transformers for text usually treat data as ordered one-dimensional structures, often enriched by positional encodings or relative biases.
RemarkBox
Generalization by respecting the domain: A model that already respects the right symmetries can learn with fewer samples, share parameters more effectively, and avoid spending capacity relearning a structure that was known in advance.
Symmetry, invariance, and equivariance
These notions should be stated carefully. They are the backbone of the whole page.
DefinitionBox
Group and group action: A group \(G\) is a set of transformations with composition, an identity, and inverses. A group action on a space \(X\) is a family of maps \(T_g : X \to X\) satisfying \(T_e = \mathrm{id}\) and \(T_{gh} = T_g \circ T_h\).
FormulaBlock
Invariance and equivariance
$$ F(T_g x) = F(x) \qquad \text{(invariance)} $$
$$ F(T_g x) = S_g F(x) \qquad \text{(equivariance)} $$
Invariance discards the transformation. Equivariance tracks it through a corresponding action \(S_g\) on the output space.
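ExampleBox
Code sketch, invariance versus equivariance: a minimal numpy check of the two definitions, using cyclic shifts of a 1-D signal as the group action. The maps shift, f_invariant, and f_equivariant are illustrative choices, not from any particular library.
```python
import numpy as np

def shift(x, g):
    """Group action T_g: cyclically shift the signal by g positions."""
    return np.roll(x, g)

def f_invariant(x):
    """Global mean pooling: discards where things are."""
    return x.mean()

def f_equivariant(x):
    """Local smoothing: a 3-tap circular moving average."""
    return (np.roll(x, -1) + x + np.roll(x, 1)) / 3.0

x = np.random.randn(8)
g = 3

# Invariance: F(T_g x) == F(x).
assert np.isclose(f_invariant(shift(x, g)), f_invariant(x))

# Equivariance: F(T_g x) == S_g F(x), with S_g the same shift on outputs.
assert np.allclose(f_equivariant(shift(x, g)), shift(f_equivariant(x), g))
```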
ExampleBox
Examples: Classification is often invariant: translating or rotating a cat image should not change the class. Dense prediction is often equivariant: translating the input image should translate the feature map or segmentation mask in the same way.
RemarkBox
Why the distinction matters: Invariance and equivariance are not the same. Equivariance preserves structured information about how the input moved. Invariance removes it.
Grids and CNNs as the first geometric deep learning example
CNNs are best understood as a special case: convolution exploits translation structure on regular grids through locality and weight sharing.
FormulaBlock
Discrete convolution on a grid
$$ (k * x)(u) = \sum_{v \in \mathbb{Z}^2} k(v)\,x(u-v). $$
The same filter \(k\) is evaluated at every spatial location. That is what creates parameter sharing and translation equivariance.
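ExampleBox
Code sketch, translation equivariance of convolution: a direct, unoptimized numpy implementation of the sum above on a periodic grid, followed by a numerical check that filtering commutes with shifting. Circular boundary conditions are an assumption made so the check holds exactly.
```python
import numpy as np

def conv2d_periodic(k, x):
    """(k * x)(u) = sum_v k(v) x(u - v), with circular boundaries."""
    out = np.zeros_like(x, dtype=float)
    for du in range(k.shape[0]):
        for dv in range(k.shape[1]):
            # np.roll(x, (du, dv)) evaluates x at u - (du, dv).
            out += k[du, dv] * np.roll(x, shift=(du, dv), axis=(0, 1))
    return out

x = np.random.randn(6, 6)
k = np.random.randn(3, 3)

# Translating the input translates the output by the same amount.
shifted_then_filtered = conv2d_periodic(k, np.roll(x, (2, 1), axis=(0, 1)))
filtered_then_shifted = np.roll(conv2d_periodic(k, x), (2, 1), axis=(0, 1))
assert np.allclose(shifted_then_filtered, filtered_then_shifted)
```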
KeyIdeaBox
Locality: Each output depends only on a small spatial neighborhood. This is a bias toward local compositional structure, not a generic property of all neural nets.
RemarkBox
CNNs are not the whole story: CNNs solve the grid case elegantly, but the same idea must be reformulated when the domain is a set, a graph, a manifold, or a space with only local coordinates.
Groups and group-equivariant networks
Moving from translations to more general groups extends the convolutional idea beyond plain images.
DefinitionBox
Translation equivariance is a special case: Standard CNNs are equivariant to the translation group acting on a grid. Group-equivariant networks replace that one symmetry with a larger family such as rotations, reflections, or other structured transformations.
ExampleBox
Rotations and discrete symmetries: If the task is insensitive to image rotations or to the orientation of a molecule up to a known group action, enforcing equivariance can reduce sample complexity and make predictions more stable.
RemarkBox
Parameter sharing through symmetry: Equivariance is not only a philosophical statement. It determines how filters, kernels, or feature fields can be shared across transformed copies of the same local pattern.
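ExampleBox
Code sketch, invariance by symmetrization: one simple route to invariance under the 4-fold rotation group C4 is to average an arbitrary backbone over the orbit of the input. This is orbit averaging, not a full group-equivariant architecture with shared rotated filters; the backbone below is a hypothetical placeholder.
```python
import numpy as np

def backbone(x):
    """An arbitrary, deliberately non-invariant feature extractor."""
    return np.tanh(x * np.arange(x.size).reshape(x.shape)).sum()

def c4_invariant(x):
    """Average the backbone over all four 90-degree rotations."""
    return np.mean([backbone(np.rot90(x, g)) for g in range(4)])

x = np.random.randn(5, 5)
# Rotating the input does not change the orbit, hence not the output.
for g in range(4):
    assert np.isclose(c4_invariant(np.rot90(x, g)), c4_invariant(x))
```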
Sets and permutation symmetry
A set has elements but no preferred ordering. That symmetry must be reflected in the architecture.
DefinitionBox
Permutation invariance and equivariance: A set-level output should usually be permutation invariant. An elementwise feature map should be permutation equivariant, meaning that permuting the input elements only permutes the outputs.
TheoremBox
Deep Sets form
$$ f(\{x_1,\dots,x_n\}) = \rho\!\left(\sum_{i=1}^n \phi(x_i)\right). $$
Sum, mean, or related symmetric aggregations produce invariant set summaries. The important point is the symmetry of the pooling operation, not only the nonlinearity around it.
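ExampleBox
Code sketch, Deep Sets: a minimal numpy version of the form above, with random placeholder weights standing in for learned parameters, plus a check that permuting the set elements leaves the output unchanged.
```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(4, 8))   # per-element encoder phi
W_rho = rng.normal(size=(8, 1))   # set-level readout rho

def deep_set(X):
    """X has one row per set element; row order must not matter."""
    phi = np.tanh(X @ W_phi)      # elementwise (permutation-equivariant)
    pooled = phi.sum(axis=0)      # symmetric pooling (the key step)
    return np.tanh(pooled @ W_rho)

X = rng.normal(size=(5, 4))
perm = rng.permutation(5)
assert np.allclose(deep_set(X), deep_set(X[perm]))  # invariance check
```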
Graphs and message passing
Graphs make relations explicit, but “graph structure” and “geometric structure” are not identical concepts.
DefinitionBox
What information a graph may carry: A graph can include adjacency, node features, edge features, coordinates, edge weights, or additional geometry. Some tasks need only connectivity; others need metric or geometric data beyond adjacency.
FormulaBlock
Message passing
$$ m_v^{(\ell+1)} = \mathrm{AGG}^{(\ell)}\Big(\{\psi^{(\ell)}(h_v^{(\ell)}, h_u^{(\ell)}, e_{uv}) : u \in \mathcal{N}(v)\}\Big), $$
$$ h_v^{(\ell+1)} = \phi^{(\ell)}(h_v^{(\ell)}, m_v^{(\ell+1)}). $$
Each layer exchanges information only across graph edges. Deeper stacks enlarge the effective neighborhood.
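ExampleBox
Code sketch, one message-passing layer: a direct numpy transcription of the equations above with sum aggregation, edge features \(e_{uv}\) omitted for brevity, and random placeholder weights standing in for learned \(\psi\) and \(\phi\).
```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W_psi = rng.normal(size=(2 * d, d))  # message function psi(h_v, h_u)
W_phi = rng.normal(size=(2 * d, d))  # update function phi(h_v, m_v)

def mp_layer(H, edges):
    """H: (n, d) node features; edges: directed pairs (u, v)."""
    n = H.shape[0]
    M = np.zeros((n, d))
    for u, v in edges:               # message u -> v, summed per node
        M[v] += np.tanh(np.concatenate([H[v], H[u]]) @ W_psi)
    return np.tanh(np.concatenate([H, M], axis=1) @ W_phi)

H = rng.normal(size=(3, d))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # an undirected path 0-1-2
H1 = mp_layer(H, edges)                   # node 2 now "sees" node 1;
H2 = mp_layer(H1, edges)                  # after two layers, also node 0.
```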
RemarkBox
Neighborhood aggregation is not the whole theory: Message passing is the dominant graph paradigm, but it is one family inside a broader design space that also includes spectral constructions, positional features, attention variants, and higher-order models.
Spectral vs spatial graph methods
These viewpoints are related, but they are not interchangeable descriptions of exactly the same object.
FormulaBlock
Spectral viewpoint
$$ g_\theta(L)x = U\, g_\theta(\Lambda)\, U^\top x, $$
where \(L = U\Lambda U^\top\) is a graph Laplacian eigendecomposition. Filters are defined through graph operators and their spectrum.
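ExampleBox
Code sketch, spectral filtering: diagonalize a small Laplacian, scale each eigencomponent by \(g_\theta(\lambda)\), and transform back. The heat-kernel-style low-pass response is an illustrative choice, not a prescribed filter.
```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # a small example graph
L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian

lam, U = np.linalg.eigh(L)                  # L = U diag(lam) U^T
g_theta = np.exp(-0.5 * lam)                # low-pass spectral response

x = np.random.randn(4)                      # a signal on the nodes
x_filtered = U @ (g_theta * (U.T @ x))      # U g(Lambda) U^T x
```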
DefinitionBox
Spatial viewpoint: Spatial methods describe computation directly through neighborhoods and aggregation rules on the graph. Message passing is the dominant example.
RemarkBox
Why spectral methods arise naturally: Graph Laplacians are canonical operators for diffusion, smoothness, and frequency notions on graphs, so spectral models inherit a strong operator-theoretic interpretation.
RemarkBox
Tradeoffs: Spectral constructions connect cleanly to graph operators but can depend more strongly on the graph domain. Spatial methods are often easier to implement and scale, but their behavior depends heavily on the chosen aggregation mechanism.
Expressivity and limitations of GNNs
Expressivity claims need caveats. They depend on the architecture class and on what information is provided to the model.
TheoremBox
Standard message passing and 1-WL: Under the usual neighborhood-aggregation setup, many message-passing GNNs are at most as discriminative as the 1-dimensional Weisfeiler-Lehman graph isomorphism test. With sufficiently expressive injective multiset aggregation, one can match 1-WL, but not in general surpass it without additional structure or higher-order mechanisms.
Details
Why this is a limitation statement, not a universal impossibility result: The statement concerns a standard class of message-passing architectures with shared local aggregation. It does not mean all graph-based models are limited to 1-WL in every setting. Positional encodings, subgraph features, higher-order tensors, or geometric coordinates can change the expressive regime.
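ExampleBox
Code sketch, 1-WL color refinement: a compact implementation of the test, applied to the classic failure pair of two disjoint triangles versus one 6-cycle. Both graphs are 2-regular, so refinement never separates them even though they are not isomorphic.
```python
from collections import Counter

def wl_colors(adj, rounds=4):
    """adj: {node: neighbors}. Returns the multiset of final colors."""
    colors = {v: 0 for v in adj}                      # uniform start
    for _ in range(rounds):
        # New color = hash of own color plus sorted neighbor colors.
        colors = {v: hash((colors[v],
                           tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Identical color histograms, yet the graphs are not isomorphic.
assert wl_colors(two_triangles) == wl_colors(six_cycle)
```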
RemarkBox
Coordinates do not magically solve everything: Adding coordinates can help distinguish structures, but coordinates themselves may break permutation, rotation, or gauge symmetries unless they are used carefully.
Oversmoothing and oversquashing
These are related only in the loose sense that both can hurt deep graph models. Conceptually they are different phenomena.
DefinitionBox
Oversmoothing: Repeated local mixing can make node embeddings increasingly similar, eventually erasing local distinctions that the task needs.
DefinitionBox
Oversquashing: Information from many distant nodes may have to pass through a narrow graph bottleneck, so too many signals are compressed into too little communication capacity.
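ExampleBox
Code sketch, oversmoothing in miniature: repeated degree-normalized neighbor averaging drives all node embeddings toward a common value on a connected, non-bipartite graph. Oversquashing is a distinct effect and is not shown here.
```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)   # mean over neighbors

H = np.random.randn(4, 3)
for _ in range(50):                    # 50 rounds of pure averaging
    H = P @ H

spread = H.max(axis=0) - H.min(axis=0)
print(spread)                          # near zero: embeddings collapsed
```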
Manifolds, meshes, and point clouds
These objects are often grouped together, but they are not the same thing.
DefinitionBox
Manifold: A manifold is an intrinsic space that locally looks Euclidean, even if it sits inside a larger ambient space in a curved way.
DefinitionBox
Mesh: A mesh is a discrete combinatorial representation of a geometric object, often approximating a surface by vertices, edges, and faces.
DefinitionBox
Point cloud: A point cloud is a finite sample of points, usually with ambient coordinates, but not necessarily an explicit intrinsic connectivity or surface parameterization.
RemarkBox
Intrinsic versus extrinsic: Intrinsic geometry depends on the object itself, such as geodesic distance on a surface. Extrinsic geometry depends on how that object sits inside a larger ambient space.
RemarkBox
Why this matters for model design: If the task is driven by intrinsic geometry, a model built only from ambient Euclidean distances may mix points that are nearby in space but far apart along the surface.
Geodesics and intrinsic geometry
On curved domains, the meaningful neighborhood notion may be geodesic rather than Euclidean.
FormulaBlock
Geodesic distance
$$ d_{\mathcal{M}}(p,q) = \inf_{\gamma : p \rightsquigarrow q} \operatorname{length}(\gamma). $$
The shortest path is constrained to remain on the domain \(\mathcal{M}\). This differs from the unconstrained straight-line distance in ambient space.
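ExampleBox
Code sketch, geodesic versus ambient distance: sample a unit circle, connect consecutive samples, and compare the straight-line distance with a shortest path constrained to the curve. The sampling density and graph construction are illustrative choices; only the standard library and numpy are used.
```python
import heapq
import numpy as np

n = 200
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def neighbors(i):
    """Each sample connects to its two neighbors along the curve."""
    for j in ((i - 1) % n, (i + 1) % n):
        yield j, np.linalg.norm(pts[i] - pts[j])

def dijkstra(src, dst):
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == dst:
            return d
        if d > dist.get(i, np.inf):
            continue
        for j, w in neighbors(i):
            if d + w < dist.get(j, np.inf):
                dist[j] = d + w
                heapq.heappush(heap, (d + w, j))

i, j = 0, n // 2                            # antipodal points
print(np.linalg.norm(pts[i] - pts[j]))      # ambient chord: ~2.0
print(dijkstra(i, j))                       # along the circle: ~pi
```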
Gauges and local coordinates
On manifolds and curved surfaces, there may be no globally consistent coordinate frame. Local coordinates are useful, but they introduce an ambiguity that the architecture should respect.
DefinitionBox
Gauge intuition: A local frame chooses basis directions in a tangent space. Another valid observer may choose a different local basis related by a group transformation. The underlying geometry did not change, only its coordinates did.
FormulaBlock
Schematic frame change
$$ f_p \longmapsto \rho(g_p)\,f_p, $$
where \(g_p\) changes the local frame at \(p\) and \(\rho\) is a representation on the feature space. A gauge-equivariant layer should commute with this change.
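ExampleBox
Code sketch, a toy gauge check: in a single tangent plane, a frame change acts on a 2-D feature by a rotation \(\rho(g)\), and a layer that touches only the rotation-invariant norm commutes with that change. This is a one-point illustration, not a full gauge-equivariant network.
```python
import numpy as np

def rho(angle):
    """Representation of a frame change: a 2-D rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def layer(v):
    """Rescale by a nonlinearity of the norm; direction is untouched."""
    r = np.linalg.norm(v)  # assumed nonzero for this illustration
    return np.tanh(r) / r * v

v = np.random.randn(2)
g = 0.7                                  # an arbitrary frame change
# Commutation: transforming then processing equals the reverse order.
assert np.allclose(rho(g) @ layer(v), layer(rho(g) @ v))
```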
Transformers through the geometric lens
Transformers fit the GDL story, but only after the domain structure is stated carefully.
FormulaBlock
Self-attention
$$ h_i' = \sum_j \alpha_{ij}\,W_V h_j, \qquad \alpha_{ij} = \operatorname{softmax}_j(q_i^\top k_j + b_{ij}). $$
Attention is a flexible relational mechanism. The geometric content enters through the structure of the index set, masks, relative positional information, or other symmetry-aware biases \(b_{ij}\).
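ExampleBox
Code sketch, permutation equivariance of bare self-attention: a numpy version of the formula above with no positional term \(b_{ij}\) and random placeholder weights, plus a check that reordering the tokens simply reorders the outputs.
```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(H):
    Q, K, V = H @ W_Q, H @ W_K, H @ W_V
    alpha = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # no b_ij term
    return alpha @ V

H = rng.normal(size=(5, d))
perm = rng.permutation(5)
# Permuting the tokens permutes the outputs in the same way.
assert np.allclose(attention(H)[perm], attention(H[perm]))
```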
RemarkBox
Without positional information: Self-attention over a collection of tokens is permutation equivariant as a set operator. Sequence order is not built in automatically. It is usually added through positional encodings or relative biases.
RemarkBox
When transformers are geometric: On graphs, manifolds, or molecular systems, attention becomes geometric when neighborhoods, coordinates, pairwise features, or equivariant constraints encode the right domain structure. A transformer is not automatically geometric just because it is powerful.
Applications
The value of Geometric Deep Learning appears when the domain really carries structure that the model should not ignore.
ExampleBox
Molecules and chemistry: Atomic graphs, bond types, and spatial coordinates interact. Symmetry-aware relational models are central here.
ExampleBox
Protein structure: Local neighborhoods, residue graphs, and three-dimensional geometry all matter for prediction.
ExampleBox
3D vision and graphics: Point clouds, meshes, and surfaces require intrinsic or equivariant handling beyond plain image convolutions.
ExampleBox
Recommender and relational data: Graphs capture interaction structure, but the choice of graph, features, and invariances remains a modeling decision.
ExampleBox
Scientific machine learning: Physical symmetries and conserved quantities can be encoded directly into the architecture.
ExampleBox
Language and structured sequences: Attention, order, and relational constraints can all be viewed through the same structural-bias question, provided the geometry is stated explicitly.
Practical diagnostics and design choices
The best architecture depends on the domain, the target symmetry, and the notion of locality that the task actually cares about.
RemarkBox
What is the domain? Grid, set, graph, manifold, mesh, or something hybrid? The answer already narrows the model class.
RemarkBox
Which symmetries matter? Translation, rotation, permutation, gauge, or none? Respect the ones the task truly has.
RemarkBox
What notion of locality matters? Euclidean, graph-hop, geodesic, or global? This affects the whole architecture.
RemarkBox
Do you need invariance or equivariance? Classification often wants invariance; dense or structured prediction often needs equivariance.
RemarkBox
Is adjacency enough? Some graph tasks need only connectivity. Others need coordinates, edge geometry, or operator-based notions of smoothness.
RemarkBox
How deep can information travel? Watch for oversmoothing, oversquashing, and the mismatch between task range and communication topology.
Common misconceptions
Several common claims become misleading once the geometry is stated precisely.
RemarkBox
“Geometric Deep Learning just means GNNs.” No. Graphs are central, but the framework also covers grids, sets, groups, manifolds, gauges, and more.
RemarkBox
“Equivariance and invariance are the same.” No. Invariance forgets the transformation. Equivariance propagates it into the output space.
RemarkBox
“Any graph model is automatically geometric.” No. A graph architecture can still ignore the relevant symmetries or geometry of the task.
RemarkBox
“Transformers are always geometric deep learning.” No. They become geometric only when structure, symmetry, or relational inductive bias is specified carefully.
RemarkBox
“Oversmoothing and oversquashing are the same issue.” No. One concerns homogenized features; the other concerns topological information bottlenecks.
RemarkBox
“Coordinates automatically give geometry.” No. Coordinates can be extrinsic, arbitrary, or frame-dependent. They need the right symmetry handling.
Takeaways and further reading
Geometric Deep Learning is best understood as a disciplined way to choose model bias from domain structure and symmetry.
KeyIdeaBox
Summary: Classical architectures are already geometric in special cases. CNNs encode translation equivariance on grids, Deep Sets encode permutation symmetry, message passing encodes relational locality on graphs, and manifold or gauge-aware models encode richer intrinsic structure. The practical question is always the same: which symmetries and neighborhoods should the model respect?
FurtherReadingBox
How to read further: A good path is: broad GDL overview, then symmetry/equivariance, then graph methods and their limits, then intrinsic geometry on manifolds and gauge-dependent domains.
Geometric Deep Learning and Grids, Groups, Graphs, Geodesics, and Gauges.
Group Equivariant Convolutional Networks and Gauge Equivariant CNNs.
Deep Sets and Set Transformer.
Neural Message Passing for Quantum Chemistry, GCN, and How Powerful are Graph Neural Networks?
Geodesic CNNs for an early intrinsic-surface perspective.
WL-style expressivity, oversquashing bottlenecks, and oversmoothing / PairNorm.