COMP-790-175/03-collated-results-cited.md
David Allemang 93bfee7eef Spring 2026
2026-05-25 11:34:56 -04:00

## 3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
While data- and loss-embedded physics (Section 3.1) offer a flexible means to
encourage physical plausibility, they fundamentally rely on soft penalties. This
reliance introduces severe optimization challenges, as the network must
simultaneously balance data fitting with the minimization of PDE residuals.
Architecture-embedded physics addresses these failure modes by transitioning
from "soft" optimization penalties to "hard" structural constraints. Instead of
relying on the loss landscape to steer the model toward physical reality, this
paradigm directly bakes invariances, symmetries, and conservation laws into the
network's internal topology [-].
### 3.2.1. Diagnosis: Coordinate Bias and Failure Modes of Data Augmentation
In scientific domains, physical systems are defined by their geometric structure
and inherent symmetries. For instance, the physical forces acting on a molecule
must rotate exactly as the molecule rotates in 3D space. Standard Multi-Layer
Perceptrons (MLPs) are fundamentally coordinate-dependent, and Convolutional
Neural Networks (CNNs) are equivariant only to discrete translations; neither
has structural awareness of the full set of Euclidean symmetries, rotations in
particular. Consequently, when mapping a spatial input to a physical property,
standard architectures generally fail to commute with symmetry operators.
To demonstrate this formally, let $\mathcal T_g$ represent a spatial
transformation operator corresponding to a continuous group element
$g \in SE(3)$ (such as a 3D rotation or translation). For a standard black-box
neural network $u_\theta$, applying the physical transformation to the input
coordinates $x$ prior to the forward pass does not yield the same result as
applying the transformation to the network's predicted output. Mathematically,
the operations do not commute:
$$u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)$$
Because this standard mapping is completely blind to the underlying symmetry
group, the network is forced to learn the fundamental rules of geometric physics
entirely from scratch. To compensate for this coordinate bias, conventional deep
learning relies heavily on data augmentation: training the model on thousands of
artificially rotated or translated examples. However, this approach is
computationally wasteful, reduces data efficiency, and only approximates the
symmetry, leaving the model vulnerable to geometric orientations outside the
training distribution.
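The failure to commute can be checked numerically. The following is a minimal sketch, not a model from the cited literature: `make_mlp` builds a tiny MLP with fixed random weights as a hypothetical stand-in for $u_\theta$, and `equivariance_gap` measures $\lVert u_\theta(\mathcal T_g x) - \mathcal T_g u_\theta(x) \rVert$ for a rotation about the z-axis. All names and the choice of weights are assumptions made for illustration.

```python
import math
import random

def rotate_z(v, angle):
    """Rotate a 3-vector about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def make_mlp(hidden=8, seed=0):
    """Build a tiny 3 -> hidden -> 3 MLP with fixed random weights.

    Hypothetical stand-in for an unconstrained network u_theta.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(3)]

    def mlp(v):
        h = [math.tanh(sum(w * x for w, x in zip(row, v))) for row in w1]
        return tuple(sum(w * a for w, a in zip(row, h)) for row in w2)

    return mlp

def equivariance_gap(f, v, angle):
    """Euclidean norm of f(T_g v) - T_g f(v) for a z-axis rotation T_g."""
    lhs = f(rotate_z(v, angle))   # transform the input, then apply the network
    rhs = rotate_z(f(v), angle)   # apply the network, then transform the output
    return math.dist(lhs, rhs)

u_theta = make_mlp()
gap = equivariance_gap(u_theta, (1.0, 0.5, -0.3), math.pi / 3)
print(f"equivariance gap: {gap:.4f}")
```

A nonzero gap for generic weights is what data augmentation tries to shrink on the sampled orientations; nothing in the architecture forces it to zero elsewhere.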
#### Remedy: Equivariant Tensor Networks
Equivariant Tensor Networks resolve this representation failure by
architecturally restricting the neural mapping such that its internal feature
representations transform exactly according to the underlying symmetry group,
such as $E(3)$ (Euclidean) or $SO(3)$ (Rotation) [-].
By treating the general physical operator $\mathcal O$ as a symmetry
operator $\mathcal T_g$, the network $u_\theta$ is structurally constrained to
natively satisfy equivariance:
$$u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)$$
This inductive bias makes the model's predictions independent of the choice of
coordinate frame, leading to exceptional
data efficiency and robustness. State-of-the-art models in this category, such
as DeepH-E3 and NequIP, leverage this exact equivariance to predict electronic
Hamiltonians and interatomic potentials with sub-meV accuracy, outperforming
traditional solvers while eliminating the need for geometric data
augmentation [-].
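To make the equivariance property concrete, here is a toy sketch of a map that satisfies it by construction. It is deliberately much simpler than NequIP or DeepH-E3 and is not their architecture: each output vector is a sum of relative position vectors weighted by a scalar function of the (rotation-invariant) distance. The function `radial_weight` and its parameters `a`, `b` are hypothetical stand-ins for learnable components.

```python
import math

def rotate_z(v, angle):
    """Rotate a 3-vector about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def radial_weight(d, a=1.3, b=0.7):
    """Hypothetical learnable scalar function of an invariant distance."""
    return a * math.exp(-b * d * d)

def equivariant_vectors(points):
    """For each point i, return sum_j phi(|r_i - r_j|) * (r_i - r_j)."""
    out = []
    for i, ri in enumerate(points):
        acc = [0.0, 0.0, 0.0]
        for j, rj in enumerate(points):
            if i == j:
                continue
            w = radial_weight(math.dist(ri, rj))  # invariant under rotation
            for k in range(3):
                acc[k] += w * (ri[k] - rj[k])     # rotates with the input
        out.append(tuple(acc))
    return out

points = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.4), (-0.5, 0.9, 0.3)]
angle = 0.8

rotated_then_mapped = equivariant_vectors([rotate_z(p, angle) for p in points])
mapped_then_rotated = [rotate_z(v, angle) for v in equivariant_vectors(points)]

max_gap = max(
    math.dist(a, b) for a, b in zip(rotated_then_mapped, mapped_then_rotated)
)
print(f"max equivariance gap: {max_gap:.2e}")
```

Because the weights depend only on distances and the outputs are built from difference vectors, the two orders of operation agree up to floating-point rounding, with no rotated training examples involved.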
### 3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts
When standard autoregressive models (e.g., RNNs, standard Transformers) are used
to simulate physical dynamics, they typically predict the next state, or its
time derivative, directly from the current state [-]. Because these
black-box architectures possess no inherent concept of conservation laws (like
energy, momentum, or mass), local approximation errors inevitably accumulate
over sequential time steps.
Let a physical state space be defined by its coordinates and
momenta $x = (q, p)$. A standard network attempts to learn the time derivative
directly:
$$u_\theta = f_\theta(x, t) = \frac{dx}{dt}$$
When numerically integrated over $N$ discrete time steps, the predicted
trajectory becomes:
$$
x_N = x_0 + \sum_{k=0}^{N-1} \left[ u_\theta(x_k, t_k) \, \delta t + \epsilon_k \right]
$$
Because the learned mapping is unconstrained, the Jacobian of its flow map is
not guaranteed to be symplectic. As a result, the flow need not preserve volume
in phase space ($\nabla \cdot f_\theta \ne 0$ in general) and the local errors
$\epsilon_k$ compound. This energy
drift causes the simulated system to depart from the valid physical manifold,
often resulting in non-physical behavior or numerical explosions during
long-duration simulations [-].
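A small numerical experiment illustrates the drift. This is a sketch under stated assumptions: the true system is a unit harmonic oscillator with field $(p, -q)$, and `learned_field` adds a hypothetical non-conservative error of size `EPS`, giving divergence $2\,\mathrm{EPS} \ne 0$. A fourth-order Runge-Kutta step is used so that integrator error is negligible next to the model error.

```python
import math

EPS = 0.01  # hypothetical size of the non-conservative model error

def energy(q, p):
    """Energy of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)

def learned_field(q, p):
    """Exact field (p, -q) plus a small error term with divergence 2 * EPS."""
    return (p + EPS * q, -q + EPS * p)

def rk4_step(f, q, p, dt):
    """One classical fourth-order Runge-Kutta step for dq/dt, dp/dt = f(q, p)."""
    k1q, k1p = f(q, p)
    k2q, k2p = f(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = f(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = f(q + dt * k3q, p + dt * k3p)
    q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def rollout_energy(f, steps, dt=0.01):
    """Integrate from (q, p) = (1, 0) and return the final energy."""
    q, p = 1.0, 0.0
    for _ in range(steps):
        q, p = rk4_step(f, q, p, dt)
    return energy(q, p)

e_start = energy(1.0, 0.0)
e_short = rollout_energy(learned_field, steps=1_000)    # t = 10
e_long = rollout_energy(learned_field, steps=10_000)    # t = 100
print(f"E(0) = {e_start:.3f}, E(10) = {e_short:.3f}, E(100) = {e_long:.3f}")
```

For this particular error term, $dE/dt = 2\,\mathrm{EPS}\,E$, so the energy grows as $E_0 e^{2\,\mathrm{EPS}\,t}$: a one-percent error in the vector field multiplies the energy several times over by $t = 100$.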
#### Remedy: Hamiltonian Neural Networks (HNNs)
Hamiltonian Neural Networks restructure the learning problem to preserve the
symplectic structure of the dynamics. Instead of predicting the time derivative
directly, the architecture is designed to predict a scalar Hamiltonian (or total
energy) $H_\theta(q, p)$ [-]. The time derivative of the state is then derived
analytically by taking the symplectic gradient of that predicted energy surface.
The model governs the system's dynamics through Hamilton's equations:
$$\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}$$
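Because the model's dynamical outputs are derived from the symplectic gradient
of a single scalar field, the learned Hamiltonian $H_\theta$ is conserved along
the continuous-time flow by construction:
$$\frac{dH_\theta}{dt} = \frac{\partial H_\theta}{\partial q}\frac{dq}{dt} + \frac{\partial H_\theta}{\partial p}\frac{dp}{dt} = 0$$
This structural integration of symplectic mechanics removes systematic energy
drift over long rollout horizons, up to the error of the numerical integrator, a
property that is very difficult to obtain from purely data- or loss-embedded
models such as PINNs [-].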
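The construction can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a trained HNN: `hamiltonian` is a hypothetical closed-form stand-in for a learned $H_\theta$, and central finite differences stand in for the automatic differentiation a real implementation would use. The quantity that is conserved is the learned $H_\theta$, and only up to integrator error.

```python
import math

def hamiltonian(q, p, k=1.05):
    """Hypothetical stand-in for a learned scalar H_theta(q, p)."""
    return 0.5 * p * p + 0.5 * k * q * q + 0.1 * q ** 4

def symplectic_gradient(h, q, p, eps=1e-5):
    """Return (dq/dt, dp/dt) = (dH/dp, -dH/dq) using central differences."""
    dh_dq = (h(q + eps, p) - h(q - eps, p)) / (2 * eps)
    dh_dp = (h(q, p + eps) - h(q, p - eps)) / (2 * eps)
    return dh_dp, -dh_dq

def rk4_step(f, q, p, dt):
    """One classical fourth-order Runge-Kutta step for dq/dt, dp/dt = f(q, p)."""
    k1q, k1p = f(q, p)
    k2q, k2p = f(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = f(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = f(q + dt * k3q, p + dt * k3p)
    q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def rollout(h, q, p, steps, dt=0.01):
    """Integrate Hamilton's equations generated by the scalar function h."""
    field = lambda a, b: symplectic_gradient(h, a, b)
    for _ in range(steps):
        q, p = rk4_step(field, q, p, dt)
    return q, p

q0, p0 = 1.0, 0.0
qT, pT = rollout(hamiltonian, q0, p0, steps=10_000)
drift = abs(hamiltonian(qT, pT) - hamiltonian(q0, p0))
print(f"|H(T) - H(0)| = {drift:.2e}")
```

Whatever scalar function the network represents, the field derived from it has zero divergence, so any remaining drift comes from the integrator and can be reduced with a smaller step or a symplectic scheme.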
### 3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems
Complexity in atomic and multi-scale physics often arises from the interaction
of multiple particles, which scales combinatorially. Standard fully connected
networks struggle to capture these complex, higher-order interaction patterns
from raw positional data without requiring exponentially large parameter
counts [-].
Consider a macroscopic physical property $U$ of a system containing $N$
particles at coordinates $\{r_n\}$. A complete description requires a many-body
expansion:
$$U(r) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k}
U_3(r_i, r_j, r_k) + \cdots$$
where $U_n$ represents the exact $n$-body interaction term. The number of
distinct combinations required to evaluate the $n$-body term scales as
$\binom{N}{n} \sim O(N^n)$, which is polynomial in $N$ for fixed $n$ but grows
steeply with the interaction order. For a standard MLP that flattens the input
into a single $3N$-dimensional vector, implicitly learning these $n$-th-order
spatial correlations requires dense weight matrices whose parameter counts grow
steeply with the complexity of the physical environment.
Forcing a neural network to learn these complex interactions completely from
scratch typically results in overparameterization, poor generalization, and a
complete lack of physical interpretability [-].
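The growth in the number of $n$-body terms is easy to tabulate. The short sketch below counts $\binom{N}{n}$ for a few system sizes; the helper name `n_body_terms` and the chosen values of $N$ and $n$ are illustrative.

```python
from math import comb

def n_body_terms(num_particles, order):
    """Number of distinct n-body combinations among N particles: C(N, n)."""
    return comb(num_particles, order)

for num_particles in (10, 100, 1000):
    counts = [n_body_terms(num_particles, n) for n in (1, 2, 3, 4)]
    print(f"N = {num_particles:>4}: {counts}")
```

Going from $N = 10$ to $N = 100$ multiplies the pair count by roughly 100 but the four-body count by more than 10,000, which is why the higher-order terms dominate the cost.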
#### Remedy: Basis-Expansion Networks
Rather than relying on generic weight matrices to learn multi-body physics,
Basis-Expansion Networks restrict the network's representation space to a fixed
set of physically motivated basis functions $\phi_i$. By projecting the problem
onto a mathematically complete basis set (such as the Atomic Cluster
Expansion [-]), the neural network only needs to learn the coefficients for
these basis functions.
Treating $\mathcal O$ as a Projection Operator, the network $u_\theta$ acts as a
weighted sum of physical basis functions:
$$u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i \theta} \phi_i (x, t)$$
where $f_\theta$ is the learnable neural mapping, $w_{i \theta}$ are the learned
coefficients, and $\phi_i$ are the analytical basis functions [-].
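When the basis is fixed, learning reduces to fitting coefficients. The following toy sketch assumes a two-function radial basis ($r^{-12}$ and $r^{-6}$, chosen here only for illustration and far simpler than the Atomic Cluster Expansion) and recovers the weights of a Lennard-Jones-like target by solving the $2 \times 2$ normal equations. The sample grid and all function names are hypothetical.

```python
def basis(r):
    """Two fixed analytical basis functions phi_1(r), phi_2(r)."""
    return (r ** -12, r ** -6)

def target(r):
    """Reference energy to fit: 4 * (r^-12 - r^-6), i.e. weights (4, -4)."""
    return 4.0 * (r ** -12 - r ** -6)

def fit_weights(samples):
    """Least-squares weights w minimizing sum_r (w . phi(r) - target(r))^2."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for r in samples:
        p1, p2 = basis(r)
        y = target(r)
        a11 += p1 * p1
        a12 += p1 * p2
        a22 += p2 * p2
        b1 += p1 * y
        b2 += p2 * y
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the symmetric 2x2 system A w = b
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

def model(r, weights):
    """Weighted sum of basis functions: u(r) = sum_i w_i * phi_i(r)."""
    return sum(w * phi for w, phi in zip(weights, basis(r)))

samples = [0.9 + 0.05 * i for i in range(30)]
weights = fit_weights(samples)
print(f"fitted weights: {weights[0]:.6f}, {weights[1]:.6f}")
```

Only two numbers are learned here, and each has a direct physical reading as the strength of a repulsive or attractive term, in contrast to the dense weight matrices of an unconstrained MLP.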
### 3.2.4. Implications and Relaxed Constraints
A key insight emerging from recent equivariant neural network (ENN) literature
is that strict, hard-coded equivariance can be *too* restrictive for certain
physical systems, particularly those exhibiting "broken symmetry" [-]. This
observation has led to the development of relaxed-symmetry
models [-]. These architectures allow for small, learnable deviations from
perfect mathematical equivariance, providing the structural flexibility required
to model materials under extreme stress or in non-equilibrium states without
completely abandoning the physical prior. Furthermore, the move toward
unsupervised learning in differentiable solvers like AI2DFT suggests that
variational principles of physics can ultimately serve as both the loss function
and the architectural constraint, potentially bypassing the need for labeled
numerical data entirely [-].
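One simple way to picture a relaxed constraint is an exactly equivariant term plus a small symmetry-breaking term scaled by a coefficient. The 2D sketch below is only one possible reading of the idea and does not reproduce any specific published architecture; `alpha` is a hypothetical stand-in for a learnable relaxation strength, and the symmetry-breaking term is chosen arbitrarily to single out the x-axis.

```python
import math

def rotate(v, angle):
    """Rotate a 2-vector by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def relaxed_model(v, alpha):
    """Exactly equivariant term plus alpha times a symmetry-breaking term."""
    scale = math.exp(-math.hypot(*v))    # depends only on the invariant |v|
    equivariant = (scale * v[0], scale * v[1])
    breaking = (v[0], 0.0)               # singles out the x-axis
    return (
        equivariant[0] + alpha * breaking[0],
        equivariant[1] + alpha * breaking[1],
    )

def equivariance_gap(alpha, v=(0.6, -0.8), angle=0.7):
    """Norm of f(T_g v) - T_g f(v) for the relaxed model."""
    lhs = relaxed_model(rotate(v, angle), alpha)
    rhs = rotate(relaxed_model(v, alpha), angle)
    return math.dist(lhs, rhs)

for alpha in (0.0, 0.01, 0.1):
    print(f"alpha = {alpha:<5} gap = {equivariance_gap(alpha):.4e}")
```

In this sketch the deviation from equivariance is proportional to `alpha`, so the model recovers the hard constraint at `alpha = 0` and departs from it in a controlled way otherwise.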
| Type | Method | Venue / Year | Keyword-style contribution |
|:-------------------------|:----------------------------------|--------------|:---------------------------|
| Equivariant Networks | [NequIP][1] | | |
| | [MACE][2] | | |
| | [DeepH-E3][3] | | |
| | [QHNet][4] | | |
| Hamiltonian Networks | [Neural Hamiltonian Diffusion][5] | | |
| | [DeePMD][6] | | |
| | [HEGNN][7] | | |
| | [SEGNN][7] | | |
| Basis Expansion Networks | [ACE Framework (2024)][8] | | |
|                          | [AI2DFT (2024)][9]                | | |

[0a]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/ (review of equivariant networks)
[0b]: https://arxiv.org/abs/2601.04104v1 (review of equivariant networks)
[0c]: https://www.oaepublish.com/articles/jmi.2025.17 (review of hamiltonian networks)
[1]: https://www.nature.com/articles/s41467-022-29939-5 (NequIP)
[2]: https://arxiv.org/abs/2206.07697 (MACE)
[3]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/ (DeepH-E3)
[4]: https://proceedings.mlr.press/v202/yu23i (QHNet)
[5]: https://neurips.cc/virtual/2025/poster/117646 (Neural Hamiltonian Diffusion)
[6]: https://dx.doi.org/10.1016/j.cpc.2018.03.016 (DeePMD)
[7]: https://dx.doi.org/10.1103/PhysRevB.109.144426 (SpinGNN)
[8]: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.99.014104 (ACE Framework)
[9]: https://arxiv.org/abs/2403.11287 (AI2DFT)