Spring 2026

2026-05-25 11:31:33 -04:00
commit 93bfee7eef
13 changed files with 3309 additions and 0 deletions
--- a/03-collated-results-cited.md
+++ b/03-collated-results-cited.md
@@ -0,0 +1,203 @@
+## 3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
+
+While data- and loss-embedded physics (Section 3.1) offer a flexible means to
+encourage physical plausibility, they fundamentally rely on soft penalties. This
+reliance introduces severe optimization challenges, because the network must
+simultaneously balance fitting the data against minimizing the PDE residuals.
+Architecture-embedded physics addresses these challenges by moving from "soft"
+optimization penalties to "hard" structural constraints. Instead of relying on
+the loss landscape to steer the model toward physically valid solutions, this
+paradigm builds invariances, symmetries, and conservation laws directly into
+the network architecture, so that they hold by construction [-].
+
+### 3.2.1. Diagnosis: Coordinate Bias and the Limits of Data Augmentation
+
+In scientific domains, physical systems are defined by their geometric structure
+and inherent symmetries. For instance, the forces acting on the atoms of a
+molecule must rotate exactly as the molecule rotates in 3D space. Standard
+Multi-Layer Perceptrons (MLPs) are fundamentally coordinate-dependent: they have
+no structural awareness of Euclidean symmetries. Convolutional Neural Networks
+(CNNs) are equivariant to translations of the input grid but not to rotations
+or reflections. Consequently, when mapping a spatial input to a physical
+property, standard architectures do not commute with general symmetry operators.
+
+To state this formally, let $\mathcal T_g$ represent a spatial transformation
+operator corresponding to a continuous group element $g \in SE(3)$ (such as a 3D
+rotation or translation), acting on each quantity according to its type. For
+example, position vectors are rotated and translated, force vectors are only
+rotated, and scalars are unchanged. For a standard black-box neural network
+$u_\theta$, applying the transformation to the input coordinates $x$ before the
+forward pass does not, in general, yield the same result as applying it to the
+network's predicted output. Mathematically, the operations do not commute:
+
+$$u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)$$
+
+Because this standard mapping is blind to the underlying symmetry group, the
+network is forced to learn the rules of geometric physics entirely from data. To
+compensate for this coordinate bias, classical deep learning relies heavily on
+data augmentation: training the model on many artificially rotated or translated
+copies of each example. However, this approach is computationally wasteful,
+reduces data efficiency, and only approximates the symmetry. This leaves the
+model vulnerable to geometric orientations outside the augmented training
+distribution.
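+
+The failure of a standard network to commute with $\mathcal T_g$ can be checked
+numerically. The sketch below makes four assumptions. It uses NumPy, drops the
+time argument $t$ for brevity, and stands in for a trained black-box model with
+a randomly initialized two-layer MLP. The names `mlp` and `rotation_z` are
+hypothetical and only illustrative. The check compares rotating the input before
+the forward pass with rotating the predicted vector afterwards.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Hypothetical stand-in for a trained black-box model: a random two-layer MLP
+# mapping one 3D coordinate x to a 3D vector output (e.g., a force).
+W1 = rng.normal(size=(32, 3))
+b1 = rng.normal(size=32)
+W2 = rng.normal(size=(3, 32))
+
+
+def mlp(x):
+    return W2 @ np.tanh(W1 @ x + b1)
+
+
+def rotation_z(theta):
+    """Rotation by angle theta about the z axis (an element of SO(3))."""
+    c, s = np.cos(theta), np.sin(theta)
+    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
+
+
+x = np.array([0.3, -1.2, 0.7])
+R = rotation_z(np.pi / 3)
+
+lhs = mlp(R @ x)  # u_theta(T_g x): transform the input, then predict
+rhs = R @ mlp(x)  # T_g u_theta(x): predict, then transform the output
+equivariance_error = np.linalg.norm(lhs - rhs)
+print(f"equivariance error: {equivariance_error:.3f}")
+```
+
+A clearly nonzero error means the two paths disagree. Data augmentation can
+shrink this gap for the orientations seen in training, but nothing in the
+architecture forces it to vanish for every $g$.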
+
+#### Remedy: Equivariant Tensor Networks
+
+Equivariant Tensor Networks resolve this representation failure by
+architecturally restricting the neural mapping so that its internal feature
+representations transform exactly according to the underlying symmetry group,
+such as the Euclidean group $E(3)$ or the rotation group $SO(3)$ [-].
+
+By treating the general physical operator $\mathcal O$ as a symmetry operator
+$\mathcal T_g$, the network $u_\theta$ is structurally constrained to satisfy
+equivariance for every group element $g$:
+
+$$u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)$$
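+
+One way to obtain this property by construction is to build outputs only from
+relative position vectors $r_{ij} = x_i - x_j$, weighted by functions of the
+invariant distances $\lVert r_{ij} \rVert$. The NumPy sketch below does this
+under stated assumptions. It is a toy model, not NequIP or DeepH-E3, which use
+higher-order tensor features. The radial weights come from a hypothetical random
+MLP, and the names `radial` and `equivariant_vectors` are illustrative. The
+check confirms that rotating and translating the input rotates the per-atom
+output vectors by the same matrix.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(1)
+
+# Hypothetical learnable radial weights: a tiny MLP acting only on distances,
+# which are invariant under rotations and translations.
+w1 = rng.normal(size=16)
+b1 = rng.normal(size=16)
+w2 = rng.normal(size=16) / 4.0
+
+
+def radial(d):
+    """Scalar weight for each pair distance in the 1D array d."""
+    return np.tanh(np.outer(d, w1) + b1) @ w2
+
+
+def equivariant_vectors(X):
+    """Per-atom vectors v_i = sum_j radial(|r_ij|) * r_ij for X of shape (n, 3)."""
+    r = X[:, None, :] - X[None, :, :]       # (n, n, 3) difference vectors
+    d = np.linalg.norm(r, axis=-1)          # (n, n) invariant distances
+    w = radial(d.ravel()).reshape(d.shape)  # (n, n) scalar pair weights
+    np.fill_diagonal(w, 0.0)                # exclude self-interaction
+    return np.einsum("ij,ijk->ik", w, r)    # (n, 3) output vectors
+
+
+def random_rotation(rng):
+    """Random proper rotation (det = +1) from a QR decomposition."""
+    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
+    q = q @ np.diag(np.sign(np.diag(r)))
+    if np.linalg.det(q) < 0:
+        q[:, 0] = -q[:, 0]
+    return q
+
+
+X = rng.normal(size=(5, 3))
+R = random_rotation(rng)
+t = rng.normal(size=3)
+
+lhs = equivariant_vectors(X @ R.T + t)  # transform the input, then predict
+rhs = equivariant_vectors(X) @ R.T      # predict, then rotate the output
+equivariance_error = np.max(np.abs(lhs - rhs))
+print(f"equivariance error: {equivariance_error:.1e}")  # round-off level
+```
+
+No augmentation is involved. The symmetry holds for any weights, because the
+model only ever sees quantities that transform in a known way.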
+
+This inductive bias makes the model's predictions independent of the arbitrary
+choice of reference frame, leading to high data efficiency and robustness.
+State-of-the-art models in this category, such as DeepH-E3 and NequIP, leverage
+this exact equivariance to predict electronic Hamiltonians and interatomic
+potentials. They report sub-meV accuracy relative to their first-principles
+reference data, at a small fraction of the cost of the traditional solvers that
+generate that data, and without any geometric data augmentation [-].
+
+### 3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts
+
+When standard autoregressive models (e.g., RNNs or standard Transformers) are
+used to simulate physical dynamics, they typically predict the next state, or
+its time derivative, directly from the current state [-]. Because these
+black-box architectures have no inherent concept of conservation laws (such as
+conservation of energy, momentum, or mass), local approximation errors
+inevitably accumulate over sequential time steps.
+
+Let the physical state be given by generalized coordinates and conjugate
+momenta, $x = (q, p)$. A standard network attempts to learn the time derivative
+directly:
+
+$$u_\theta(x, t) = f_\theta(x, t) \approx \frac{dx}{dt}$$
+
+When numerically integrated over $N$ discrete time steps of size $\delta t$, the
+predicted trajectory becomes
+
+$$
+x_N = x_0 + \sum_{k=0}^{N-1} \left[ u_\theta(x_k, t_k)\, \delta t + \epsilon_k \right],
+$$
+
+where $\epsilon_k$ is the local error committed at step $k$.
+
+Because the learned mapping is unconstrained, the flow it generates is not
+guaranteed to be symplectic. In particular, it need not preserve phase-space
+volume ($\nabla \cdot f_\theta \ne 0$ in general). It also need not conserve any
+energy-like quantity, so the local errors $\epsilon_k$ compound instead of
+cancelling. This energy drift causes the simulated system to depart from the
+valid physical manifold (the constant-energy surface of the true system). The
+result is often non-physical behavior or numerical blow-up during long-duration
+simulations [-].
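+
+The drift can be reproduced on a one-dimensional harmonic oscillator with
+$H = (q^2 + p^2)/2$. In the NumPy sketch below, a hypothetical "learned" field
+equals the exact dynamics plus a small, constant, unconstrained error term whose
+divergence is $0.02$. It is integrated with classical fourth-order Runge-Kutta,
+chosen here so that integrator error is negligible and the drift comes from the
+model alone. The names `learned_field` and `rk4_step` are illustrative.
+
+```python
+import numpy as np
+
+
+def rk4_step(f, x, dt):
+    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
+    k1 = f(x)
+    k2 = f(x + 0.5 * dt * k1)
+    k3 = f(x + 0.5 * dt * k2)
+    k4 = f(x + dt * k3)
+    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
+
+
+def energy(x):
+    """True energy of a unit harmonic oscillator, H = (q^2 + p^2) / 2."""
+    q, p = x
+    return 0.5 * (q**2 + p**2)
+
+
+def learned_field(x):
+    """Hypothetical learned field: exact dynamics (p, -q) plus an error term
+    0.01 * (q, p) with divergence 0.02, so phase-space volume is not preserved."""
+    q, p = x
+    return np.array([p + 0.01 * q, -q + 0.01 * p])
+
+
+x0 = np.array([1.0, 0.0])
+x = x0.copy()
+dt, n_steps = 0.01, 10_000  # integrate to t = 100
+for _ in range(n_steps):
+    x = rk4_step(learned_field, x, dt)
+
+drift = energy(x) / energy(x0)
+print(f"E(t=100) / E(0) = {drift:.2f}")  # prints E(t=100) / E(0) = 7.39
+```
+
+For this linear field the energy grows as $e^{0.02\,t}$, so a model error of one
+percent per unit time multiplies the energy by $e^2 \approx 7.39$ over the
+rollout.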
+
+#### Remedy: Hamiltonian Neural Networks (HNNs)
+
+Hamiltonian Neural Networks restructure the learning problem so that the
+dynamics stay on physically admissible manifolds. Instead of predicting the
+state derivative directly, the architecture predicts a single scalar
+Hamiltonian (the total energy) $H_\theta(q, p)$ [-]. The time derivatives of
+the state are then obtained analytically, typically via automatic
+differentiation, as the symplectic gradient of this learned energy surface.
+
+The model governs the system's dynamics through Hamilton's equations:
+
+$$\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}$$
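+
+Differentiating $H_\theta$ along a trajectory that obeys these equations shows
+why the learned energy cannot drift in continuous time. The same calculation
+shows why the resulting flow is volume-preserving, assuming $H_\theta$ is twice
+continuously differentiable:
+
+$$
+\frac{dH_\theta}{dt}
+= \frac{\partial H_\theta}{\partial q}\frac{dq}{dt} + \frac{\partial H_\theta}{\partial p}\frac{dp}{dt}
+= \frac{\partial H_\theta}{\partial q}\frac{\partial H_\theta}{\partial p} - \frac{\partial H_\theta}{\partial p}\frac{\partial H_\theta}{\partial q}
+= 0,
+$$
+
+$$
+\nabla \cdot \left( \frac{\partial H_\theta}{\partial p}, -\frac{\partial H_\theta}{\partial q} \right)
+= \frac{\partial^2 H_\theta}{\partial q\,\partial p} - \frac{\partial^2 H_\theta}{\partial p\,\partial q} = 0.
+$$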
+
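+Because the model's dynamical outputs are derived from the gradient of a single
+scalar field, rotated by the symplectic structure, the resulting vector field is
+divergence-free and conserves $H_\theta$ exactly along its continuous-time flow.
+This structural integration of symplectic mechanics keeps the learned energy
+constant over arbitrarily long rollouts, up to the discretization error of the
+numerical integrator. Symplectic integrators keep that error bounded over very
+long times. Such a guarantee is very difficult to obtain with soft-constrained,
+loss-embedded models such as PINNs [-].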
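+
+The mechanism can be sketched in NumPy under three simplifying assumptions.
+First, a hypothetical random two-layer network stands in for a trained
+$H_\theta$. Second, its gradient is written out by hand instead of using
+automatic differentiation. Third, classical Runge-Kutta (RK4) is used with a
+small step; RK4 is not a symplectic integrator. The network outputs only the
+scalar energy, and the dynamics come from its symplectic gradient.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Hypothetical scalar energy model H_theta(z) = w2 . tanh(W1 z + b1), z = (q, p).
+W1 = rng.normal(size=(16, 2))
+b1 = rng.normal(size=16)
+w2 = rng.normal(size=16) / 4.0
+
+
+def hamiltonian(z):
+    return w2 @ np.tanh(W1 @ z + b1)
+
+
+def grad_hamiltonian(z):
+    """Analytic gradient dH/dz = W1^T (w2 * (1 - tanh^2(W1 z + b1)))."""
+    h = np.tanh(W1 @ z + b1)
+    return W1.T @ (w2 * (1.0 - h**2))
+
+
+def hnn_field(z):
+    """Symplectic gradient from Hamilton's equations: (dH/dp, -dH/dq)."""
+    dH_dq, dH_dp = grad_hamiltonian(z)
+    return np.array([dH_dp, -dH_dq])
+
+
+def rk4_step(f, z, dt):
+    """One classical fourth-order Runge-Kutta step for dz/dt = f(z)."""
+    k1 = f(z)
+    k2 = f(z + 0.5 * dt * k1)
+    k3 = f(z + 0.5 * dt * k2)
+    k4 = f(z + dt * k3)
+    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
+
+
+z = np.array([1.0, 0.0])
+H0 = hamiltonian(z)
+max_dev = 0.0
+for _ in range(5000):  # integrate to t = 25 with dt = 0.005
+    z = rk4_step(hnn_field, z, 0.005)
+    max_dev = max(max_dev, abs(hamiltonian(z) - H0))
+print(f"max |H_theta(t) - H_theta(0)| = {max_dev:.2e}")
+```
+
+The remaining deviation comes only from the integrator, not from the model.
+This is the discretization error discussed above.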
+
+### 3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems
+
+Complexity in atomic and multi-scale physics often arises from interactions
+among many particles, whose number of possible combinations grows
+combinatorially. Standard fully connected networks struggle to capture these
+higher-order interaction patterns from raw positional data without very large
+parameter counts [-].
+
+Consider a macroscopic physical property $U$ of a system containing $N$
+particles at coordinates $\{r_n\}$. A complete description is given by the
+many-body expansion:
+
+$$
+U(\{r_n\}) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) +
+\sum_{i < j < k} U_3(r_i, r_j, r_k) + \cdots
+$$
+
+where $U_n$ represents the exact $n$-body interaction term. The number of
+distinct particle combinations contributing to the $n$-body term scales as
+$\binom{N}{n} = O(N^n)$ for fixed $n$. Consider a standard MLP that flattens the
+input into a single $3N$-dimensional vector. Implicitly learning these
+$n$-th-order spatial correlations requires dense weight matrices whose size
+grows rapidly with both the number of particles and the body order of the
+interactions.
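+
+The combinatorial growth of the expansion is easy to tabulate with Python's
+standard library. The particle count $N = 100$ below is an arbitrary
+illustrative choice.
+
+```python
+from math import comb
+
+N = 100  # illustrative number of particles
+# Number of distinct n-body terms, C(N, n), in the many-body expansion.
+counts = {n: comb(N, n) for n in range(1, 5)}
+for n, count in counts.items():
+    print(f"{n}-body terms: {count:,}")
+# 1-body terms: 100
+# 2-body terms: 4,950
+# 3-body terms: 161,700
+# 4-body terms: 3,921,225
+```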
+
+Forcing a neural network to learn these complex interactions entirely from
+scratch typically results in overparameterization, poor generalization, and
+little physical interpretability [-].
+
+#### Remedy: Basis-Expansion Networks
+
+Rather than relying on generic weight matrices to learn multi-body physics,
+Basis-Expansion Networks restrict the network's representation space to a fixed
+set of physically motivated basis functions $\phi_i$. By projecting the problem
+onto a mathematically complete basis set (such as the Atomic Cluster
+Expansion [-]), the model only needs to learn the expansion coefficients of
+these basis functions.
+
+Treating $\mathcal O$ as a projection operator, the network $u_\theta$ acts as a
+weighted sum of physical basis functions:
+
+$$u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i,\theta}\, \phi_i(x, t)$$
+
+where $f_\theta$ is the learnable neural mapping, $w_{i,\theta}$ are the learned
+coefficients, and $\phi_i$ are the analytical basis functions [-].
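+
+A one-dimensional sketch of the idea, assuming NumPy, has three parts. A
+hypothetical Morse-type pair term $U_2(r)$ plays the role of reference data. The
+fixed analytical basis $\phi_i$ is a set of Chebyshev polynomials in the
+interatomic distance. The coefficients are fitted by linear least squares rather
+than by a neural network. Real frameworks such as ACE combine radial functions
+like these with angular functions and body-ordered products. This example only
+shows that, once the basis is fixed, learning reduces to finding the weights.
+
+```python
+import numpy as np
+from numpy.polynomial import chebyshev as C
+
+
+def morse(r, De=1.0, a=1.0, r0=1.0):
+    """Hypothetical reference pair term U_2(r) (stand-in for e.g. DFT data)."""
+    return De * (1.0 - np.exp(-a * (r - r0))) ** 2 - De
+
+
+r_min, r_max, degree = 0.8, 3.0, 12
+
+
+def basis(r):
+    """Fixed basis phi_i(r): Chebyshev T_0..T_degree on r mapped to [-1, 1]."""
+    s = 2.0 * (r - r_min) / (r_max - r_min) - 1.0
+    return C.chebvander(s, degree)  # shape (len(r), degree + 1)
+
+
+r_train = np.linspace(r_min, r_max, 200)
+# Only the coefficients w_i are learned; here by linear least squares.
+w, *_ = np.linalg.lstsq(basis(r_train), morse(r_train), rcond=None)
+
+r_test = np.linspace(r_min, r_max, 57)
+rmse = np.sqrt(np.mean((basis(r_test) @ w - morse(r_test)) ** 2))
+print(f"{len(w)} coefficients, test RMSE = {rmse:.1e}")
+```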
+
+### 3.2.4. Implications and Relaxed Constraints
+
+A key insight from recent equivariant neural network (ENN) literature is that
+strict, hard-coded equivariance can be *too* restrictive for certain physical
+systems, particularly those exhibiting "broken symmetry" [-]. This observation
+has led to the development of relaxed-symmetry models [-]. These architectures
+allow small, learnable deviations from exact equivariance. That flexibility lets
+them model materials under extreme stress or in non-equilibrium states without
+completely abandoning the physical prior. Furthermore, the move toward
+unsupervised learning in differentiable solvers such as AI2DFT suggests that
+variational principles of physics can serve as both the loss function and the
+architectural constraint. This could remove the need for labeled numerical data
+entirely [-].
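+
+One minimal way to illustrate relaxed symmetry, assumed here for illustration
+only, adds a symmetry-breaking term with a learnable strength $\alpha$ to an
+exactly equivariant backbone. Published relaxed-symmetry architectures use their
+own mechanisms. In this NumPy sketch, the hypothetical matrix `B` plays the role
+of a fixed external direction that does not rotate with the system. The measured
+rotational equivariance error is zero at $\alpha = 0$ and grows linearly with
+$\alpha$.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(2)
+
+
+def equivariant_part(X):
+    """Per-atom vectors sum_j exp(-|r_ij|) r_ij: rotation-equivariant."""
+    r = X[:, None, :] - X[None, :, :]
+    w = np.exp(-np.linalg.norm(r, axis=-1))
+    return np.einsum("ij,ijk->ik", w, r)  # r_ii = 0, so i = j adds nothing
+
+
+# Hypothetical symmetry-breaking map of absolute positions (does not rotate).
+B = rng.normal(size=(3, 3))
+
+
+def relaxed_model(X, alpha):
+    """Equivariant backbone plus a symmetry-breaking term of strength alpha."""
+    return equivariant_part(X) + alpha * X @ B.T
+
+
+def equivariance_error(alpha, X, R):
+    lhs = relaxed_model(X @ R.T, alpha)  # rotate input, then predict
+    rhs = relaxed_model(X, alpha) @ R.T  # predict, then rotate output
+    return np.max(np.abs(lhs - rhs))
+
+
+theta = 0.7
+R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
+              [np.sin(theta), np.cos(theta), 0.0],
+              [0.0, 0.0, 1.0]])
+X = rng.normal(size=(4, 3))
+errors = {alpha: equivariance_error(alpha, X, R) for alpha in (0.0, 0.01, 0.1)}
+for alpha, err in errors.items():
+    print(f"alpha = {alpha:<5} equivariance error = {err:.2e}")
+```
+
+In a trained model, $\alpha$ would be learned or regularized toward zero. The
+amount of symmetry breaking then becomes a fitted quantity rather than a fixed
+design choice.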
+
+| Type                     | Method                            | Venue / Year                | Keyword-style contribution |
+|:-------------------------|:----------------------------------|-----------------------------|:---------------------------|
+| Equivariant Networks     | [NequIP][1]                       | Nat. Commun., 2022          |                            |
+|                          | [MACE][2]                         | NeurIPS, 2022               |                            |
+|                          | [DeepH-E3][3]                     | Nat. Commun., 2023          |                            |
+|                          | [QHNet][4]                        | ICML, 2023                  |                            |
+| Hamiltonian Networks     | [Neural Hamiltonian Diffusion][5] | NeurIPS, 2025               |                            |
+|                          | [DeePMD][6]                       | Comput. Phys. Commun., 2018 |                            |
+|                          | [HEGNN][7]                        | Phys. Rev. B, 2024          |                            |
+|                          | [SEGNN][7]                        | Phys. Rev. B, 2024          |                            |
+| Basis Expansion Networks | [ACE Framework][8]                | Phys. Rev. B, 2019          |                            |
+|                          | [AI2DFT][9]                       | arXiv, 2024                 |                            |
+
+[0a]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/ (review of equivariant networks)
+
+[0b]: https://arxiv.org/abs/2601.04104v1 (review of equivariant networks)
+
+[0c]: https://www.oaepublish.com/articles/jmi.2025.17 (review of Hamiltonian networks)
+
+[1]: https://www.nature.com/articles/s41467-022-29939-5 (NequIP)
+
+[2]: https://arxiv.org/abs/2206.07697 (MACE)
+
+[3]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/ (DeepH-E3)
+
+[4]: https://proceedings.mlr.press/v202/yu23i (QHNet)
+
+[5]: https://neurips.cc/virtual/2025/poster/117646 (Neural Hamiltonian Diffusion)
+
+[6]: https://dx.doi.org/10.1016/j.cpc.2018.03.016 (DeePMD)
+
+[7]: https://dx.doi.org/10.1103/PhysRevB.109.144426 (SpinGNN)
+
+[8]: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.99.014104 (ACE Framework)
+
+[9]: https://arxiv.org/abs/2403.11287 (AI2DFT)