Spring 2026
## 3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning

While data- and loss-embedded physics (Section 3.1) offer a flexible means to
encourage physical plausibility, they fundamentally rely on soft penalties. This
reliance introduces severe optimization challenges, as the network must
simultaneously balance data fitting against the minimization of PDE residuals.
Architecture-embedded physics addresses these failure modes by moving from
"soft" optimization penalties to "hard" structural constraints. Instead of
relying on the loss landscape to steer the model toward physically valid
solutions, this paradigm builds invariances, symmetries, and conservation laws
directly into the network architecture, so that they hold for every setting of
the parameters [-].

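To make the soft/hard distinction concrete, consider a hypothetical 1D toy problem that is not taken from the works surveyed here. A field $u(x)$ on $[0, 1]$ must satisfy the Dirichlet conditions $u(0) = u(1) = 0$. Under a soft constraint, the boundary values are only driven toward zero by a penalty in the loss. Under a hard constraint, an untrained network $N(x)$ is multiplied by $x(1-x)$, so the conditions hold exactly for any weights. The network size and random initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(hidden=16):
    """Random (untrained) weights for a one-hidden-layer network N(x)."""
    return (rng.normal(size=hidden), rng.normal(size=hidden),
            rng.normal(size=hidden), rng.normal())


def mlp(x, params):
    """N(x) = w2 . tanh(W1 x + b1) + b2, evaluated at every point in x."""
    W1, b1, w2, b2 = params
    h = np.tanh(np.outer(x, W1) + b1)  # (n_points, hidden)
    return h @ w2 + b2                 # (n_points,)


def u_soft(x, params):
    # Soft constraint: raw output; boundary values only approach zero if a
    # penalty such as boundary_penalty() is added to the training loss.
    return mlp(x, params)


def boundary_penalty(params):
    ends = np.array([0.0, 1.0])
    return float(np.sum(u_soft(ends, params) ** 2))


def u_hard(x, params):
    # Hard constraint: x(1 - x) vanishes at x = 0 and x = 1, so the boundary
    # conditions hold for every choice of parameters, trained or not.
    return x * (1.0 - x) * mlp(x, params)


params = init_params()
ends = np.array([0.0, 1.0])
print("soft boundary penalty:", boundary_penalty(params))  # generally > 0
print("hard boundary values:", u_hard(ends, params))       # both exactly zero
```

The soft variant can trade boundary accuracy against the other loss terms. The hard variant removes the boundary term from the optimization problem altogether.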
### 3.2.1. Coordinate Bias and Failure Modes of Data Augmentation

In scientific domains, physical systems are defined by their geometric structure
and inherent symmetries. For instance, the forces acting on the atoms of a
molecule must rotate exactly as the molecule rotates in 3D space. Standard
Multi-Layer Perceptrons (MLPs) are fundamentally coordinate-dependent and have
no structural awareness of Euclidean symmetries. Convolutional Neural Networks
(CNNs) are built to be equivariant to discrete translations of their input grid,
but not to rotations. Consequently, when mapping a spatial input to a physical
property, standard architectures do not, in general, commute with symmetry
operators.

To state this formally, let $\mathcal T_g$ denote the transformation operator
associated with an element $g$ of the continuous group $SE(3)$ (a 3D rotation,
a translation, or a combination of both). For a standard black-box neural
network $u_\theta$, applying the transformation to the input coordinates $x$
before the forward pass does not, in general, give the same result as applying
it to the network's predicted output. Mathematically, the operations do not
commute:

$$u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)$$

Here, on the right-hand side, $\mathcal T_g$ acts on the output in the way
appropriate to its type: a force vector is rotated but not translated, while a
scalar energy is left unchanged.

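The non-commutation can be checked numerically. The following is only an illustrative sketch. It assumes a hypothetical black-box MLP (`mlp_vectors`) with random, untrained weights that maps the flattened positions of $M = 4$ particles to one 3D vector per particle. It then compares "rotate, then predict" with "predict, then rotate" for a random rotation $R \in SO(3)$.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_rotation(rng):
    """Sample a uniformly random 3x3 rotation matrix via QR decomposition."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))  # fix column signs so Q is Haar-distributed
    if np.linalg.det(Q) < 0:     # flip one axis: reflections -> rotations
        Q[:, 0] = -Q[:, 0]
    return Q


def mlp_vectors(x, params):
    """Hypothetical black-box MLP: flattened (M, 3) positions -> (M, 3) vectors."""
    W1, b1, W2, b2 = params
    h = np.tanh(x.reshape(-1) @ W1 + b1)
    return (h @ W2 + b2).reshape(x.shape)


M, hidden = 4, 32
params = (rng.normal(size=(3 * M, hidden)), rng.normal(size=hidden),
          rng.normal(size=(hidden, 3 * M)), rng.normal(size=3 * M))

x = rng.normal(size=(M, 3))
R = random_rotation(rng)
lhs = mlp_vectors(x @ R.T, params)  # rotate the input, then predict
rhs = mlp_vectors(x, params) @ R.T  # predict, then rotate the output
print("equivariance error:", np.linalg.norm(lhs - rhs))  # generally far from zero
```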
Because this standard mapping is blind to the underlying symmetry group, the
network must learn the geometric rules of the physics from data rather than
inheriting them from its structure. To compensate for this coordinate bias,
conventional deep learning relies heavily on data augmentation, training the
model on thousands of artificially rotated or translated copies of each example.
However, this approach is computationally wasteful, reduces data efficiency, and
only approximates the symmetry, leaving the model vulnerable to geometric
orientations outside the training distribution.

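A typical rotational augmentation step can be sketched as follows. This is a minimal illustration under several assumptions. Each sample is a hypothetical pair of `(M, 3)` position and force arrays. Positions receive a random rotation plus a translation. Forces receive only the rotation, since translating the whole system leaves them unchanged. The dataset grows by a factor of `n_copies`, yet the model still sees only a finite sample of orientations.

```python
import numpy as np


def random_rotation(rng):
    """Sample a uniformly random 3x3 rotation matrix via QR decomposition."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))  # fix column signs so Q is Haar-distributed
    if np.linalg.det(Q) < 0:     # map reflections into SO(3)
        Q[:, 0] = -Q[:, 0]
    return Q


def augment_with_rotations(positions, forces, n_copies, rng):
    """Return n_copies rigidly transformed versions of one (positions, forces) sample.

    Positions (M, 3) are rotated and translated; forces (M, 3) are only
    rotated, because translating the whole system does not change them.
    """
    aug_pos, aug_frc = [], []
    for _ in range(n_copies):
        R = random_rotation(rng)
        t = rng.normal(size=3)
        aug_pos.append(positions @ R.T + t)
        aug_frc.append(forces @ R.T)
    return np.stack(aug_pos), np.stack(aug_frc)


rng = np.random.default_rng(2)
pos = rng.normal(size=(5, 3))
frc = rng.normal(size=(5, 3))
P, F = augment_with_rotations(pos, frc, n_copies=1000, rng=rng)
print(P.shape, F.shape)  # (1000, 5, 3) (1000, 5, 3)
```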
#### Remedy: Equivariant Tensor Networks

Equivariant Tensor Networks resolve this representation failure by
architecturally restricting the neural mapping so that its internal feature
representations transform exactly according to the underlying symmetry group,
such as the Euclidean group $E(3)$ or the rotation group $SO(3)$ [-].

When the general physical operator $\mathcal O$ is taken to be the symmetry
operator $\mathcal T_g$, the network $u_\theta$ is structurally constrained to
satisfy equivariance by construction:

$$u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)$$

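The simplest way to see equivariance hold by construction is a distance-weighted sum over pair vectors. This is not the tensor-product architecture of NequIP or DeepH-E3. It is only a minimal sketch under stated assumptions: per-particle vectors are built as $f_i = \sum_{j \ne i} \phi(\lVert r_i - r_j \rVert)\,(r_i - r_j)$, where $\phi$ is a hypothetical small radial network (`radial_mlp`) with random, untrained weights. Distances are invariant and difference vectors co-rotate, so the equivariance error stays at floating-point round-off for any weights.

```python
import numpy as np

rng = np.random.default_rng(3)


def random_rotation(rng):
    """Sample a uniformly random 3x3 rotation matrix via QR decomposition."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q


def radial_mlp(d, params):
    """Hypothetical learnable scalar function phi(d) of a pair distance."""
    w1, b1, w2 = params
    return np.tanh(np.multiply.outer(d, w1) + b1) @ w2


def equivariant_vectors(x, params):
    """f_i = sum_{j != i} phi(|r_i - r_j|) (r_i - r_j) for positions x of shape (M, 3).

    Distances are invariant under rotations and translations, and the pair
    vectors r_i - r_j rotate with the input, so the output co-rotates exactly.
    """
    diff = x[:, None, :] - x[None, :, :]       # (M, M, 3) pair vectors
    dist = np.linalg.norm(diff, axis=-1)       # (M, M) pair distances
    phi = radial_mlp(dist, params)             # (M, M) scalar weights
    np.fill_diagonal(phi, 0.0)                 # drop self-interaction terms
    return np.einsum("ij,ijk->ik", phi, diff)  # (M, 3)


hidden = 16
params = (rng.normal(size=hidden), rng.normal(size=hidden), rng.normal(size=hidden))
x = rng.normal(size=(6, 3))
R, t = random_rotation(rng), rng.normal(size=3)

lhs = equivariant_vectors(x @ R.T + t, params)  # transform input, then predict
rhs = equivariant_vectors(x, params) @ R.T      # predict, then rotate output
print("equivariance error:", np.linalg.norm(lhs - rhs))  # round-off level
```

Unlike the black-box case, no augmentation is needed here. The symmetry is a property of the function class, not of the training set.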
This inductive bias makes the model's predictions independent of the choice of
coordinate frame, which translates into high data efficiency and robustness.
State-of-the-art models in this category exploit exact equivariance. DeepH-E3
predicts electronic (DFT) Hamiltonians with reported sub-meV accuracy, and
NequIP learns interatomic potentials from markedly fewer training samples than
non-equivariant models. Both run at a small fraction of the cost of the ab
initio calculations they are trained on, and neither requires geometric data
augmentation [-].

### 3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts

When standard autoregressive models (e.g., RNNs or standard Transformers) are
used to simulate physical dynamics, they typically predict the next state, or
its time derivative, directly from the current state [-]. These black-box
architectures have no built-in notion of conservation laws (of energy,
momentum, or mass), so local approximation errors accumulate unchecked over
sequential time steps.

Let the state of a physical system be given by its generalized coordinates and
momenta, $x = (q, p)$. A standard network attempts to learn the time derivative
directly:

$$u_\theta = f_\theta(x, t) \approx \frac{dx}{dt}$$

When this learned field is integrated numerically over $N$ discrete time steps
of size $\delta t$, the predicted trajectory (written here in explicit-Euler
form) becomes

$$x_N = x_0 + \sum_{k=0}^{N-1} \left[ u_\theta(x_k, t_k)\, \delta t + \epsilon_k \right]$$

where $\epsilon_k$ is the local error committed at step $k$.

Because the learned mapping is unconstrained, the vector field $f_\theta$ is not
guaranteed to be Hamiltonian. The Jacobian of the flow it generates is therefore
not guaranteed to be symplectic. In particular, the flow need not preserve
phase-space volume ($\nabla \cdot f_\theta \ne 0$ in general), and the local
errors $\epsilon_k$ compound instead of averaging out. The resulting energy
drift carries the simulated system away from its constant-energy surface. This
often produces non-physical behavior or numerical blow-up during long-duration
simulations [-].

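The effect of a small non-Hamiltonian error can be isolated numerically. The sketch rests on illustrative assumptions:

- The true system is a unit harmonic oscillator, $H = (q^2 + p^2)/2$.
- The hypothetical "learned" field adds an error term $\varepsilon\,(q, p)$ with divergence $2\varepsilon$.
- An accurate fourth-order Runge-Kutta integrator is used, so any drift comes from the model rather than the integrator.

For this field, $dH/dt = 2\varepsilon H$. The energy therefore grows as $e^{2\varepsilon t}$ however small the step size.

```python
import numpy as np


def learned_field(x, eps):
    """Hypothetical 'learned' field: exact oscillator plus a small error eps*(q, p).

    The error term has divergence 2*eps, so the field is not Hamiltonian.
    """
    q, p = x
    return np.array([p + eps * q, -q + eps * p])


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def energy(x):
    q, p = x
    return 0.5 * (q**2 + p**2)


def rollout_energy_ratio(eps, dt=0.01, n_steps=10_000):
    """Return H(x_N) / H(x_0) after a rollout of length n_steps * dt."""
    def field(x):
        return learned_field(x, eps)

    x = np.array([1.0, 0.0])
    e0 = energy(x)
    for _ in range(n_steps):
        x = rk4_step(field, x, dt)
    return energy(x) / e0


for eps in (0.0, 1e-3, 1e-2):
    print(f"eps={eps:g}: H(T)/H(0) = {rollout_energy_ratio(eps):.4f}")
```

With these settings ($T = 100$), the printed ratios are close to $1$, $e^{0.2} \approx 1.221$, and $e^{2} \approx 7.389$. A per-step error of one percent in the dynamics becomes a sevenfold energy error over the rollout.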
#### Remedy: Hamiltonian Neural Networks (HNNs)

Hamiltonian Neural Networks restructure the learning problem so that the
dynamics respect the symplectic structure of phase space. Instead of predicting
the state derivative directly, the architecture predicts a scalar Hamiltonian
$H_\theta(q, p)$, i.e., the total energy of the system [-]. The time derivatives
of the state are then obtained analytically as the symplectic gradient of this
learned energy surface.

The model governs the system's dynamics through Hamilton's equations:

$$\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}$$

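A minimal sketch of this construction follows, under illustrative assumptions:

- The system has one degree of freedom.
- $H_\theta(q, p) = w_2^\top \tanh(W_1 x + b_1)$ is a hypothetical one-hidden-layer network with random, untrained weights.
- The gradient is derived by hand instead of with automatic differentiation.

The vector field is $J \nabla H_\theta$ with $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, which reproduces Hamilton's equations. The checks confirm that $dH_\theta/dt = 0$ along the field and that the field is divergence-free.

```python
import numpy as np

rng = np.random.default_rng(4)
HIDDEN = 32
W1 = rng.normal(size=(HIDDEN, 2))  # input x = (q, p)
b1 = rng.normal(size=HIDDEN)
w2 = rng.normal(size=HIDDEN)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # symplectic matrix


def H_theta(x):
    """Scalar 'learned' Hamiltonian H(q, p) = w2 . tanh(W1 x + b1)."""
    return w2 @ np.tanh(W1 @ x + b1)


def grad_H(x):
    """Analytic gradient: dH/dx = W1^T [w2 * (1 - tanh^2(W1 x + b1))]."""
    s = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - s**2))


def hnn_field(x):
    """(dq/dt, dp/dt) = (dH/dp, -dH/dq) = J grad H."""
    return J @ grad_H(x)


x = rng.normal(size=2)
h = 1e-5

# (0) Sanity check: the hand-derived gradient matches finite differences of H.
fd_grad = np.array([(H_theta(x + h * e) - H_theta(x - h * e)) / (2 * h)
                    for e in np.eye(2)])
assert np.allclose(fd_grad, grad_H(x), atol=1e-6)

# (1) H_theta is conserved along the field: dH/dt = grad H . (J grad H) = 0.
dH_dt = grad_H(x) @ hnn_field(x)

# (2) The field is divergence-free (phase-space volume is preserved),
#     checked with central finite differences.
div = sum((hnn_field(x + h * e)[i] - hnn_field(x - h * e)[i]) / (2 * h)
          for i, e in enumerate(np.eye(2)))
print(f"dH/dt = {dH_dt:.2e}, divergence = {div:.2e}")  # both ~0
```

In practice, the gradient would come from an autodiff framework and $H_\theta$ would be trained on observed trajectories. The structural guarantees shown here hold for any weights.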
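Because the model's dynamical outputs are derived entirely from the symplectic
gradient of a single scalar field, the resulting vector field is everywhere
orthogonal to $\nabla H_\theta$ and divergence-free by construction. The learned
energy $H_\theta$ is therefore conserved exactly along continuous-time
trajectories. When the dynamics are rolled out with a symplectic integrator, the
energy error remains bounded over very long horizons. The conserved quantity is
the learned $H_\theta$, which matches the true energy only as closely as the
model has been fitted. This structural guarantee is difficult to achieve with
soft-constrained (data- or loss-embedded) models such as PINNs [-].
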
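The role of the integrator can be illustrated with a known separable Hamiltonian standing in for a trained $H_\theta$ (an assumption made for clarity): a pendulum with $H = p^2/2 + (1 - \cos q)$. The same Hamiltonian vector field is rolled out twice, once with explicit Euler and once with the symplectic Störmer-Verlet (leapfrog) scheme. The largest energy error seen during each trajectory is reported.

```python
import numpy as np


def energy(q, p):
    """Pendulum Hamiltonian H = p^2 / 2 + (1 - cos q)."""
    return 0.5 * p**2 + (1.0 - np.cos(q))


def euler_step(q, p, dt):
    # Explicit Euler: both updates use the old state (not symplectic).
    return q + dt * p, p - dt * np.sin(q)


def leapfrog_step(q, p, dt):
    # Stormer-Verlet: half kick, full drift, half kick (symplectic).
    p_half = p - 0.5 * dt * np.sin(q)
    q_new = q + dt * p_half
    p_new = p_half - 0.5 * dt * np.sin(q_new)
    return q_new, p_new


def max_energy_error(step, q0=1.0, p0=0.0, dt=0.1, n_steps=1000):
    """Largest |H(t) - H(0)| seen during a rollout with the given stepper."""
    q, p = q0, p0
    e0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst


print("explicit Euler:", max_energy_error(euler_step))     # grows with n_steps
print("leapfrog:      ", max_energy_error(leapfrog_step))  # small and bounded
```

Both schemes see the same exactly conservative field. Only the symplectic one keeps the energy error bounded, which is why HNN rollouts are usually paired with such integrators.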
### 3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems

Complexity in atomic and multi-scale physics often arises from interactions
among many particles, and the number of possible interactions grows
combinatorially with system size. Standard fully connected networks struggle to
capture these higher-order interaction patterns from raw positional data without
very large parameter counts [-].

Consider a scalar property $U$ of the whole system (for example, its potential
energy) for $N$ particles at positions $\{r_i\}_{i=1}^{N}$. Such a property can
be decomposed into a many-body expansion, which is exact when carried to all
orders:

$$U(\{r_i\}) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k} U_3(r_i, r_j, r_k) + \cdots$$

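Truncating the expansion at three-body terms already shows the combinatorial growth. The sketch is illustrative only. `U2` and `U3` are hypothetical toy terms, not a physical force field, and the one-body term is omitted. The loop enumerates index combinations with `itertools.combinations` and confirms that the number of $n$-body terms equals $\binom{N}{n}$.

```python
import itertools
import math

import numpy as np


def U2(ri, rj):
    """Hypothetical toy pair term (not a physical force field)."""
    return np.exp(-np.linalg.norm(ri - rj))


def U3(ri, rj, rk):
    """Hypothetical toy three-body term built from the pair term."""
    return U2(ri, rj) * U2(rj, rk) * U2(ri, rk)


def truncated_expansion(r, U0=0.0):
    """U = U0 + sum_{i<j} U2 + sum_{i<j<k} U3 (one-body term omitted).

    Returns the energy and the number of 2-body and 3-body terms evaluated.
    """
    pairs = list(itertools.combinations(range(len(r)), 2))
    triplets = list(itertools.combinations(range(len(r)), 3))
    energy = U0
    energy += sum(U2(r[i], r[j]) for i, j in pairs)
    energy += sum(U3(r[i], r[j], r[k]) for i, j, k in triplets)
    return energy, len(pairs), len(triplets)


rng = np.random.default_rng(5)
for N in (10, 20, 40):
    _, n2, n3 = truncated_expansion(rng.normal(size=(N, 3)))
    assert (n2, n3) == (math.comb(N, 2), math.comb(N, 3))
    print(N, n2, n3)
```

For $N = 10, 20, 40$, this gives 45, 190, and 780 pair terms and 120, 1140, and 9880 triplet terms. Quadrupling $N$ multiplies the three-body workload by roughly $4^3$.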
Here, $U_n$ denotes the exact $n$-body interaction term. The $n$-body sum
contains $\binom{N}{n}$ terms, which scales as $O(N^n)$ for fixed $n$. A standard
MLP that flattens the input into a single $3N$-dimensional vector has no such
decomposition built in. It must learn these $n$-th-order spatial correlations
implicitly, which in practice requires dense weight matrices whose parameter
counts grow rapidly with the complexity of the physical environment.

Forcing a neural network to learn these interactions entirely from scratch
typically results in overparameterization, poor generalization, and little
physical interpretability [-].

#### Remedy: Basis-Expansion Networks

Rather than relying on generic weight matrices to learn multi-body physics,
Basis-Expansion Networks restrict the network's representation space to a fixed
set of physically motivated basis functions $\phi_i$. The problem is projected
onto a mathematically complete basis (such as the Atomic Cluster Expansion
[-]), so the network only needs to learn the expansion coefficients of these
basis functions.

With $\mathcal O$ taken as a projection operator, the network $u_\theta$ becomes
a weighted sum of physical basis functions:

$$u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i,\theta}\, \phi_i (x, t)$$

where $f_\theta$ is the learnable mapping (here linear in the projected
features, with coefficients $w_{i,\theta}$) and $\phi_i$ are the analytical
basis functions [-].

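In the linear case of the equation above, "learning" reduces to a least-squares problem for the coefficients $w_{i,\theta}$. The following is a hypothetical illustration, not the ACE basis, which combines radial functions with spherical harmonics and higher body orders. The sketch uses Gaussian radial functions of pair distances as $\phi_i$. It generates synthetic energies from known coefficients `w_true` and recovers them with `np.linalg.lstsq`. The centres, width, and system size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
CENTERS = np.linspace(0.5, 3.0, 6)  # hypothetical radial-basis centres
WIDTH = 0.5                          # hypothetical radial-basis width


def pair_distances(r):
    """All distances |r_i - r_j| with i < j for positions r of shape (M, 3)."""
    i, j = np.triu_indices(len(r), k=1)
    return np.linalg.norm(r[i] - r[j], axis=-1)


def basis_features(r):
    """phi_k(r) = sum_{i<j} exp(-((d_ij - mu_k) / WIDTH)^2) for each centre mu_k."""
    d = pair_distances(r)[:, None]                            # (n_pairs, 1)
    return np.exp(-((d - CENTERS) / WIDTH) ** 2).sum(axis=0)  # (K,)


# Synthetic data: energies generated from known coefficients w_true, so the
# target lies exactly in the span of the basis (illustration only).
configs = [rng.uniform(0.5, 2.0) * rng.normal(size=(8, 3)) for _ in range(200)]
Phi = np.stack([basis_features(r) for r in configs])  # (200, K) design matrix
w_true = rng.normal(size=len(CENTERS))
energies = Phi @ w_true

# With fixed basis functions, learning the coefficients is linear least squares.
w_fit, *_ = np.linalg.lstsq(Phi, energies, rcond=None)
print("max coefficient error:", np.max(np.abs(w_fit - w_true)))  # round-off level
```

The pair-distance features are invariant to rotations, translations, and particle permutations, so the fitted model inherits these symmetries without any augmentation.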
### 3.2.4. Implications and Relaxed Constraints

An important insight emerging from the recent equivariant neural network (ENN)
literature is that strict, hard-coded equivariance may be *too* restrictive for
certain physical systems, particularly those exhibiting "broken symmetry" [-].
This observation has motivated the development of relaxed-symmetry models [-].
These architectures allow small, learnable deviations from exact equivariance.
This provides the structural flexibility needed to model materials under extreme
stress or in non-equilibrium states without completely abandoning the physical
prior. Furthermore, the move toward unsupervised learning in differentiable
solvers such as AI2DFT suggests that variational principles of physics can serve
as both the loss function and the architectural constraint. This could remove
the need for labeled numerical data altogether [-].

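One generic way to realize such a learnable deviation is to add an unconstrained branch, scaled by a learnable coefficient $\alpha$, to an exactly equivariant model. The sketch is hypothetical and is not the specific architecture of the relaxed-symmetry models cited above. The equivariant branch is a simple centroid-based map, and the unconstrained branch is a random linear map on the flattened coordinates. The measured equivariance error grows linearly with $\alpha$, so $\alpha = 0$ recovers the strict prior.

```python
import numpy as np

rng = np.random.default_rng(7)
M = 5  # number of particles


def random_rotation(rng):
    """Uniformly random 3x3 rotation matrix via QR decomposition."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q


def equivariant_branch(x):
    """Exactly E(3)-equivariant: f_i = tanh(|x_i - c|) (x_i - c), c = centroid."""
    rel = x - x.mean(axis=0)
    return np.tanh(np.linalg.norm(rel, axis=1, keepdims=True)) * rel


W = rng.normal(size=(3 * M, 3 * M))  # hypothetical unconstrained weights


def free_branch(x):
    """Unconstrained linear map on flattened coordinates (breaks the symmetry)."""
    return (x.reshape(-1) @ W).reshape(x.shape)


def relaxed_model(x, alpha):
    # alpha would be a learnable scalar, typically initialised near zero.
    return equivariant_branch(x) + alpha * free_branch(x)


def equivariance_error(alpha, x, R, t):
    lhs = relaxed_model(x @ R.T + t, alpha)
    rhs = relaxed_model(x, alpha) @ R.T
    return np.linalg.norm(lhs - rhs)


x = rng.normal(size=(M, 3))
R, t = random_rotation(rng), rng.normal(size=3)
for alpha in (0.0, 0.01, 0.1):
    print(f"alpha={alpha:g}: equivariance error = {equivariance_error(alpha, x, R, t):.3e}")
```

Penalizing $|\alpha|$ during training would keep such a model close to exact equivariance unless the data call for a deviation.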
| Type                     | Method                            | Venue / Year                          | Keyword-style contribution |
|:-------------------------|:----------------------------------|---------------------------------------|:---------------------------|
| Equivariant Networks     | [NequIP][1]                       | Nature Communications, 2022           |                            |
|                          | [MACE][2]                         | NeurIPS, 2022                         |                            |
|                          | [DeepH-E3][3]                     | Nature Communications, 2023           |                            |
|                          | [QHNet][4]                        | ICML, 2023                            |                            |
| Hamiltonian Networks     | [Neural Hamiltonian Diffusion][5] | NeurIPS, 2025                         |                            |
|                          | [DeePMD][6]                       | Computer Physics Communications, 2018 |                            |
|                          | [HEGNN][7]                        | Physical Review B, 2024               |                            |
|                          | [SEGNN][7]                        | Physical Review B, 2024               |                            |
| Basis Expansion Networks | [ACE Framework][8]                | Physical Review B, 2019               |                            |
|                          | [AI2DFT][9]                       | arXiv preprint, 2024                  |                            |

[0a]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/ (Review of equivariant networks)
[0b]: https://arxiv.org/abs/2601.04104v1 (Review of equivariant networks)
[0c]: https://www.oaepublish.com/articles/jmi.2025.17 (Review of Hamiltonian networks)
[1]: https://www.nature.com/articles/s41467-022-29939-5 (NequIP)
[2]: https://arxiv.org/abs/2206.07697 (MACE)
[3]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/ (DeepH-E3)
[4]: https://proceedings.mlr.press/v202/yu23i (QHNet)
[5]: https://neurips.cc/virtual/2025/poster/117646 (Neural Hamiltonian Diffusion)
[6]: https://dx.doi.org/10.1016/j.cpc.2018.03.016 (DeePMD)
[7]: https://dx.doi.org/10.1103/PhysRevB.109.144426 (SpinGNN)
[8]: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.99.014104 (ACE Framework)
[9]: https://arxiv.org/abs/2403.11287 (AI2DFT)