## 3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning

While data- and loss-embedded physics (Section 3.1) offer a flexible way to
encourage physical plausibility, they fundamentally rely on soft penalties. This
reliance creates difficult optimization problems: the network must
simultaneously balance data fitting against the minimization of PDE residuals,
and nothing prevents the trained model from violating the physics wherever the
penalty is not fully minimized. Architecture-embedded physics addresses these
failure modes by moving from "soft" optimization penalties to "hard" structural
constraints. Instead of relying on the loss landscape to steer the model toward
physically consistent solutions, this paradigm builds invariances, symmetries,
and conservation laws directly into the network architecture [-].

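As a minimal illustration of the soft/hard distinction, the NumPy sketch below
uses a toy symmetry, invariance under $x \mapsto -x$, in place of the physical
symmetries discussed in this section. The soft variant only measures the
symmetry violation, which would be added to a training loss as a penalty; the
hard variant builds the symmetry into the model by averaging over the group
$\{x, -x\}$, so the violation vanishes for any choice of weights. The network
`g`, its random weights, and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer network g with random (untrained) weights.
W1 = rng.normal(size=(16, 1))
b1 = rng.normal(size=16)
W2 = rng.normal(size=(1, 16))


def g(x):
    """Unconstrained model: maps an array of scalars x to an array of scalars."""
    h = np.tanh(x[:, None] @ W1.T + b1)  # hidden activations, shape (n, 16)
    return (h @ W2.T)[:, 0]              # outputs, shape (n,)


def symmetry_violation(model, x):
    """Mean squared violation of f(-x) = f(x); usable as a soft penalty in a loss."""
    return np.mean((model(-x) - model(x)) ** 2)


def g_hard(x):
    """Hard constraint: averaging over the group {x, -x} makes the output even."""
    return 0.5 * (g(x) + g(-x))


x = np.linspace(-2.0, 2.0, 101)
soft = symmetry_violation(g, x)       # nonzero for these weights; training could only reduce it
hard = symmetry_violation(g_hard, x)  # zero for any weights: symmetry holds by construction
print(soft, hard)
```
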
### 3.2.1. Diagnosis: Coordinate Bias and the Failure of Data Augmentation

In scientific domains, physical systems are characterized by their geometric
structure and inherent symmetries. For instance, the forces acting on the atoms
of a molecule must rotate exactly as the molecule rotates in 3D space. Standard
Multi-Layer Perceptrons (MLPs) are fundamentally coordinate-dependent and have
no built-in awareness of Euclidean symmetries, while Convolutional Neural
Networks (CNNs) encode only translation equivariance on their discrete grid, not
rotations or reflections. Consequently, when mapping a spatial input to a
physical property, standard architectures do not, in general, commute with
symmetry operators.

Formally, let $\mathcal T_g$ denote the transformation operator associated with
a group element $g \in SE(3)$ (a 3D rotation, a translation, or a composition of
the two), acting on the input coordinates and, through the appropriate
representation, on the output (for example, a predicted force vector rotates
with the input but is unaffected by translations, while a scalar energy is left
unchanged). For a standard black-box neural network $u_\theta$, applying the
transformation to the input coordinates $x$ before the forward pass generally
does not yield the same result as applying it to the network's predicted
output. Mathematically, the two operations do not commute:

$$u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)$$

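This failure can be checked numerically. The sketch below assumes a toy
setting: a randomly initialized MLP (hypothetical weights `W1`, `W2`) maps the
flattened positions of $N = 4$ points in 3D to one 3D vector per point, a
stand-in for predicted forces. It compares $u_\theta(\mathcal T_g x)$ with
$\mathcal T_g u_\theta(x)$ for a random rotation obtained from a QR
decomposition with a sign correction.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4  # number of particles in this hypothetical toy example


def random_rotation(rng):
    """Sample a uniformly distributed 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # sign fix makes q uniform over orthogonal matrices
    if np.linalg.det(q) < 0:               # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


# Hypothetical untrained MLP: flattened positions (3N,) -> one 3D vector per particle.
W1 = rng.normal(size=(64, 3 * N)) / np.sqrt(3 * N)
W2 = rng.normal(size=(3 * N, 64)) / np.sqrt(64)


def mlp(x):
    return (W2 @ np.tanh(W1 @ x.reshape(-1))).reshape(N, 3)


x = rng.normal(size=(N, 3))  # positions, one row per particle
R = random_rotation(rng)

lhs = mlp(x @ R.T)  # u_theta(T_g x): rotate the inputs, then predict
rhs = mlp(x) @ R.T  # T_g u_theta(x): predict, then rotate the outputs
print(np.max(np.abs(lhs - rhs)))  # clearly nonzero: the MLP is not equivariant
```
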
Because this standard mapping is blind to the underlying symmetry group, the
network is forced to learn the geometric rules of the physics from scratch. To
compensate for this coordinate bias, conventional deep learning relies heavily
on data augmentation: the model is trained on many artificially rotated or
translated copies of each example. However, this approach is computationally
wasteful, lowers data efficiency, and only approximates the symmetry, because
the model sees only a finite sample of a continuous group. The model therefore
remains vulnerable to geometric orientations outside the training distribution.

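Concretely, augmentation for vector-valued targets can be sketched as follows,
assuming per-atom force labels and a hypothetical helper `augment`. Each copy
applies one random rotation $R$ and translation $t$: positions transform as
$x \mapsto x R^\top + t$, while forces, being vectors, only rotate. The symmetry
is imposed on the data rather than the model, so every copy adds training cost
and exact equivariance is never enforced.

```python
import numpy as np


def random_rotation(rng):
    """Sample a uniformly distributed 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # sign fix makes q uniform over orthogonal matrices
    if np.linalg.det(q) < 0:               # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def augment(positions, forces, n_copies, rng):
    """Hypothetical helper: return n_copies rigidly transformed (positions, forces) pairs.

    positions, forces: arrays of shape (N, 3). Positions are rotated and translated;
    forces are only rotated, because translating a system does not change its forces.
    """
    copies = []
    for _ in range(n_copies):
        R = random_rotation(rng)
        t = rng.normal(size=3)
        copies.append((positions @ R.T + t, forces @ R.T))
    return copies


rng = np.random.default_rng(2)
pos = rng.normal(size=(5, 3))
frc = rng.normal(size=(5, 3))
batch = augment(pos, frc, n_copies=8, rng=rng)
print(len(batch))  # 8
```
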
#### Remedy: Equivariant Tensor Networks

Equivariant Tensor Networks resolve this representation failure by
architecturally restricting the neural mapping so that its internal feature
representations transform exactly according to the underlying symmetry group,
such as the Euclidean group $E(3)$ or the rotation group $SO(3)$ [-].

By instantiating the general physical operator $\mathcal O$ as the symmetry
operator $\mathcal T_g$, the network $u_\theta$ is structurally constrained to
satisfy equivariance by construction:

$$u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)$$

This inductive bias makes the model's predictions independent of the arbitrary
choice of reference frame, which improves data efficiency and robustness to
unseen orientations. State-of-the-art models in this category exploit exact
equivariance: DeepH-E3 predicts electronic Hamiltonians with reported sub-meV
accuracy, and NequIP learns interatomic potentials from substantially fewer
training structures than earlier non-equivariant models. Both aim to reproduce
first-principles results at a fraction of the computational cost of traditional
solvers, without any geometric data augmentation [-].

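To see how equivariance can be obtained by construction, the sketch below
implements a single layer in the style of E(n)-equivariant graph networks
(EGNN), a simpler construction than the spherical-tensor features used by
NequIP or DeepH-E3. Each output vector is a sum of relative position vectors
$x_i - x_j$ weighted by a learnable function of the rotation-invariant squared
distances, so rotating the input rotates the output and translating the input
leaves it unchanged. The radial function `phi` and its random parameters are
hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)


def random_rotation(rng):
    """Sample a uniformly distributed 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


# Hypothetical learnable radial function: a tiny MLP acting on squared distances only.
a, b, c = rng.normal(size=(3, 8))


def phi(d2):
    """Map invariant squared distances (any shape) to scalar weights (same shape)."""
    return np.tanh(d2[..., None] * a + b) @ c


def equivariant_vectors(x):
    """EGNN-style vector output: v_i = sum_j (x_i - x_j) * phi(|x_i - x_j|^2)."""
    diff = x[:, None, :] - x[None, :, :]        # (N, N, 3) relative positions
    w = phi(np.sum(diff ** 2, axis=-1))         # (N, N) rotation-invariant weights
    return np.sum(diff * w[..., None], axis=1)  # (N, 3); the j = i term is zero


x = rng.normal(size=(6, 3))
R = random_rotation(rng)
t = rng.normal(size=3)

lhs = equivariant_vectors(x @ R.T + t)  # rotate and translate inputs, then predict
rhs = equivariant_vectors(x) @ R.T      # predict, then rotate outputs
print(np.max(np.abs(lhs - rhs)))        # at the level of floating-point rounding
```
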
### 3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts

When standard autoregressive models (e.g., RNNs or standard Transformers) are
used to simulate physical dynamics, they typically predict the next state, or
its time derivative, directly from the current state [-]. Because these
black-box architectures have no inherent notion of conserved quantities (such as
energy, momentum, or mass), nothing prevents local approximation errors from
accumulating over successive time steps.

Let the state of a physical system be described by generalized coordinates and
momenta $x = (q, p)$. A standard network attempts to learn the time derivative
directly:

$$u_\theta(x, t) = f_\theta(x, t) \approx \frac{dx}{dt}$$

When integrated numerically with an explicit (forward Euler) scheme over $N$
steps of size $\delta t$, the predicted trajectory becomes:

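$$
x_N = x_0 + \sum_{k=0}^{N-1} \left[ u_\theta(x_k, t_k)\, \delta t + \epsilon_k \right]
$$

where $x_k$ is the state at step $k$, $t_k$ the corresponding time, and
$\epsilon_k$ the local error (model plus discretization) incurred at that step.
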
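Because the learned vector field $f_\theta$ is unconstrained, it is not
guaranteed to be Hamiltonian, so the flow it generates is not symplectic. In
particular, the flow of a vector field preserves phase-space volume only if the
field is divergence-free, a property Hamiltonian fields have automatically
(Liouville's theorem), whereas in general $\nabla \cdot f_\theta \ne 0$. The
local errors $\epsilon_k$ then compound, and the system's energy drifts
systematically instead of staying close to its initial value. This energy drift
causes the simulated trajectory to leave the physically valid manifold, often
resulting in non-physical behavior or numerical blow-up during long-duration
simulations [-].
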
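The mechanism can be reproduced on a toy problem. The sketch below assumes a
harmonic oscillator with $H = (q^2 + p^2)/2$ and a hypothetical "learned" vector
field equal to the true field $(p, -q)$ plus a small error term
$\varepsilon (q, p)$, whose divergence is $2\varepsilon \ne 0$. For this field
$dH/dt = \varepsilon (q^2 + p^2) = 2\varepsilon H$, so the energy grows as
$e^{2\varepsilon t}$. A fine-step fourth-order Runge-Kutta integrator is used
instead of the forward Euler scheme above, so that the drift comes from the
model error rather than from the integrator.

```python
import numpy as np

EPS = 0.01  # hypothetical size of the non-Hamiltonian model error


def learned_field(x):
    """True oscillator field (p, -q) plus an error term EPS * (q, p)."""
    q, p = x
    return np.array([p + EPS * q, -q + EPS * p])


def energy(x):
    q, p = x
    return 0.5 * (q ** 2 + p ** 2)


def divergence(f, x, h=1e-6):
    """Central finite-difference estimate of d(f_q)/dq + d(f_p)/dp at x."""
    e = np.eye(2)
    return sum((f(x + h * e[i])[i] - f(x - h * e[i])[i]) / (2 * h) for i in range(2))


def rk4_step(f, x, dt):
    """Classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


div = divergence(learned_field, np.array([0.3, -0.7]))
print(div)  # approximately 2 * EPS = 0.02: the flow does not preserve volume

x = np.array([1.0, 0.0])    # initial energy 0.5
dt, n_steps = 0.01, 10_000  # integrate to t = 100
for _ in range(n_steps):
    x = rk4_step(learned_field, x, dt)
print(energy(x))  # close to 0.5 * exp(2 * EPS * 100), about 3.69, instead of 0.5
```
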
#### Remedy: Hamiltonian Neural Networks (HNNs)

Hamiltonian Neural Networks restructure the learning problem to preserve the
structure of Hamiltonian dynamics. Instead of predicting the state or its time
derivative directly, the architecture predicts a single scalar Hamiltonian (the
total energy) $H_\theta(q, p)$ [-]. The time derivatives of the state are then
computed exactly from this learned energy function, via automatic
differentiation, as its symplectic gradient; the trajectory itself is obtained
by numerically integrating them.

The dynamics then follow from Hamilton's equations applied to the learned
Hamiltonian:

$$\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}$$

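Because the vector field
$(\partial H_\theta / \partial p, -\partial H_\theta / \partial q)$ is the
symplectic gradient of a single scalar function, it is everywhere orthogonal to
$\nabla H_\theta$. Hence
$dH_\theta/dt = \nabla H_\theta \cdot (dq/dt, dp/dt) = 0$, and the field is also
divergence-free, so the continuous-time flow conserves the learned energy
$H_\theta$ and preserves phase-space volume by construction. Combined with a
symplectic integrator, this structural integration of Hamiltonian mechanics
keeps the energy error bounded over very long rollout horizons, a property that
loss-embedded models such as PINNs can only encourage through penalties, not
guarantee [-].
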
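A minimal sketch of the HNN recipe, assuming a hypothetical parameterized
Hamiltonian $H_\theta(q, p) = a p^2/2 + b q^2/2 + c q^4/4$ in place of a neural
network so that its gradients can be written by hand (an actual HNN obtains them
by automatic differentiation). The dynamics are taken from the symplectic
gradient, and the rollout uses the Störmer-Verlet (leapfrog) scheme, which is
symplectic because this $H_\theta$ separates into kinetic and potential parts. A
non-symplectic integrator such as explicit Euler would still let the energy
drift, even with an exactly Hamiltonian field.

```python
# Hypothetical "learned" parameters standing in for a trained network H_theta.
A, B, C = 1.0, 1.0, 0.5


def H(q, p):
    """Learned Hamiltonian H_theta(q, p) = A p^2/2 + B q^2/2 + C q^4/4."""
    return 0.5 * A * p**2 + 0.5 * B * q**2 + 0.25 * C * q**4


def dH_dq(q):
    return B * q + C * q**3


def dH_dp(p):
    return A * p


def hamiltonian_field(q, p):
    """Symplectic gradient: (dq/dt, dp/dt) = (dH/dp, -dH/dq)."""
    return dH_dp(p), -dH_dq(q)


def leapfrog_step(q, p, dt):
    """One Stormer-Verlet (kick-drift-kick) step; symplectic for separable H."""
    p = p - 0.5 * dt * dH_dq(q)
    q = q + dt * dH_dp(p)
    p = p - 0.5 * dt * dH_dq(q)
    return q, p


# dH/dt = grad(H) . field vanishes at any point by construction.
q0, p0 = 0.3, -0.7
dq, dp = hamiltonian_field(q0, p0)
dHdt = dH_dq(q0) * dq + dH_dp(p0) * dp
print(dHdt)  # 0.0

# Long rollout: track the largest deviation of H from its initial value.
q, p = 1.0, 0.0
E0 = H(q, p)
max_err = 0.0
for _ in range(100_000):  # t = 1000 with dt = 0.01
    q, p = leapfrog_step(q, p, 0.01)
    max_err = max(max_err, abs(H(q, p) - E0))
print(max_err)  # stays small; no secular growth
```
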
### 3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems

Complexity in atomic and multi-scale physics often arises from interactions
among many particles, and the number of possible interactions grows
combinatorially with the number of particles. Standard fully connected networks
struggle to capture these higher-order interaction patterns from raw positional
data without very large parameter counts [-].

Consider a system-level physical property $U$ (for example, the total potential
energy) of a system of $N$ particles at positions $\{r_i\}_{i=1}^{N}$. It can be
written as a many-body expansion:

$$U(\{r_i\}) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k} U_3(r_i, r_j, r_k) + \cdots$$

where $U_n$ is the $n$-body contribution. The number of distinct $n$-body terms
is $\binom{N}{n}$, which grows as $O(N^n)$ for fixed $n$. A standard MLP that
flattens the input into a single $3N$-dimensional vector must learn these
$n$-th-order spatial correlations implicitly in dense weight matrices, whose
required size grows rapidly with the complexity of the physical environment.
Flattening also ties the model to a fixed particle count $N$ and to an arbitrary
ordering of the particles.

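For a sense of scale, the following count assumes a hypothetical system of
$N = 100$ particles and evaluates $\binom{N}{n}$ with Python's `math.comb`.

```python
from math import comb

N = 100  # hypothetical number of particles
counts = {n: comb(N, n) for n in (2, 3, 4)}  # distinct n-body terms, binom(N, n)
for n, c in counts.items():
    print(f"{n}-body terms: {c}")
# 2-body terms: 4950
# 3-body terms: 161700
# 4-body terms: 3921225
```
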
Forcing a generic network to learn these interactions from scratch typically
results in overparameterization, poor generalization, and little physical
interpretability [-].

#### Remedy: Basis-Expansion Networks

Rather than relying on generic weight matrices to learn multi-body physics,
Basis-Expansion Networks restrict the network's representation space to the
span of a fixed set of physically motivated basis functions $\phi_i$. By
projecting the problem onto a mathematically complete basis set (such as the
Atomic Cluster Expansion [-]), the neural network only needs to learn the
coefficients of these basis functions.

Treating $\mathcal O$ as a projection onto this basis, the model $u_\theta$
becomes a weighted sum of physical basis functions:

$$u_\theta(x, t) = f_\theta(\mathcal O(x, t)) = \sum_i w_{i,\theta}\, \phi_i(x, t)$$

where $f_\theta$ is the learnable mapping, $w_{i,\theta}$ are the learned
coefficients, and $\phi_i$ are the analytical basis functions [-].

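A toy illustration of the idea, not an implementation of ACE: the sketch
assumes a hypothetical fixed basis of inverse-power radial functions
$\phi_k(r) = r^{-k}$ for $k \in \{6, 8, 12\}$, fits the coefficients by linear
least squares to synthetic Lennard-Jones pair energies (with
$\epsilon = \sigma = 1$), and then uses them in a two-body truncation of the
many-body expansion. Because each basis function depends only on an
interparticle distance and the energy is a sum over pairs, the model is
invariant to rotations, translations, and permutations of identical particles by
construction.

```python
import numpy as np

POWERS = (6, 8, 12)  # hypothetical basis: phi_k(r) = r**(-k)


def basis(r):
    """Design matrix with one column per basis function, evaluated at distances r."""
    return np.stack([r ** (-k) for k in POWERS], axis=1)


# Synthetic reference data: Lennard-Jones pair energy 4 * (r^-12 - r^-6).
r = np.linspace(0.9, 3.0, 200)
e_ref = 4.0 * (r ** -12 - r ** -6)

# Learning reduces to a linear least-squares fit of the coefficients w_k.
w, *_ = np.linalg.lstsq(basis(r), e_ref, rcond=None)
print(np.round(w, 6))  # approximately [-4, 0, 4] for powers (6, 8, 12)


def total_energy(positions, w):
    """Two-body truncation of the many-body expansion: sum over pairs i < j."""
    i, j = np.triu_indices(len(positions), k=1)
    d = np.linalg.norm(positions[i] - positions[j], axis=1)
    return float(np.sum(basis(d) @ w))


# A small, well-separated cluster (hypothetical coordinates).
cluster = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 1.3]])
e1 = total_energy(cluster, w)
e2 = total_energy(cluster[[2, 0, 3, 1]], w)  # same particles, different order
print(np.isclose(e1, e2))  # True
```
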
### 3.2.4. Implications and Relaxed Constraints

An important observation in the recent equivariant neural network (ENN)
literature is that strict, hard-coded equivariance can be *too* restrictive for
certain physical systems, particularly those exhibiting broken symmetry [-].
This observation has led to the development of relaxed-symmetry models [-].
These architectures allow small, learnable deviations from exact equivariance,
providing the flexibility needed to model materials under extreme stress or in
non-equilibrium states without abandoning the physical prior altogether.
Furthermore, the move toward unsupervised learning in differentiable solvers
such as AI2DFT suggests that the variational principles of physics can serve as
both the loss function and the architectural constraint, potentially removing
the need for labeled numerical data entirely [-].

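One simple way to picture relaxed equivariance, under toy assumptions that do
not reproduce any specific published architecture: add an unconstrained linear
term with hypothetical learnable weights `W` to an exactly rotation-equivariant
map, scaled by a coefficient $\alpha$. At $\alpha = 0$ the model is exactly
equivariant; for $\alpha > 0$ the equivariance error grows linearly with
$\alpha$, so penalizing $\alpha$ (or `W`) during training would let the data
decide how much symmetry breaking is needed.

```python
import numpy as np

rng = np.random.default_rng(4)


def random_rotation(rng):
    """Sample a uniformly distributed 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def equivariant_part(x):
    """Rotation-equivariant map: each vector scaled by a function of its norm."""
    return x * np.exp(-np.sum(x ** 2, axis=1, keepdims=True))


W = rng.normal(size=(3, 3))  # hypothetical learnable symmetry-breaking weights


def relaxed_model(x, alpha):
    """Exactly equivariant map plus an alpha-scaled unconstrained linear term."""
    return equivariant_part(x) + alpha * (x @ W.T)


def equivariance_error(alpha, x, R):
    return np.max(np.abs(relaxed_model(x @ R.T, alpha) - relaxed_model(x, alpha) @ R.T))


x = rng.normal(size=(10, 3))
R = random_rotation(rng)
errors = {alpha: equivariance_error(alpha, x, R) for alpha in (0.0, 0.01, 0.1)}
print(errors)  # about 0 at alpha = 0, then growing linearly with alpha
```
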
| Type                     | Method                            | Venue / Year | Keyword-style contribution |
|:-------------------------|:----------------------------------|--------------|:---------------------------|
| Equivariant Networks     | [NequIP][1]                       |              |                            |
|                          | [MACE][2]                         |              |                            |
|                          | [DeepH-E3][3]                     |              |                            |
|                          | [QHNet][4]                        |              |                            |
| Hamiltonian Networks     | [Neural Hamiltonian Diffusion][5] |              |                            |
|                          | [DeePMD][6]                       |              |                            |
|                          | [HEGNN][7]                        |              |                            |
|                          | [SEGNN][7]                        |              |                            |
| Basis Expansion Networks | [ACE Framework (2024)][8]         |              |                            |
|                          | [AI2DFT (2024)][9]                |              |                            |

[0a]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/ (review of equivariant networks)
[0b]: https://arxiv.org/abs/2601.04104v1 (review of equivariant networks)
[0c]: https://www.oaepublish.com/articles/jmi.2025.17 (review of hamiltonian networks)
[1]: https://www.nature.com/articles/s41467-022-29939-5 (NequIP)
[2]: https://arxiv.org/abs/2206.07697 (MACE)
[3]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/ (DeepH-E3)
[4]: https://proceedings.mlr.press/v202/yu23i (QHNet)
[5]: https://neurips.cc/virtual/2025/poster/117646 (Neural Hamiltonian Diffusion)
[6]: https://dx.doi.org/10.1016/j.cpc.2018.03.016 (DeePMD)
[7]: https://dx.doi.org/10.1103/PhysRevB.109.144426 (SpinGNN)
[8]: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.99.014104 (ACE Framework)
[9]: https://arxiv.org/abs/2403.11287 (AI2DFT)