3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
While data- and loss-embedded physics (Section 3.1) offer a flexible means to encourage physical plausibility, they fundamentally rely on soft penalties. This reliance introduces severe optimization challenges, as the network must simultaneously balance data fitting with the minimization of PDE residuals. Architecture-embedded physics addresses these failure modes by transitioning from "soft" optimization penalties to "hard" structural constraints. Instead of relying on the loss landscape to steer the model toward physical reality, this paradigm directly bakes invariances, symmetries, and conservation laws into the network's internal topology [-].
3.2.1. Diagnosis: Coordinate Bias and Failure Modes of Data Augmentation
In scientific domains, physical systems are defined by their geometric structure and inherent symmetries. For instance, the forces acting on a molecule must rotate exactly as the molecule rotates in 3D space. Standard Multi-Layer Perceptrons (MLPs) are fundamentally coordinate-dependent, and Convolutional Neural Networks (CNNs) are equivariant only to discrete translations; neither has any built-in awareness of rotations or of the full Euclidean group. Consequently, when mapping a spatial input to a physical property, standard architectures generally fail to commute with symmetry operators.
To demonstrate this formally, let \mathcal T_g represent a spatial
transformation operator corresponding to a continuous group element
g \in SE(3) (such as a 3D rotation or translation). For a standard black-box
neural network u_\theta, applying the physical transformation to the input
coordinates x prior to the forward pass does not, in general, yield the same
result as applying the transformation to the network's predicted output.
Mathematically, the operations do not commute:
u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)
Because this standard mapping ignores the underlying symmetry group, the network must learn the rules of geometric physics from data alone. To compensate for this coordinate bias, classical deep learning relies heavily on data augmentation: training the model on many artificially rotated or translated copies of each example. However, this approach is computationally wasteful, reduces data efficiency, and enforces symmetry only approximately, leaving the model vulnerable to orientations that are poorly covered by the augmented training set.
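The failure to commute can be checked numerically. The following is a minimal sketch, not any published model: `coordinate_wise_layer` is a hypothetical stand-in for u_\theta (identity weights followed by a tanh activation), and `rotate_z` plays the role of \mathcal T_g for a rotation about the z-axis.

```python
import math

def rotate_z(v, angle):
    """Rotate a 3D vector about the z-axis (one element of SO(3))."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def coordinate_wise_layer(v):
    """Toy stand-in for u_theta: identity weights followed by tanh (an assumption)."""
    return tuple(math.tanh(x) for x in v)

def equivariance_gap(f, v, angle):
    """Euclidean distance between f(T_g v) and T_g f(v) at the point v."""
    rotate_then_predict = f(rotate_z(v, angle))
    predict_then_rotate = rotate_z(f(v), angle)
    return math.dist(rotate_then_predict, predict_then_rotate)

# A 45-degree rotation of the unit x-vector; the gap is clearly nonzero.
gap = equivariance_gap(coordinate_wise_layer, (1.0, 0.0, 0.0), math.pi / 4)
print(f"equivariance gap: {gap:.4f}")
```

A nonzero gap means the two orders of operation disagree at that input. Dense learned weights do not remove the problem in general, because the coordinate-wise nonlinearity is what breaks rotational symmetry.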
Remedy: Equivariant Tensor Networks
Equivariant Tensor Networks resolve this representation failure by
architecturally restricting the neural mapping such that its internal feature
representations transform exactly according to the underlying symmetry group,
such as E(3) (Euclidean) or SO(3) (Rotation) [-].
By treating the general physical operator \mathcal O as a symmetry
operator \mathcal T_g, the network u_\theta is structurally constrained to
natively satisfy equivariance:
u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)
This inductive bias makes the model's predictions independent of the choice of coordinate frame, which improves data efficiency and robustness. Models in this category, such as DeepH-E3 and NequIP, exploit exact equivariance to predict electronic Hamiltonians and interatomic potentials, respectively, with reported sub-meV accuracy, at a small fraction of the computational cost of the first-principles solvers they emulate and without geometric data augmentation [-].
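One simple way to obtain exact rotational equivariance is to build vector outputs only from relative position vectors scaled by functions of invariant distances. The sketch below illustrates that construction; it is not the tensor-product architecture of NequIP or DeepH-E3, and `radial_weight` is a hypothetical fixed function standing in for a learned one.

```python
import math

def rotate_z(v, angle):
    """Rotate a 3D vector about the z-axis (one element of SO(3))."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def radial_weight(distance):
    """Hypothetical stand-in for a learned scalar function of an invariant distance."""
    return math.exp(-distance * distance)

def equivariant_vectors(positions):
    """Per-particle vector: sum over neighbours of radial_weight(|r_ij|) * r_ij."""
    outputs = []
    for i, ri in enumerate(positions):
        acc = [0.0, 0.0, 0.0]
        for j, rj in enumerate(positions):
            if i == j:
                continue
            rel = [a - b for a, b in zip(ri, rj)]
            w = radial_weight(math.hypot(*rel))
            acc = [a + w * r for a, r in zip(acc, rel)]
        outputs.append(tuple(acc))
    return outputs

positions = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.1), (-0.4, 0.9, 0.5)]
angle = 0.7

rotate_then_predict = equivariant_vectors([rotate_z(p, angle) for p in positions])
predict_then_rotate = [rotate_z(v, angle) for v in equivariant_vectors(positions)]
max_gap = max(math.dist(a, b) for a, b in zip(rotate_then_predict, predict_then_rotate))
print(f"max equivariance gap: {max_gap:.2e}")
```

Because distances are unchanged by rotation and relative vectors rotate with the input, the two orders of operation agree up to floating-point round-off. Using only relative positions also makes the output translation-invariant.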
3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts
When standard autoregressive models (e.g., RNNs, standard Transformers) are used to simulate physical dynamics, they typically predict the next state, or the time derivative that produces it, directly from the current state [-]. Because these black-box architectures have no built-in notion of conservation laws (such as energy, momentum, or mass), local approximation errors accumulate over successive time steps.
Let a physical state space be defined by its coordinates and
momenta x = (q, p). A standard network attempts to learn the time derivative
directly:
u_\theta = f_\theta(x, t) = \frac{dx}{dt}
When numerically integrated over N discrete time steps, the predicted
trajectory becomes:
x_N = x_0 + \sum_{k=0}^{N-1} \left[ u_\theta(x_k, t_k) \, \delta t + \epsilon_k \right]
Because the learned mapping is unconstrained, the Jacobian of its flow map is
not guaranteed to be symplectic. As a result, the flow is generally not
volume-preserving in phase space (\nabla \cdot f_\theta \ne 0), and the local
errors \epsilon_k compound. The resulting energy drift carries the simulated
system away from the valid physical manifold, often producing non-physical
behavior or numerical blow-up in long-duration simulations [-].
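The drift mechanism can be illustrated on a unit harmonic oscillator. In the sketch below, `perturbed_field` is a hypothetical stand-in for an unconstrained f_\theta: the exact vector field plus a small error term `eps * p`, which gives it a nonzero divergence. A fourth-order Runge-Kutta integrator with a small step is used so that the drift comes mainly from the model error rather than from the integrator.

```python
def perturbed_field(q, p, eps=0.01):
    """Exact oscillator field (p, -q) plus a small model error; its divergence is eps."""
    return p, -q + eps * p

def rk4_step(field, q, p, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = field(x)."""
    k1q, k1p = field(q, p)
    k2q, k2p = field(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = field(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = field(q + dt * k3q, p + dt * k3p)
    q_next = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p_next = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q_next, p_next

def energy(q, p):
    """True energy of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)

q, p = 1.0, 0.0
initial_energy = energy(q, p)
for _ in range(10_000):  # dt = 0.01, total simulated time 100
    q, p = rk4_step(perturbed_field, q, p, 0.01)

energy_ratio = energy(q, p) / initial_energy
print(f"final / initial energy: {energy_ratio:.3f}")
```

For this field dH/dt = eps * p^2, so a one-percent error in the vector field is enough to make the energy grow roughly exponentially with the rollout time.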
Remedy: Hamiltonian Neural Networks (HNNs)
Hamiltonian Neural Networks restructure the learning problem so that the
learned dynamics stay on the physical manifold. Instead of predicting the time
derivative of the state directly, the architecture predicts a scalar
Hamiltonian (the total energy) H_\theta(q, p) [-]. The time derivative of the
state is then obtained by differentiating that predicted energy surface and
taking its symplectic gradient.
The model governs the system's dynamics through Hamilton's equations:
\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}
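Because the predicted dynamics are derived from the symplectic gradient of a single scalar field, the learned Hamiltonian H_\theta is conserved along exact trajectories of the vector field by construction: dH_\theta/dt = (\partial H_\theta/\partial q)(dq/dt) + (\partial H_\theta/\partial p)(dp/dt) = 0. In practice, keeping the energy error bounded over long rollouts also requires a symplectic or sufficiently accurate integrator, and the conserved quantity is the learned H_\theta rather than the true energy. Even so, this structural guarantee is very difficult to obtain from models that embed physics only through data or loss terms, such as PINNs [-].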
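The construction can be sketched in a few lines. This is an illustration under stated assumptions, not the reference HNN implementation: `hamiltonian` is a hypothetical two-parameter, pendulum-like scalar model standing in for a trained network, and central finite differences stand in for the automatic differentiation a real implementation would use.

```python
import math

def hamiltonian(q, p, theta):
    """Hypothetical scalar model H_theta(q, p); a pendulum-like form is assumed."""
    inverse_mass, stiffness = theta
    return 0.5 * inverse_mass * p * p + stiffness * (1.0 - math.cos(q))

def symplectic_gradient(q, p, theta, h=1e-5):
    """(dq/dt, dp/dt) = (dH/dp, -dH/dq), using central differences for the partials."""
    dH_dq = (hamiltonian(q + h, p, theta) - hamiltonian(q - h, p, theta)) / (2 * h)
    dH_dp = (hamiltonian(q, p + h, theta) - hamiltonian(q, p - h, theta)) / (2 * h)
    return dH_dp, -dH_dq

def rk4_step(q, p, theta, dt):
    """One classical fourth-order Runge-Kutta step of Hamilton's equations."""
    k1q, k1p = symplectic_gradient(q, p, theta)
    k2q, k2p = symplectic_gradient(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p, theta)
    k3q, k3p = symplectic_gradient(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p, theta)
    k4q, k4p = symplectic_gradient(q + dt * k3q, p + dt * k3p, theta)
    q_next = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p_next = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q_next, p_next

theta = (1.0, 1.0)  # illustrative parameter values, not fitted to any data
q, p = 1.0, 0.0
initial_energy = hamiltonian(q, p, theta)
for _ in range(5_000):  # dt = 0.01, total simulated time 50
    q, p = rk4_step(q, p, theta, 0.01)

relative_drift = abs(hamiltonian(q, p, theta) - initial_energy) / initial_energy
print(f"relative drift in H_theta: {relative_drift:.2e}")
```

The remaining drift here comes from the integrator and the finite-difference step, not from the vector field itself, which is why HNN rollouts are often paired with a symplectic integrator such as leapfrog.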
3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems
Complexity in atomic and multi-scale physics often arises from interactions among many particles, whose number scales combinatorially. Standard fully connected networks struggle to capture these higher-order interaction patterns from raw positional data without very large parameter counts [-].
Consider a macroscopic physical property U of a system containing N particles
at coordinates \{r_n\}. A complete description requires a many-body expansion:
$$U(\{r_n\}) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k} U_3(r_i, r_j, r_k) + \cdots$$
where U_n is the $n$-body interaction term. The number of distinct particle
combinations entering the $n$-body sum scales combinatorially as
\binom{N}{n} \sim O(N^n) for fixed $n$. For a standard MLP that flattens the
input into a single $3N$-dimensional vector, implicitly learning these
$n$-th-order spatial correlations requires dense weight matrices whose
parameter counts grow rapidly with both the system size and the interaction
order.
Forcing a neural network to learn these interactions from scratch typically results in overparameterization, poor generalization, and little physical interpretability [-].
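To give a sense of scale, the number of terms in each $n$-body sum can be counted directly with a binomial coefficient. The system size of 100 particles is an arbitrary illustrative choice.

```python
import math

def n_body_term_count(num_particles, order):
    """Number of distinct particle subsets entering the n-body sum: C(N, n)."""
    return math.comb(num_particles, order)

counts = {order: n_body_term_count(100, order) for order in (1, 2, 3, 4)}
for order, count in counts.items():
    print(f"{order}-body terms for N = 100: {count}")
# 1-body: 100, 2-body: 4950, 3-body: 161700, 4-body: 3921225
```

Each additional interaction order multiplies the number of terms by roughly N / n, which is why explicit many-body sums become impractical beyond low orders without a structured basis.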
Remedy: Basis-Expansion Networks
Rather than relying on generic weight matrices to learn multi-body physics,
Basis-Expansion Networks restrict the network's representation space to a
fixed set of physically motivated basis functions \phi_i. By projecting the
problem onto a mathematically complete basis set (such as the Atomic Cluster
Expansion [-]), the neural network only needs to learn the coefficients of
these basis functions.
Treating \mathcal O as a projection operator, the network u_\theta acts as a
weighted sum of physical basis functions:
u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i,\theta} \phi_i (x, t)
where f_\theta is the learnable neural mapping, w_{i,\theta} are the learned
coefficients, and \phi_i are the analytical basis functions [-].
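A minimal sketch of the idea follows, using a two-function radial basis rather than the Atomic Cluster Expansion itself. The basis choice (r^{-12} and r^{-6}), the synthetic Lennard-Jones-style target data, and the direct least-squares fit of the coefficients are all illustrative assumptions; in a basis-expansion network the coefficients would be produced or refined by the learnable mapping f_\theta.

```python
def basis_functions(r):
    """Fixed analytical basis phi_i(r): repulsive r^-12 and attractive r^-6 terms."""
    return (r ** -12, r ** -6)

def fit_coefficients(radii, targets):
    """Least-squares fit of w in u(r) = sum_i w_i * phi_i(r) via 2x2 normal equations."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for r, u in zip(radii, targets):
        phi1, phi2 = basis_functions(r)
        a11 += phi1 * phi1
        a12 += phi1 * phi2
        a22 += phi2 * phi2
        b1 += phi1 * u
        b2 += phi2 * u
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the symmetric 2x2 system.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def predict(r, weights):
    """Evaluate the weighted sum of basis functions at separation r."""
    return sum(w * phi for w, phi in zip(weights, basis_functions(r)))

# Synthetic targets from u(r) = 4 * (r^-12 - r^-6), in reduced units.
radii = [0.95 + 0.05 * k for k in range(30)]
targets = [4.0 * (r ** -12 - r ** -6) for r in radii]

weights = fit_coefficients(radii, targets)
print("fitted coefficients:", weights)
```

Only two numbers are learned, and each has a direct physical reading as the strength of the repulsive or attractive term, which is the interpretability argument made above.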
3.2.4. Implications and Relaxed Constraints
A notable finding in the recent equivariant neural network (ENN) literature is that strict, hard-coded equivariance can be too restrictive for certain physical systems, particularly those exhibiting "broken symmetry" [-]. This observation has motivated relaxed-symmetry models [-], which allow small, learnable deviations from exact equivariance. The added flexibility helps model materials under extreme stress or in non-equilibrium states without abandoning the physical prior. Furthermore, the move toward unsupervised learning in differentiable solvers such as AI2DFT suggests that variational principles of physics can serve as both the loss function and the architectural constraint, potentially removing the need for labeled numerical data [-].
| Type | Method | Venue / Year | Keyword-style contribution |
|---|---|---|---|
| Equivariant Networks | NequIP | | |
| | MACE | | |
| | DeepH-E3 | | |
| | QHNet | | |
| Hamiltonian Networks | Neural Hamiltonian Diffusion | | |
| | DeePMD | | |
| | HEGNN | | |
| | SEGNN | | |
| Basis Expansion Networks | ACE Framework | 2024 | |
| | AI2DFT | 2024 | |