\subsection{Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning}
While data- and loss-embedded physics (Section 3.1) offer a flexible means to encourage physical plausibility, they fundamentally rely on soft penalties. This reliance introduces severe optimization challenges, as the network must simultaneously balance data fitting with the minimization of PDE residuals. Architecture-embedded physics addresses these failure modes by transitioning from ``soft'' optimization penalties to ``hard'' structural constraints. Instead of relying on the loss landscape to steer the model toward physical reality, this paradigm directly bakes invariances, symmetries, and conservation laws into the network's internal topology {[}-{]}.
\subsubsection{3.2.1. Diagnosis: Coordinate Bias and the Failure of Data Augmentation}\label{diagnosis-coordinate-bias-and-the-failure-of-data-augmentation}
In scientific domains, physical systems are defined by their geometric structure and inherent symmetries. For instance, the physical forces acting on a molecule must rotate exactly as the molecule rotates in 3D space. Standard Multi-Layer Perceptrons (MLPs) are fundamentally coordinate-dependent, and Convolutional Neural Networks (CNNs) are equivariant only to translations along their grid axes; neither has any structural awareness of the full set of Euclidean symmetries, rotations in particular. Consequently, when mapping a spatial input to a physical property, standard architectures generally fail to commute with symmetry operators.
To demonstrate this formally, let \(\mathcal T_g\) represent a spatial transformation operator corresponding to a continuous group element \(g \in SE(3)\) (such as a 3D rotation or translation). For a standard black-box neural network \(u_\theta\), applying the physical transformation to the input coordinates \(x\) prior to the forward pass does not, in general, yield the same result as applying the transformation to the network's predicted output. Mathematically, the operations do not commute:
\[u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)\]
Because this standard mapping is blind to the underlying symmetry group, the network is forced to learn the rules of geometric physics from scratch. To compensate for this coordinate bias, classical deep learning relies heavily on data augmentation, that is, training the model on thousands of artificially rotated or translated examples. However, this approach is computationally wasteful, limits data efficiency, and only approximates the symmetry, leaving the model vulnerable to geometric orientations outside the training distribution.
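As a minimal illustration of the non-commutation above (a toy case assumed here, not drawn from the cited literature), consider a single bias-free linear layer \(u_\theta(x) = W x\), where \(W\) is a real \(3 \times 3\) weight matrix and \(\mathcal T_g\) acts on both input and output as a rotation matrix \(R \in SO(3)\). Equivariance would require
\[W R\, x = R\, W x \quad \text{for all } x \qquad \Longleftrightarrow \qquad W R = R W \quad \text{for all } R \in SO(3).\]
The only matrices that commute with every 3D rotation are scalar multiples of the identity, \(W = c I\), so the nine free weights would have to collapse to a single one. A generically initialized or trained \(W\) does not have this form, and the layer therefore violates the symmetry unless the data happen to push it there.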
\paragraph{Remedy: Equivariant Tensor Networks}\label{remedy-equivariant-tensor-networks}
Equivariant Tensor Networks resolve this representation failure by architecturally restricting the neural mapping such that its internal feature representations transform exactly according to the underlying symmetry group, such as \(E(3)\) (Euclidean) or \(SO(3)\) (Rotation) {[}-{]}.
By treating the general physical operator \(\mathcal O\) as a symmetry operator \(\mathcal T_g\), the network \(u_\theta\) is structurally constrained to natively satisfy equivariance:
\[u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)\]
This inductive bias makes the model's predictions independent of the choice of coordinate frame, which improves data efficiency and robustness. State-of-the-art models in this category, such as DeepH-E3 and NequIP, leverage this exact equivariance to predict electronic Hamiltonians and interatomic potentials with sub-meV accuracy, outperforming traditional solvers while eliminating the need for geometric data augmentation {[}-{]}.
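One standard way to obtain such a mapping can be sketched as follows. The construction assumes a hypothetical learnable radial function \(\varphi_\theta\) and is meant only as an illustration, not as a description of DeepH-E3 or NequIP. Build a rotation-invariant energy from interatomic distances and differentiate it to get forces:
\[E_\theta(r_1, \dots, r_N) = \sum_{i < j} \varphi_\theta\big(\| r_i - r_j \|\big), \qquad F_i = -\nabla_{r_i} E_\theta .\]
Since \(\| R r_i - R r_j \| = \| r_i - r_j \|\) for any \(R \in SO(3)\), the energy satisfies \(E_\theta(R r_1, \dots, R r_N) = E_\theta(r_1, \dots, r_N)\). Differentiating both sides with respect to \(r_i\) and applying the chain rule on the left gives
\[R^\top \big(\nabla_{r_i} E_\theta\big)(R r_1, \dots, R r_N) = \big(\nabla_{r_i} E_\theta\big)(r_1, \dots, r_N),\]
and multiplying by \(R\) yields
\[F_i(R r_1, \dots, R r_N) = R\, F_i(r_1, \dots, r_N).\]
The forces therefore rotate with the molecule for every choice of the parameters \(\theta\), without any rotated training examples. The same distances are unchanged by translations, so the construction is translation-invariant as well.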
\subsubsection{3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts}\label{diagnosis-energy-drift-in-long-horizon-rollouts}
When standard autoregressive models (e.g., RNNs, standard Transformers) are used to simulate physical dynamics, they typically predict the next state, or the vector field that advances the current state, directly from the current state {[}-{]}. Because these black-box architectures possess no inherent concept of conservation laws (such as those for energy, momentum, or mass), local approximation errors inevitably accumulate over sequential time steps.
Let a physical state space be defined by its coordinates and momenta \(x = (q, p)\). A standard network attempts to learn the time derivative directly:
\[u_\theta = f_\theta(x, t) = \frac{dx}{dt}\]
When numerically integrated over \(N\) discrete time steps, the predicted trajectory becomes:
\[ x_N = x_0 + \sum_{k=0}^{N-1} \left[ u_\theta(x_k, t_k)\, \delta t + \epsilon_k \right] \]
Because the learned mapping is unconstrained, the flow it generates is not guaranteed to be symplectic. In particular, the learned vector field is generally not divergence-free (\(\nabla \cdot f_\theta \ne 0\)), so the flow does not preserve phase-space volume, and the local errors \(\epsilon_k\) compound rather than cancel. The resulting energy drift causes the simulated system to depart from the valid physical manifold, often leading to non-physical behavior or numerical blow-up during long-duration simulations {[}-{]}.
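To make the drift concrete, consider a toy case assumed here purely for illustration: a unit-mass harmonic oscillator with \(H = \tfrac{1}{2}(q^2 + p^2)\), whose true vector field is \((p, -q)\), and a learned field contaminated by a small hypothetical error term \(\gamma p\) in the momentum equation,
\[f_\theta(q, p) = \big(p,\; -q + \gamma p\big).\]
Its divergence and the energy rate it induces are
\[\nabla \cdot f_\theta = \frac{\partial p}{\partial q} + \frac{\partial (-q + \gamma p)}{\partial p} = \gamma, \qquad \frac{dH}{dt} = q \frac{dq}{dt} + p \frac{dp}{dt} = q p + p(-q + \gamma p) = \gamma p^2 .\]
For \(\gamma > 0\) the energy never decreases, and for small \(\gamma\) its envelope grows like \(e^{\gamma t}\). A systematic error that is negligible over a single step therefore dominates the trajectory once the rollout horizon is comparable to \(1/\gamma\).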
\paragraph{Remedy: Hamiltonian Neural Networks (HNNs)}\label{remedy-hamiltonian-neural-networks-hnns}
Hamiltonian Neural Networks (HNNs) restructure the learning problem so that the physical manifold is preserved by construction. Instead of predicting the time derivative of the state directly, the architecture predicts a scalar Hamiltonian (the total energy) \(H_\theta(q, p)\) {[}-{]}. The dynamics are then derived analytically by taking the symplectic gradient of that predicted energy surface.
The model governs the system's dynamics through Hamilton's equations:
\[\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}\]
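Because the model's dynamical outputs are derived from the partial derivatives of a single scalar field, the learned energy \(H_\theta\) is conserved along the continuous-time trajectories by construction. In a discretized rollout this holds up to the error of the numerical integrator, which is why HNNs are commonly paired with symplectic integration schemes. This structural integration of symplectic mechanics keeps the energy error under control over long rollout horizons, something that is very difficult to achieve with models that embed physics only through data or soft loss penalties {[}-{]}.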
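The conservation property can be verified directly from Hamilton's equations. The short check below is a standard calculation, written here for a learned \(H_\theta\) with no explicit time dependence and assumed twice continuously differentiable; for multi-dimensional \(q\) and \(p\) the products are understood as dot products:
\[\frac{dH_\theta}{dt} = \frac{\partial H_\theta}{\partial q} \frac{dq}{dt} + \frac{\partial H_\theta}{\partial p} \frac{dp}{dt} = \frac{\partial H_\theta}{\partial q} \frac{\partial H_\theta}{\partial p} - \frac{\partial H_\theta}{\partial p} \frac{\partial H_\theta}{\partial q} = 0 .\]
The same structure makes the vector field \(f_\theta = (\partial H_\theta / \partial p,\, -\partial H_\theta / \partial q)\) divergence-free, so phase-space volume is preserved:
\[\nabla \cdot f_\theta = \frac{\partial}{\partial q} \frac{\partial H_\theta}{\partial p} + \frac{\partial}{\partial p} \left( -\frac{\partial H_\theta}{\partial q} \right) = 0 .\]
Both identities hold for any parameter values \(\theta\), so they are properties of the architecture rather than of the training outcome.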
\subsubsection{3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems}\label{diagnosis-high-dimensionality-of-multi-body-problems}
Complexity in atomic and multi-scale physics often arises from interactions among multiple particles, the number of which scales combinatorially with system size. Standard fully connected networks struggle to capture these higher-order interaction patterns from raw positional data without requiring very large parameter counts {[}-{]}.
Consider a macroscopic physical property \(U\) of a system containing \(N\) particles at coordinates \(\{r_n\}\). A complete description requires a many-body expansion:
\[U(r) = U_0 + \sum_{i} U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k} U_3(r_i, r_j, r_k) + \cdots\]
where \(U_n\) represents the exact \(n\)-body interaction term. The number of distinct combinations required to evaluate the \(n\)-body term scales combinatorially, as \(\binom{N}{n} \sim O(N^n)\) for fixed \(n\). For a standard MLP that flattens the input into a single \(3N\)-dimensional vector, implicitly learning these \(n\)-th order spatial correlations requires dense weight matrices whose parameter counts grow rapidly with the size and interaction order of the physical environment.
Forcing a neural network to learn these interactions from scratch typically results in overparameterization, poor generalization, and little physical interpretability {[}-{]}.
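For a sense of scale, take \(N = 100\) particles (a number chosen purely for illustration). The counts of pair, triplet, and quadruplet terms are
\[\binom{100}{2} = 4950, \qquad \binom{100}{3} = 161700, \qquad \binom{100}{4} = 3921225 .\]
Each additional body order multiplies the number of terms by \(\binom{N}{n+1} / \binom{N}{n} = (N - n)/(n + 1)\), here a factor of about 33 from pairs to triplets and about 24 from triplets to quadruplets.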
\paragraph{Remedy: Basis-Expansion Networks}\label{remedy-basis-expansion-networks}
Rather than relying on generic weight matrices to learn multi-body physics, Basis-Expansion Networks restrict the network's representation space to a fixed basis of physically motivated functions \(\phi_i\). By projecting the problem onto a mathematically complete basis set (such as the Atomic Cluster Expansion {[}-{]}), the neural network only needs to learn the coefficients of these basis functions.
Treating \(\mathcal O\) as a projection operator, the network \(u_\theta\) acts as a weighted sum of physical basis functions:
\[u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i,\theta}\, \phi_i (x, t)\]
where \(f_\theta\) is the learnable neural mapping, \(w_{i,\theta}\) are its learned coefficients, and \(\phi_i\) are the analytical basis functions {[}-{]}.
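The practical benefit is easiest to see in a simplified setting. Suppose the coefficients are treated as free parameters \(w_i\) (in practice they may themselves be outputs of a network) and the time argument is dropped. Given \(M\) hypothetical training samples \((x_m, y_m)\) and the design matrix \(\Phi_{mi} = \phi_i(x_m)\), fitting reduces to linear least squares:
\[\mathcal L(w) = \sum_{m=1}^{M} \Big( y_m - \sum_i w_i\, \phi_i(x_m) \Big)^2 = \| y - \Phi w \|^2, \qquad w^\star = (\Phi^\top \Phi)^{-1} \Phi^\top y ,\]
provided \(\Phi^\top \Phi\) is invertible. The loss is convex in \(w\), so under these assumptions the optimization has no spurious local minima, and each fitted coefficient is directly attributable to one basis function.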
\subsubsection{3.2.4. Implications and Relaxed Constraints}\label{implications-and-relaxed-constraints}
A notable insight from the recent equivariant neural network (ENN) literature is that strict, hard-coded equivariance can be \emph{too} restrictive for certain physical systems, particularly those exhibiting ``broken symmetry'' {[}-{]}. This observation has led to the development of relaxed-symmetry models {[}-{]}. These architectures allow for small, learnable deviations from perfect mathematical equivariance, providing the structural flexibility required to model materials under extreme stress or in non-equilibrium states without completely abandoning the physical prior. Furthermore, the move toward unsupervised learning in differentiable solvers like AI2DFT suggests that variational principles of physics can serve as both the loss function and the architectural constraint, potentially bypassing the need for labeled numerical data entirely {[}-{]}.