commit 93bfee7eef3a712f2c2b57265948bca7c4449a6b
Author: David Allemang <david.allemang@kitware.com>
Date:   Mon May 25 11:31:33 2026 -0400

    Spring 2026

diff --git a/00-collated-results.md b/00-collated-results.md
new file mode 100644
index 0000000..68edf7b
--- /dev/null
+++ b/00-collated-results.md
@@ -0,0 +1,206 @@
+# The Structural Integration of Physical Laws in Neural Architectures: A Multi-Paradigm Survey of Physics-Informed Machine Learning
+
+The intersection of classical physics and contemporary artificial intelligence
+has given rise to a transformative field known as Physics-Informed Machine
+Learning (PIML). For decades, scientific discovery relied on the dichotomy of
+"first-principles" mathematical modeling and purely empirical observation.
+However, the modern data landscape, characterized by high-dimensional
+observations from sensors and simulations, has outpaced the capabilities of
+traditional numerical solvers while simultaneously highlighting the fragility of
+standard "black-box" neural networks.[1][1] PIML seeks to synthesize these
+approaches by treating physical laws not merely as external benchmarks, but as
+foundational constraints within the learning pipeline. This synthesis addresses
+the chronic data scarcity in scientific domains, enhances the generalizability
+of models across unseen regimes, and ensures that the outputs of deep learning
+models remain physically plausible.[3][3] This report provides an exhaustive
+analysis of the four primary paradigms of physical integration: data- and
+loss-embedded physics, architecture-embedded physics, operator-embedded physics,
+and system-embedded physics.
+
+## II. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
+
+Architecture-embedded physics represents a paradigm shift from "soft" to "hard"
+physical constraints. Instead of hoping the loss function steers the model
+toward physical reality, architecture-embedded physics bakes physical laws
+directly into the network's topology.[2][2] This approach ensures that the model
+natively respects symmetries (like rotation or translation invariance),
+conservation laws (like mass or energy), and structural interaction patterns
+(like $N$-body dynamics).[15][15]
+
+### **Representative State-of-the-Art in Architecture-Embedded Physics**
+
+State-of-the-art developments in this category focus on atomic and electronic
+structure modeling, where the interaction between particles is modeled as a
+graph where nodes are atoms and edges are bonds.[15] Models like DeepH-E3 and
+NequIP have demonstrated the ability to predict electronic Hamiltonians and
+interatomic potentials with sub-meV accuracy, outperforming traditional solvers
+while being orders of magnitude faster.[15]
+
+#### Equivariant Tensor Networks
+
+In scientific domains, physical systems are defined by their geometric
+structure. For instance, the forces acting on a molecule must rotate exactly as
+the molecule rotates. Standard MLPs must learn this through data
+augmentation (training on thousands of rotated examples), which is
+computationally wasteful and prone to error. In contrast, Equivariant Tensor
+Networks are architecturally designed so that their internal feature
+representations transform according to the underlying symmetry group, such as 
+E(3) (Euclidean) or SO(3) (Rotation). This ensures the neural mapping fθ 
+satisfies fθ(g⋅x)=g⋅fθ(x) for any group action g. This "inductive bias" makes
+the model coordinate-blind, leading to exceptional data efficiency and
+robustness.
+
+Unified Formulation: ∂ is a Symmetry Operator. The network uθ is restricted such
+that uθ(∂(x))=∂(uθ(x)).
+
+| Paper Reference                 | Core Innovation              | Key Strengths                                                                   | Identified Weaknesses                                                         |
+|:--------------------------------|:-----------------------------|:--------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
+| [Group-Equivariant Survey][18]  | Group Representation Theory  | Establishes the mathematical standard for SO(3) and Lorentz group networks.     | Abstract theoretical nature; few practical code implementations provided.     |
+| [NequIP / MACE (2021/2024)][15] | E(3)-Equivariant Potentials  | Unprecedented data efficiency; captures higher-order multi-body interactions.   | Complex implementation of tensor products can lead to high inference latency. |
+| [DeepH-E3 (2023)][16]           | Equivariant DFT Hamiltonian  | Preserves Euclidean symmetry for supercells $>10^4$ atoms; ab initio accuracy.  | Significant memory consumption for high-rank tensor representations.          |
+| [Atomic-site ENN (2024)][15]    | Lattice Symmetry-Aware       | Bridges microscopic electronic processes to mesoscale behavior in solids.       | Computationally intensive for very large supercells.                          |
+| [QHNet (2023)][19]              | Efficient SE(3)-Equivariance | Reduces the number of tensor products by 92% compared to previous SOTA.         | Trade-off between structural simplicity and expressive capacity.              |
+
+#### Hamiltonian Networks
+
+Physical systems are governed by fundamental conservation laws (energy,
+momentum, mass). In this subcategory, the architecture moves from predicting a
+field u to predicting a scalar Hamiltonian or energy potential H. By embedding
+the symplectic structure of physics into the layers, the model cannot produce a
+result that violates the conserved property, because the final output is derived
+from the predicted energy surface. This ensures that the system stays on a valid
+physical manifold during long-term time-series simulations, avoiding the
+"explosion" of values common in black-box models.
+
+Unified Formulation: ∂ is a Conservation Operator (like the Gradient or Curl).
+The model is forced to output a scalar "Energy" first, and the actual physical
+state is derived by taking the derivative of that energy.
+
+| Paper Reference                           | Core Innovation               | Key Strengths                                                            | Identified Weaknesses                                                        |
+|:------------------------------------------|:------------------------------|:-------------------------------------------------------------------------|:-----------------------------------------------------------------------------|
+| [Neural Hamiltonian Diffusion (2025)][20] | Manifold Hamiltonian Learning | Unifies stochastic diffusion and Hamiltonian mechanics on curved spaces. | Requires a-priori knowledge of the Riemannian manifold's metric.             |
+| [Deep Potentials (2021)][17]              | Density-based Descriptors     | High-speed molecular dynamics with quantum mechanical fidelity.          | Struggles with systems undergoing chemical reactions (bond breaking).        |
+| [SpinGNN (2025)][17]                      | Heisenberg/Spin-Lattice GNN   | Preserves symmetries of exchange and spin-lattice couplings for magnets. | Specialized architecture that lacks general-purpose utility for soft matter. |
+| [Heisenberg Edge GNN][17]                 | Equivariant Message Passing   | Specifically captures tensorial quantities like spin Hall conductivity.  | Performance is sensitive to the cutoff radius for atomic interactions.       |
+
+#### Basis-Expansion Networks
+
+Complexity in physics often arises because interactions among many particles
+become exponentially difficult to calculate. Rather than forcing a neural
+network to learn these complex interactions from raw data, Basis-Expansion
+Networks limit the network’s "vocabulary" to a set of physically proven
+templates. By projecting the problem onto a mathematically complete basis set
+(like Atomic Cluster Expansion), the network only needs to learn the weights of
+these basis functions. This turns the neural network into a "Neural Code" or a
+differentiable version of a classical physics solver, combining the flexibility
+of AI with the rigor of analytical physics.
+
+Unified Formulation: ∂ is a Projection Operator. The network uθ is a weighted
+sum of physical templates: uθ=∑wiϕi, where ϕi are fixed, physically valid
+functions.
+
+| Paper Reference            | Core Innovation              | Key Strengths                                                                   | Identified Weaknesses                                                            |
+|:---------------------------|:-----------------------------|:--------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
+| [ACE Framework (2024)][15] | Atomic Cluster Expansion     | Hierarchical basis for symmetry-adapted invariants; mathematically complete.    | Steep learning curve for researchers not versed in group representation theory.  |
+| [AI2DFT (2024)][22]        | Differential DFT Neural Code | First unsupervised physics-informed learning framework for DFT quantities.      | Stability depends on the quality of the variational energy functional.           |
+| [Timrov et al. (2025)][21] | Hubbard Parameter ENN        | Speeds up Hubbard $U$ and $V$ calculations via equivariant occupation matrices. | Transferability is high but confined to the specific lattice structures trained. |
+
+### **Implications of Hard Constraints on Emergent Behavior**
+
+The integration of Hamiltonian mechanics into neural architectures (Hamiltonian
+Neural Networks or HNNs) structurally guarantees energy conservation, a feat
+that is nearly impossible for data-embedded PINN models over long simulation
+times.[20][20] By deriving the dynamics from a learned scalar Hamiltonian
+function $H_\theta$, the model respects the symplectic structure of phase
+space, preventing the "energy drift" commonly seen in standard RNNs or
+Transformers used for physics simulation.[20][20]
+
+A deep insight from recent ENN literature is the realization that strict
+equivariance might be too restrictive for certain "broken symmetry" systems.
+This has led to the development of relaxed-symmetry models that allow for small,
+learnable deviations from perfect equivariance, which is critical for modeling
+materials under stress or in non-equilibrium states.[17] Furthermore, the move
+toward "unsupervised" learning in models like AI2DFT suggests that the
+variational principles of physics (like minimizing total energy) can serve as
+the ultimate loss function, potentially bypassing the need for labeled DFT data
+entirely.[22]
+
+#### **Works cited**
+
+[0]: https://www.researchgate.net/publication/391540378_When_physics_meets_machine_learning_a_survey_of_physics-informed_machine_learning
+
+[1]: https://arxiv.org/html/2408.09840v2
+
+[2]: https://arxiv.org/html/2506.13777v1
+
+[3]: https://arxiv.org/html/2501.06572v1
+
+[4]: https://www.ejpam.com/ejpam/article/view/6334/2750
+
+[5]: https://www.mdpi.com/2227-7390/13/20/3289
+
+[6]: https://www.articsledge.com/post/physics-informed-neural-networks-pinns
+
+[7]: https://www.researchgate.net/publication/388357372_A_Review_of_Physics-Informed_Neural_Networks
+
+[8]: https://towardsdatascience.com/essential-review-papers-on-physics-informed-neural-networks-a-curated-guide-for-practitioners/
+
+[9]: https://ieeexplore.ieee.org/iel8/8784343/10845831/10843279.pdf
+
+[10]: https://arxiv.org/html/2408.06650v1
+
+[11]: https://github.com/Event-AHU/PINN_Paper_List
+
+[12]: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2026.1717117/full
+
+[13]: https://www.emergentmind.com/topics/physics-inspired-kolmogorov-arnold-network-pikan
+
+[14]: https://arxiv.org/html/2601.04104v1
+
+[15]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/
+
+[16]: https://www.oaepublish.com/articles/jmi.2025.17
+
+[17]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/
+
+[18]: https://proceedings.mlr.press/v202/yu23i/yu23i.pdf
+
+[19]: https://neurips.cc/virtual/2025/poster/117646
+
+[20]: https://communities.springernature.com/posts/machine-learning-hubbard-parameters-with-equivariant-neural-networks
+
+[21]: https://arxiv.org/pdf/2403.11287
+
+[22]: https://www.researchgate.net/publication/394511831_Gaussian_Splashing_Unified_Particles_for_Versatile_Motion_Synthesis_and_Rendering
+
+[23]: https://arxiv.org/html/2512.24986v1
+
+[24]: https://www.researchgate.net/publication/382119336_A_Review_of_Differentiable_Simulators
+
+[25]: https://www.emergentmind.com/topics/differentiable-simulation-engines
+
+[26]: https://arxiv.org/html/2203.00806v5
+
+[27]: https://www.semanticscholar.org/paper/A-Review-of-Differentiable-Simulators-Newbury-Collins/b3a10024b9ad159a6dc68d3acce36dffc464dd67
+
+[28]: https://www.azooptics.com/News.aspx?newsID=30558
+
+[29]: https://www.researchgate.net/publication/372961478_Spatially_Varying_Nanophotonic_Neural_Networks
+
+[30]: https://pubs.acs.org/doi/10.1021/acsphotonics.4c01874
+
+[31]: https://www.mdpi.com/2304-6732/12/12/1187
+
+[32]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12758549/
+
+[33]: https://opg.optica.org/oe/abstract.cfm?uri=oe-34-2-2197
+
+[34]: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/PC13585.toc
+
+[35]: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/0/PC139061/Diffractive-deep-neural-networks-with-multimode-fibers/10.1117/12.3079670.full
+
+[36]: https://opg.optica.org/optcon/fulltext.cfm?uri=optcon-4-8-1810
+
+[37]: https://www.spiedigitallibrary.org/journals/advanced-photonics-nexus/volume-4/issue-2/026009/Compressed-meta-optical-encoder-for-image-classification/10.1117/1.APN.4.2.026009.pdf
+
+[38]: https://www.researchgate.net/publication/339555300_Deep_Tensor_ADMM-Net_for_Snapshot_Compressive_Imaging
diff --git a/01-collated-results-refined.md b/01-collated-results-refined.md
new file mode 100644
index 0000000..78e351b
--- /dev/null
+++ b/01-collated-results-refined.md
@@ -0,0 +1,208 @@
+# The Structural Integration of Physical Laws in Neural Architectures: A Multi-Paradigm Survey of Physics-Informed Machine Learning
+
+The intersection of classical physics and contemporary artificial intelligence
+has given rise to a transformative field known as Physics-Informed Machine
+Learning (PIML). For decades, scientific discovery relied on the dichotomy of
+"first-principles" mathematical modeling and purely empirical observation.
+However, the modern data landscape, characterized by high-dimensional
+observations from sensors and simulations, has outpaced the capabilities of
+traditional numerical solvers while simultaneously highlighting the fragility of
+standard "black-box" neural networks.[1][1] PIML seeks to synthesize these
+approaches by treating physical laws not merely as external benchmarks, but as
+foundational constraints within the learning pipeline. This synthesis addresses
+the chronic data scarcity in scientific domains, enhances the generalizability
+of models across unseen regimes, and ensures that the outputs of deep learning
+models remain physically plausible.[3][3] This report provides an exhaustive
+analysis of the four primary paradigms of physical integration: data- and
+loss-embedded physics, architecture-embedded physics, operator-embedded physics,
+and system-embedded physics.
+
+## II. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
+
+Architecture-embedded physics represents a paradigm shift from "soft" to "hard"
+physical constraints. Instead of hoping the loss function steers the model
+toward physical reality, architecture-embedded physics bakes physical laws
+directly into the network's topology.[2][2] This approach ensures that the model
+natively respects symmetries (like rotation or translation invariance),
+conservation laws (like mass or energy), and structural interaction patterns
+(like $N$-body dynamics).[15][15]
+
+### **Representative State-of-the-Art in Architecture-Embedded Physics**
+
+State-of-the-art developments in this category focus on atomic and electronic
+structure modeling, where the interaction between particles is modeled as a
+graph where nodes are atoms and edges are bonds.[15] Models like DeepH-E3 and
+NequIP have demonstrated the ability to predict electronic Hamiltonians and
+interatomic potentials with sub-meV accuracy, outperforming traditional solvers
+while being orders of magnitude faster.[15]
+
+#### Equivariant Tensor Networks
+
+In scientific domains, physical systems are defined by their geometric
+structure. For instance, the forces acting on a molecule must rotate exactly as
+the molecule rotates. Standard MLPs must learn this through data augmentation
+(training on thousands of rotated examples), which is computationally wasteful
+and prone to error. In contrast, Equivariant Tensor Networks are architecturally
+designed so that their internal feature representations transform according to
+the underlying symmetry group, such as $E(3)$ (Euclidean) or $SO(3)$ (Rotation).
+This ensures the neural mapping $f_\theta$
+satisfies $f_\theta(g \cdot x)=g \cdot f_\theta(x)$ for any group action $g$.
+This "inductive bias" makes the model coordinate-blind, leading to exceptional
+data efficiency and robustness.
+
+Unified Formulation: $\partial$ is a Symmetry Operator. The network $u_\theta$
+is restricted such that:
+$$u_\theta(\partial(x))=\partial(u_\theta(x))$$
+
+| Paper Reference                 | Core Innovation                | Key Strengths                                                                  | Identified Weaknesses                                                         |
+|:--------------------------------|:-------------------------------|:-------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
+| [Group-Equivariant Survey][18]  | Group Representation Theory    | Establishes the mathematical standard for $SO(3)$ and Lorentz group networks.  | Abstract theoretical nature; few practical code implementations provided.     |
+| [NequIP / MACE (2021/2024)][15] | $E(3)$-Equivariant Potentials  | Unprecedented data efficiency; captures higher-order multi-body interactions.  | Complex implementation of tensor products can lead to high inference latency. |
+| [DeepH-E3 (2023)][16]           | Equivariant DFT Hamiltonian    | Preserves Euclidean symmetry for supercells $>10^4$ atoms; ab initio accuracy. | Significant memory consumption for high-rank tensor representations.          |
+| [Atomic-site ENN (2024)][15]    | Lattice Symmetry-Aware         | Bridges microscopic electronic processes to mesoscale behavior in solids.      | Computationally intensive for very large supercells.                          |
+| [QHNet (2023)][19]              | Efficient $SE(3)$-Equivariance | Reduces the number of tensor products by 92% compared to previous SOTA.        | Trade-off between structural simplicity and expressive capacity.              |
+
+#### Hamiltonian Networks
+
+Physical systems are governed by fundamental conservation laws (energy,
+momentum, mass). In this subcategory, the architecture moves from predicting a
+field $u$ to predicting a scalar Hamiltonian or energy potential $H$. By
+embedding the symplectic structure of physics into the layers, the model cannot
+produce a result that violates the conserved property, because the final output
+is derived from the predicted energy surface. This ensures that the system stays
+on a valid physical manifold during long-term time-series simulations, avoiding
+the "explosion" of values common in black-box models.
+
+Unified Formulation: $\partial$ is a Conservation Operator (like the Gradient or
+Curl). The model is forced to output a scalar "Energy" first, and the actual
+physical state is derived by taking the derivative of that energy.
+
+| Paper Reference                           | Core Innovation               | Key Strengths                                                            | Identified Weaknesses                                                        |
+|:------------------------------------------|:------------------------------|:-------------------------------------------------------------------------|:-----------------------------------------------------------------------------|
+| [Neural Hamiltonian Diffusion (2025)][20] | Manifold Hamiltonian Learning | Unifies stochastic diffusion and Hamiltonian mechanics on curved spaces. | Requires a-priori knowledge of the Riemannian manifold's metric.             |
+| [Deep Potentials (2021)][17]              | Density-based Descriptors     | High-speed molecular dynamics with quantum mechanical fidelity.          | Struggles with systems undergoing chemical reactions (bond breaking).        |
+| [SpinGNN (2025)][17]                      | Heisenberg/Spin-Lattice GNN   | Preserves symmetries of exchange and spin-lattice couplings for magnets. | Specialized architecture that lacks general-purpose utility for soft matter. |
+| [Heisenberg Edge GNN][17]                 | Equivariant Message Passing   | Specifically captures tensorial quantities like spin Hall conductivity.  | Performance is sensitive to the cutoff radius for atomic interactions.       |
+
+#### Basis-Expansion Networks
+
+Complexity in physics often arises because interactions among many particles
+become exponentially difficult to calculate. Rather than forcing a neural
+network to learn these complex interactions from raw data, Basis-Expansion
+Networks limit the network’s "vocabulary" to a set of physically proven
+templates. By projecting the problem onto a mathematically complete basis set
+(like Atomic Cluster Expansion), the network only needs to learn the weights of
+these basis functions. This turns the neural network into a "Neural Code" or a
+differentiable version of a classical physics solver, combining the flexibility
+of AI with the rigor of analytical physics.
+
+Unified Formulation: $\partial$ is a Projection Operator. The network $u_\theta$
+is a weighted sum of physical templates:
+$$u_\theta=\sum_i w_i \phi_i$$
+where $\phi_i$ are fixed, physically valid functions.
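+
+One consequence is worth spelling out, under the assumption (made here only for
+illustration) that the weights are fit to $M$ labeled samples $(x_m, y_m)$ with
+a squared-error loss. Because $u_\theta$ is linear in the weights $w$, training
+reduces to linear least squares with design matrix $\Phi_{mi} = \phi_i(x_m)$:
+
+$$
+\min_w \lVert \Phi w - y \rVert_2^2
+\quad\Longrightarrow\quad
+w^\star = (\Phi^\top \Phi)^{-1} \Phi^\top y
+$$
+
+The closed form requires $\Phi^\top \Phi$ to be invertible. Models that pass the
+basis features through further nonlinear layers lose this closed form but keep
+the same physically constrained inputs.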
+
+| Paper Reference            | Core Innovation              | Key Strengths                                                                   | Identified Weaknesses                                                            |
+|:---------------------------|:-----------------------------|:--------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
+| [ACE Framework (2024)][15] | Atomic Cluster Expansion     | Hierarchical basis for symmetry-adapted invariants; mathematically complete.    | Steep learning curve for researchers not versed in group representation theory.  |
+| [AI2DFT (2024)][22]        | Differential DFT Neural Code | First unsupervised physics-informed learning framework for DFT quantities.      | Stability depends on the quality of the variational energy functional.           |
+| [Timrov et al. (2025)][21] | Hubbard Parameter ENN        | Speeds up Hubbard $U$ and $V$ calculations via equivariant occupation matrices. | Transferability is high but confined to the specific lattice structures trained. |
+
+### **Implications of Hard Constraints on Emergent Behavior**
+
+The integration of Hamiltonian mechanics into neural architectures (Hamiltonian
+Neural Networks or HNNs) structurally guarantees energy conservation, a feat
+that is nearly impossible for data-embedded PINN models over long simulation
+times.[20][20] By deriving the dynamics from a learned scalar Hamiltonian
+function $H_\theta$, the model respects the symplectic structure of phase space,
+preventing the "energy drift" commonly seen in standard RNNs or Transformers
+used for physics simulation.[20][20]
+
+A deep insight from recent ENN literature is the realization that strict
+equivariance might be too restrictive for certain "broken symmetry" systems.
+This has led to the development of relaxed-symmetry models that allow for small,
+learnable deviations from perfect equivariance, which is critical for modeling
+materials under stress or in non-equilibrium states.[17] Furthermore, the move
+toward "unsupervised" learning in models like AI2DFT suggests that the
+variational principles of physics (like minimizing total energy) can serve as
+the ultimate loss function, potentially bypassing the need for labeled DFT data
+entirely.[22]
+
+#### **Works cited**
+
+[0]: https://www.researchgate.net/publication/391540378_When_physics_meets_machine_learning_a_survey_of_physics-informed_machine_learning
+
+[1]: https://arxiv.org/html/2408.09840v2
+
+[2]: https://arxiv.org/html/2506.13777v1
+
+[3]: https://arxiv.org/html/2501.06572v1
+
+[4]: https://www.ejpam.com/ejpam/article/view/6334/2750
+
+[5]: https://www.mdpi.com/2227-7390/13/20/3289
+
+[6]: https://www.articsledge.com/post/physics-informed-neural-networks-pinns
+
+[7]: https://www.researchgate.net/publication/388357372_A_Review_of_Physics-Informed_Neural_Networks
+
+[8]: https://towardsdatascience.com/essential-review-papers-on-physics-informed-neural-networks-a-curated-guide-for-practitioners/
+
+[9]: https://ieeexplore.ieee.org/iel8/8784343/10845831/10843279.pdf
+
+[10]: https://arxiv.org/html/2408.06650v1
+
+[11]: https://github.com/Event-AHU/PINN_Paper_List
+
+[12]: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2026.1717117/full
+
+[13]: https://www.emergentmind.com/topics/physics-inspired-kolmogorov-arnold-network-pikan
+
+[14]: https://arxiv.org/html/2601.04104v1
+
+[15]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/
+
+[16]: https://www.oaepublish.com/articles/jmi.2025.17
+
+[17]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/
+
+[18]: https://proceedings.mlr.press/v202/yu23i/yu23i.pdf
+
+[19]: https://neurips.cc/virtual/2025/poster/117646
+
+[20]: https://communities.springernature.com/posts/machine-learning-hubbard-parameters-with-equivariant-neural-networks
+
+[21]: https://arxiv.org/pdf/2403.11287
+
+[22]: https://www.researchgate.net/publication/394511831_Gaussian_Splashing_Unified_Particles_for_Versatile_Motion_Synthesis_and_Rendering
+
+[23]: https://arxiv.org/html/2512.24986v1
+
+[24]: https://www.researchgate.net/publication/382119336_A_Review_of_Differentiable_Simulators
+
+[25]: https://www.emergentmind.com/topics/differentiable-simulation-engines
+
+[26]: https://arxiv.org/html/2203.00806v5
+
+[27]: https://www.semanticscholar.org/paper/A-Review-of-Differentiable-Simulators-Newbury-Collins/b3a10024b9ad159a6dc68d3acce36dffc464dd67
+
+[28]: https://www.azooptics.com/News.aspx?newsID=30558
+
+[29]: https://www.researchgate.net/publication/372961478_Spatially_Varying_Nanophotonic_Neural_Networks
+
+[30]: https://pubs.acs.org/doi/10.1021/acsphotonics.4c01874
+
+[31]: https://www.mdpi.com/2304-6732/12/12/1187
+
+[32]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12758549/
+
+[33]: https://opg.optica.org/oe/abstract.cfm?uri=oe-34-2-2197
+
+[34]: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/PC13585.toc
+
+[35]: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/0/PC139061/Diffractive-deep-neural-networks-with-multimode-fibers/10.1117/12.3079670.full
+
+[36]: https://opg.optica.org/optcon/fulltext.cfm?uri=optcon-4-8-1810
+
+[37]: https://www.spiedigitallibrary.org/journals/advanced-photonics-nexus/volume-4/issue-2/026009/Compressed-meta-optical-encoder-for-image-classification/10.1117/1.APN.4.2.026009.pdf
+
+[38]: https://www.researchgate.net/publication/339555300_Deep_Tensor_ADMM-Net_for_Snapshot_Compressive_Imaging
diff --git a/02-collated-results-voice.md b/02-collated-results-voice.md
new file mode 100644
index 0000000..c9093a8
--- /dev/null
+++ b/02-collated-results-voice.md
@@ -0,0 +1,203 @@
+## 3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
+
+While data- and loss-embedded physics (Section 3.1) offer a flexible means to
+encourage physical plausibility, they fundamentally rely on soft penalties. This
+reliance introduces severe optimization challenges, as the network must
+simultaneously balance data fitting with the minimization of PDE residuals.
+Architecture-embedded physics addresses these failure modes by transitioning
+from "soft" optimization penalties to "hard" structural constraints. Instead of
+relying on the loss landscape to steer the model toward physical reality, this
+paradigm directly bakes invariances, symmetries, and conservation laws into the
+network's internal topology [-].
+
+### 3.2.1. Diagnosis: Coordinate Bias and the Failure of Data Augmentation
+
+In scientific domains, physical systems are defined by their geometric structure
+and inherent symmetries. For instance, the physical forces acting on a molecule
+must rotate exactly as the molecule rotates in 3D space. Standard Multi-Layer
+Perceptrons (MLPs) are fundamentally coordinate-dependent, and standard
+Convolutional Neural Networks (CNNs) are equivariant only to grid translations;
+neither has structural awareness of rotations or of the full Euclidean group.
+Consequently, when mapping a spatial input to a physical property, standard
+architectures fail to commute with general symmetry operators.
+
+To demonstrate this formally, let $\mathcal T_g$ represent a spatial
+transformation operator corresponding to a continuous group element
+$g \in SE(3)$ (such as a 3D rotation or translation). For a standard black-box
+neural network $u_\theta$, applying the physical transformation to the input
+coordinates $x$ prior to the forward pass does not yield the same result as
+applying the transformation to the network's predicted output. Mathematically,
+the operations do not commute:
+
+$$u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)$$
+
+Because this standard mapping is completely blind to the underlying symmetry
+group, the network is forced to learn the fundamental rules of geometric physics
+entirely from scratch. To compensate for this coordinate bias, classical Deep
+Learning relies heavily on data augmentation—training the model on thousands of
+artificially rotated or translated examples. However, this approach is
+computationally wasteful, limits data efficiency, and only approximates
+symmetry, leaving the model vulnerable to out-of-distribution geometric
+orientations.
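+
+As a small worked illustration of the non-commutation (the single linear layer
+$u_\theta(x) = Wx$ and the rotation matrix $R$ representing $\mathcal T_g$ are
+assumptions introduced here, not a model from the cited works), compare the two
+orders of operation:
+
+$$
+u_\theta(Rx) = WRx, \qquad R\,u_\theta(x) = RWx
+$$
+
+The two sides agree for every rotation $R \in SO(3)$ only if $WR = RW$ for all
+$R$, which forces $W = cI$ for some scalar $c$. A generic trained weight matrix
+does not have this form, so even the simplest unconstrained layer fails to
+commute with rotations.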
+
+#### Remedy: Equivariant Tensor Networks
+
+Equivariant Tensor Networks resolve this representation failure by
+architecturally restricting the neural mapping such that its internal feature
+representations transform exactly according to the underlying symmetry group,
+such as $E(3)$ (Euclidean) or $SO(3)$ (Rotation) [-].
+
+By treating the general physical operator $\mathcal O$ as a symmetry
+operator $\mathcal T_g$, the network $u_\theta$ is structurally constrained to
+natively satisfy equivariance:
+
+$$u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)$$
+
+This inductive bias renders the model coordinate-blind, leading to exceptional
+data efficiency and robustness. State-of-the-art models in this category, such
+as DeepH-E3 and NequIP, leverage this exact equivariance to predict electronic
+Hamiltonians and interatomic potentials with sub-meV accuracy, outperforming
+traditional solvers while eliminating the need for geometric data
+augmentation [-].
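+
+A minimal worked example of such a constraint, offered as a sketch and not as
+the construction used by DeepH-E3 or NequIP: assume a hypothetical pairwise
+model in which the force on atom $i$ at position $r_i$ is built from a learned
+scalar function $\phi_\theta$ of the interatomic distance,
+
+$$
+F_i(r_1, \dots, r_n) = \sum_{j \ne i} \phi_\theta(\lVert r_i - r_j \rVert)\,(r_i - r_j)
+$$
+
+Under $r_k \mapsto R r_k + b$ with $R \in SO(3)$ and a translation $b$, each
+difference becomes $R(r_i - r_j)$ while each distance is unchanged, so
+
+$$
+F_i(R r_1 + b, \dots, R r_n + b) = R\,F_i(r_1, \dots, r_n)
+$$
+
+The symmetry holds for any parameters $\theta$, so no rotated training examples
+are needed to obtain it.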
+
+### 3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts
+
+When standard autoregressive models (e.g., RNNs, standard Transformers) are used
+to simulate physical dynamics, they typically predict the vector field of the
+next state directly from the current state [-]. Because these
+black-box architectures possess no inherent concept of conservation laws (like
+energy, momentum, or mass), local approximation errors inevitably accumulate
+over sequential time steps.
+
+Let a physical state space be defined by its coordinates and
+momenta $x = (q, p)$. A standard network attempts to learn the time derivative
+directly:
+
+$$u_\theta = f_\theta(x, t) = \frac{dx}{dt}$$
+
+When numerically integrated over $N$ discrete time steps, the predicted
+trajectory becomes:
+
+$$
+x_N = x_0 + \sum_{k=0}^{N-1} \left[ f_\theta(x_k, t_k)\, \delta t + \epsilon_k \right]
+$$
+
+Because the learned mapping is unconstrained, the flow it generates is not
+guaranteed to be symplectic. In general it is not even volume-preserving in
+phase space ($\nabla \cdot f_\theta \ne 0$), so the local errors $\epsilon_k$
+compound rather than cancel. The resulting energy drift carries the simulated
+system away from the valid physical manifold, often producing non-physical
+behavior or numerical blow-up in long-duration simulations [-].
+
+#### Remedy: Hamiltonian Neural Networks (HNNs)
+
+Hamiltonian Neural Networks restructure the learning problem so that the
+learned dynamics respect the system's symplectic structure. Instead of
+predicting the time derivative directly, the architecture predicts a scalar
+Hamiltonian (the total energy) $H_\theta(q, p)$ [-]. The time derivative of the
+state is then obtained analytically as the symplectic gradient of that
+predicted energy surface.
+
+The model governs the system's dynamics through Hamilton's equations:
+
+$$\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}$$
+
+Because both components of the vector field are derived from the gradient of a
+single scalar field, the learned dynamics conserve $H_\theta$ exactly in
+continuous time. In a discrete rollout this holds only up to integrator error,
+so a symplectic integrator is typically needed to keep the energy error bounded
+over long horizons. Soft-penalty approaches such as PINNs offer no comparable
+structural guarantee [-].
+
+### 3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems
+
+Complexity in atomic and multi-scale physics often arises from the interaction
+of multiple particles, which scales combinatorially. Standard fully connected
+networks struggle to capture these complex, higher-order interaction patterns
+from raw positional data without requiring exponentially large parameter
+counts [-].
+
+Consider a macroscopic physical property $U$ of a system containing $N$
+particles at coordinates $\{r_n\}$. A complete description requires a many-body
+expansion:
+
+$$U(\{r_n\}) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k}
+U_3(r_i, r_j, r_k) + \cdots$$
+
+where $U_n$ represents the $n$-body interaction term. The number of distinct
+particle combinations in the $n$-body term scales combinatorially as
+$\binom{N}{n} = O(N^n)$. For a standard MLP that flattens the input into a
+single $3N$-dimensional vector, implicitly learning these $n$-th order spatial
+correlations requires dense weight matrices whose parameter counts grow rapidly
+with both the body order and the size of the physical environment.
+
+Forcing a neural network to learn these complex interactions completely from
+scratch typically results in overparameterization, poor generalization, and a
+complete lack of physical interpretability [-].
+
+#### Remedy: Basis-Expansion Networks
+
+Rather than relying on generic weight matrices to learn multi-body physics,
+Basis-Expansion Networks limit the network’s representation space to a strict
+basis of physically proven templates $\phi_i$. By projecting the problem onto
+a mathematically complete basis set (such as the Atomic Cluster Expansion [-]),
+the neural network only needs to learn the coefficients for these basis
+functions.
+
+Treating $\mathcal O$ as a Projection Operator, the network $u_\theta$ acts as a
+weighted sum of physical basis functions:
+
+$$u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i \theta} \phi_i (x, t)$$
+
+where $f_\theta$ is the learnable neural mapping, $w_{i \theta}$ are its learned
+coefficients, and $\phi_i$ are the analytical basis functions [-].
+
+### 3.2.4. Implications and Relaxed Constraints
+
+A key insight from the recent equivariant neural network (ENN) literature is
+that strict, hard-coded equivariance can be *too* restrictive for certain
+physical systems, particularly those exhibiting "broken symmetry" [-]. This
+observation has led to the development of relaxed-symmetry
+models [-]. These architectures allow for small, learnable deviations from
+perfect mathematical equivariance, providing the structural flexibility required
+to model materials under extreme stress or in non-equilibrium states without
+completely abandoning the physical prior. Furthermore, the move toward
+unsupervised learning in differentiable solvers like AI2DFT suggests that
+variational principles of physics can ultimately serve as both the loss function
+and the architectural constraint, potentially bypassing the need for labeled
+numerical data entirely [-].
+
+| Type                     | Method                            | Venue / Year                          | Keyword-style contribution |
+|:-------------------------|:----------------------------------|:--------------------------------------|:---------------------------|
+| Equivariant Networks     | [NequIP][1]                       | Nature Communications, 2022           |                            |
+|                          | [MACE][2]                         | arXiv, 2023                           |                            |
+|                          | [DeepH-E3][3]                     | Nature Communications, 2023           |                            |
+|                          | [QHNet][4]                        | ICML, 2023                            |                            |
+| Hamiltonian Networks     | [Neural Hamiltonian Diffusion][5] | NeurIPS, 2025                         |                            |
+|                          | [DeePMD][6]                       | Computer Physics Communications, 2018 |                            |
+|                          | [HEGNN][7]                        | Physical Review B, 2024               |                            |
+|                          | [SEGNN][7]                        | Physical Review B, 2024               |                            |
+| Basis Expansion Networks | [ACE Framework (2019)][8]         | Physical Review B, 2019               |                            |
+|                          | [AI2DFT (2024)][9]                | arXiv, 2024                           |                            |
+
+[0a]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/  (review of equivariant networks)
+
+[0b]: https://arxiv.org/abs/2601.04104v1 (review of equivariant networks)
+
+[0c]: https://www.oaepublish.com/articles/jmi.2025.17 (review of hamiltonian networks)
+
+[1]: https://www.nature.com/articles/s41467-022-29939-5 (NequIP)
+
+[2]: https://arxiv.org/abs/2206.07697 (MACE)
+
+[3]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/ (DeepH-E3)
+
+[4]: https://proceedings.mlr.press/v202/yu23i (QHNet)
+
+[5]: https://neurips.cc/virtual/2025/poster/117646 (Neural Hamiltonian Diffusion)
+
+[6]: https://dx.doi.org/10.1016/j.cpc.2018.03.016 (DeePMD)
+
+[7]: https://dx.doi.org/10.1103/PhysRevB.109.144426 (SpinGNN)
+
+[8]: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.99.014104 (ACE Framework)
+
+[9]: https://arxiv.org/abs/2403.11287 (AI2DFT)
diff --git a/03-collated-results-cited.md b/03-collated-results-cited.md
new file mode 100644
index 0000000..22a48a7
--- /dev/null
+++ b/03-collated-results-cited.md
@@ -0,0 +1,203 @@
+## 3.2. Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning
+
+While data- and loss-embedded physics (Section 3.1) offer a flexible means to
+encourage physical plausibility, they fundamentally rely on soft penalties. This
+reliance introduces severe optimization challenges, as the network must
+simultaneously balance data fitting with the minimization of PDE residuals.
+Architecture-embedded physics addresses these failure modes by transitioning
+from "soft" optimization penalties to "hard" structural constraints. Instead of
+relying on the loss landscape to steer the model toward physical reality, this
+paradigm directly bakes invariances, symmetries, and conservation laws into the
+network's internal topology [-].
+
+### 3.2.1. Diagnosis: Coordinate Bias and Failure Modes of Data Augmentation
+
+In scientific domains, physical systems are defined by their geometric structure
+and inherent symmetries. For instance, the physical forces acting on a molecule
+must rotate exactly as the molecule rotates in 3D space. Standard Multi-Layer
+Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) are fundamentally
+coordinate-dependent; they possess no structural awareness of Euclidean
+symmetries. Consequently, when mapping a spatial input to a physical property,
+standard architectures fail to commute with symmetry operators.
+
+To demonstrate this formally, let $\mathcal T_g$ represent a spatial
+transformation operator corresponding to a continuous group element
+$g \in SE(3)$ (such as a 3D rotation or translation). For a standard black-box
+neural network $u_\theta$, applying the physical transformation to the input
+coordinates $x$ prior to the forward pass does not yield the same result as
+applying the transformation to the network's predicted output. Mathematically,
+the operations do not commute:
+
+$$u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)$$
+
+Because this standard mapping is completely blind to the underlying symmetry
+group, the network is forced to learn the fundamental rules of geometric physics
+entirely from scratch. To compensate for this coordinate bias, conventional deep
+learning relies heavily on data augmentation: training the model on thousands
+of artificially rotated or translated examples. However, this approach is
+computationally wasteful, limits data efficiency, and only approximates
+symmetry, leaving the model vulnerable to out-of-distribution geometric
+orientations.
+
+#### Remedy: Equivariant Tensor Networks
+
+Equivariant Tensor Networks resolve this representation failure by
+architecturally restricting the neural mapping such that its internal feature
+representations transform exactly according to the underlying symmetry group,
+such as $E(3)$ (Euclidean) or $SO(3)$ (Rotation) [-].
+
+By treating the general physical operator $\mathcal O$ as a symmetry
+operator $\mathcal T_g$, the network $u_\theta$ is structurally constrained to
+natively satisfy equivariance:
+
+$$u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)$$
+
+This inductive bias makes the model's predictions independent of the choice of
+coordinate frame, which improves data efficiency and robustness. Representative
+models include NequIP, which learns interatomic potentials with
+$E(3)$-equivariant convolutions and reports strong data efficiency, and
+DeepH-E3, which predicts DFT Hamiltonians with sub-meV accuracy. Both remove
+the need for geometric data augmentation [-].
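+
+As a concrete illustration of the equivariance condition, consider a
+hypothetical pairwise layer (an assumption made here for illustration, not the
+form used by any specific model cited in this section). It maps particle
+positions $x_i$ to one output vector per particle, with $\varphi_\theta$ an
+arbitrary learnable scalar function of distance:
+
+$$u_\theta(x)_i = \sum_{j \ne i} \varphi_\theta(\lVert x_i - x_j \rVert)\,(x_i - x_j)$$
+
+Under a rotation $R$ and a translation $a$, the translation cancels in every
+difference $x_i - x_j$ and the rotation preserves norms, so
+
+$$u_\theta(Rx + a)_i = \sum_{j \ne i} \varphi_\theta(\lVert R(x_i - x_j) \rVert)\,R(x_i - x_j) = R\,u_\theta(x)_i$$
+
+The output vectors rotate with the input and are unchanged by translation for
+any value of the parameters $\theta$, without any augmented training data.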
+
+### 3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts
+
+When standard autoregressive models (e.g., RNNs, standard Transformers) are used
+to simulate physical dynamics, they typically predict the vector field of the
+next state directly from the current state [-]. Because these
+black-box architectures possess no inherent concept of conservation laws (like
+energy, momentum, or mass), local approximation errors inevitably accumulate
+over sequential time steps.
+
+Let a physical state space be defined by its coordinates and
+momenta $x = (q, p)$. A standard network attempts to learn the time derivative
+directly:
+
+$$u_\theta = f_\theta(x, t) = \frac{dx}{dt}$$
+
+When numerically integrated over $N$ discrete time steps, the predicted
+trajectory becomes:
+
+$$
+x_N = x_0 + \sum_{k=0}^{N-1} \left[ f_\theta(x_k, t_k)\, \delta t + \epsilon_k \right]
+$$
+
+Because the learned mapping is unconstrained, the flow it generates is not
+guaranteed to be symplectic. In general it is not even volume-preserving in
+phase space ($\nabla \cdot f_\theta \ne 0$), so the local errors $\epsilon_k$
+compound rather than cancel. The resulting energy drift carries the simulated
+system away from the valid physical manifold, often producing non-physical
+behavior or numerical blow-up in long-duration simulations [-].
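+
+The drift can be made explicit with a minimal worked example, chosen here
+purely for illustration: the harmonic oscillator
+$H = \tfrac{1}{2}(q^2 + p^2)$, integrated with explicit Euler using the exact
+vector field, that is, assuming a perfectly learned model. One step gives
+
+$$q_{k+1} = q_k + \delta t\, p_k, \qquad p_{k+1} = p_k - \delta t\, q_k$$
+
+and substituting into the energy yields
+
+$$H_{k+1} = \tfrac{1}{2}\left(q_{k+1}^2 + p_{k+1}^2\right) = (1 + \delta t^2)\, H_k
+\quad \Rightarrow \quad H_N = (1 + \delta t^2)^N H_0$$
+
+The step map has Jacobian determinant $1 + \delta t^2 \ne 1$, so it is not
+volume-preserving, and the energy grows geometrically with the number of steps
+even though the vector field itself is exact. A learned $f_\theta$ with no
+structural constraint adds its own approximation error on top of this
+integrator-induced drift.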
+
+#### Remedy: Hamiltonian Neural Networks (HNNs)
+
+Hamiltonian Neural Networks restructure the learning problem so that the
+learned dynamics respect the system's symplectic structure. Instead of
+predicting the time derivative directly, the architecture predicts a scalar
+Hamiltonian (the total energy) $H_\theta(q, p)$ [-]. The time derivative of the
+state is then obtained analytically as the symplectic gradient of that
+predicted energy surface.
+
+The model governs the system's dynamics through Hamilton's equations:
+
+$$\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}$$
+
+Because both components of the vector field are derived from the gradient of a
+single scalar field, the learned dynamics conserve $H_\theta$ exactly in
+continuous time. In a discrete rollout this holds only up to integrator error,
+so a symplectic integrator is typically needed to keep the energy error bounded
+over long horizons. Soft-penalty approaches such as PINNs offer no comparable
+structural guarantee [-].
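+
+The conservation property follows in one line from the chain rule. Assuming
+$H_\theta$ has no explicit time dependence, substituting Hamilton's equations
+gives
+
+$$\frac{dH_\theta}{dt}
+= \frac{\partial H_\theta}{\partial q}\frac{dq}{dt} + \frac{\partial H_\theta}{\partial p}\frac{dp}{dt}
+= \frac{\partial H_\theta}{\partial q}\frac{\partial H_\theta}{\partial p} - \frac{\partial H_\theta}{\partial p}\frac{\partial H_\theta}{\partial q}
+= 0$$
+
+The same structure makes the vector field divergence-free, so the
+continuous-time flow preserves phase-space volume:
+
+$$\frac{\partial}{\partial q}\left(\frac{\partial H_\theta}{\partial p}\right) + \frac{\partial}{\partial p}\left(-\frac{\partial H_\theta}{\partial q}\right) = 0$$
+
+Note that the conserved quantity is the learned $H_\theta$, which matches the
+true energy only as closely as the model fits the data.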
+
+### 3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems
+
+Complexity in atomic and multi-scale physics often arises from the interaction
+of multiple particles, which scales combinatorially. Standard fully connected
+networks struggle to capture these complex, higher-order interaction patterns
+from raw positional data without requiring exponentially large parameter
+counts [-].
+
+Consider a macroscopic physical property $U$ of a system containing $N$
+particles at coordinates $\{r_n\}$. A complete description requires a many-body
+expansion:
+
+$$U(\{r_n\}) = U_0 + \sum_i U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k}
+U_3(r_i, r_j, r_k) + \cdots$$
+
+where $U_n$ represents the $n$-body interaction term. The number of distinct
+particle combinations in the $n$-body term scales combinatorially as
+$\binom{N}{n} = O(N^n)$. For a standard MLP that flattens the input into a
+single $3N$-dimensional vector, implicitly learning these $n$-th order spatial
+correlations requires dense weight matrices whose parameter counts grow rapidly
+with both the body order and the size of the physical environment.
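+
+To give a sense of scale, take an illustrative system of $N = 100$ particles
+(a number chosen here only as an example) and count the terms at each body
+order:
+
+$$\binom{100}{2} = 4950, \qquad \binom{100}{3} = 161700, \qquad \binom{100}{4} = 3921225$$
+
+Moving from body order $n$ to $n + 1$ multiplies the number of terms by
+$(N - n)/(n + 1)$, which is about 33 for $n = 2$ and about 24 for $n = 3$ in
+this example.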
+
+Forcing a neural network to learn these complex interactions completely from
+scratch typically results in overparameterization, poor generalization, and a
+complete lack of physical interpretability [-].
+
+#### Remedy: Basis-Expansion Networks
+
+Rather than relying on generic weight matrices to learn multi-body physics,
+Basis-Expansion Networks limit the network’s representation space to a strict
+basis of physically proven templates $\phi_i$. By projecting the problem onto
+a mathematically complete basis set (such as the Atomic Cluster Expansion [-]),
+the neural network only needs to learn the coefficients for these basis
+functions.
+
+Treating $\mathcal O$ as a Projection Operator, the network $u_\theta$ acts as a
+weighted sum of physical basis functions:
+
+$$u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i \theta} \phi_i (x, t)$$
+
+where $f_\theta$ is the learnable neural mapping, $w_{i \theta}$ are its learned
+coefficients, and $\phi_i$ are the analytical basis functions [-].
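+
+One consequence of this form can be sketched under two assumptions that are
+not stated in the text above: the coefficients $w_i$ enter linearly, and the
+basis functions are fixed. Given $M$ reference values $y_m$ at configurations
+$x_m$ (hypothetical training data), fitting then reduces to linear least
+squares:
+
+$$\min_w \sum_{m=1}^{M} \Big( y_m - \sum_i w_i\, \phi_i(x_m) \Big)^2
+\quad \Rightarrow \quad \Phi^\top \Phi\, w = \Phi^\top y, \qquad \Phi_{mi} = \phi_i(x_m)$$
+
+This problem is convex, in contrast to training generic weight matrices. When
+the coefficients are themselves produced by a nonlinear network, the closed
+form no longer applies and gradient-based training is used instead.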
+
+### 3.2.4. Implications and Relaxed Constraints
+
+A key insight from the recent equivariant neural network (ENN) literature is
+that strict, hard-coded equivariance can be *too* restrictive for certain
+physical systems, particularly those exhibiting "broken symmetry" [-]. This
+observation has led to the development of relaxed-symmetry
+models [-]. These architectures allow for small, learnable deviations from
+perfect mathematical equivariance, providing the structural flexibility required
+to model materials under extreme stress or in non-equilibrium states without
+completely abandoning the physical prior. Furthermore, the move toward
+unsupervised learning in differentiable solvers like AI2DFT suggests that
+variational principles of physics can ultimately serve as both the loss function
+and the architectural constraint, potentially bypassing the need for labeled
+numerical data entirely [-].
+
+| Type                     | Method                            | Venue / Year                          | Keyword-style contribution |
+|:-------------------------|:----------------------------------|:--------------------------------------|:---------------------------|
+| Equivariant Networks     | [NequIP][1]                       | Nature Communications, 2022           |                            |
+|                          | [MACE][2]                         | arXiv, 2023                           |                            |
+|                          | [DeepH-E3][3]                     | Nature Communications, 2023           |                            |
+|                          | [QHNet][4]                        | ICML, 2023                            |                            |
+| Hamiltonian Networks     | [Neural Hamiltonian Diffusion][5] | NeurIPS, 2025                         |                            |
+|                          | [DeePMD][6]                       | Computer Physics Communications, 2018 |                            |
+|                          | [HEGNN][7]                        | Physical Review B, 2024               |                            |
+|                          | [SEGNN][7]                        | Physical Review B, 2024               |                            |
+| Basis Expansion Networks | [ACE Framework (2019)][8]         | Physical Review B, 2019               |                            |
+|                          | [AI2DFT (2024)][9]                | arXiv, 2024                           |                            |
+
+[0a]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12541325/  (review of equivariant networks)
+
+[0b]: https://arxiv.org/abs/2601.04104v1 (review of equivariant networks)
+
+[0c]: https://www.oaepublish.com/articles/jmi.2025.17 (review of hamiltonian networks)
+
+[1]: https://www.nature.com/articles/s41467-022-29939-5 (NequIP)
+
+[2]: https://arxiv.org/abs/2206.07697 (MACE)
+
+[3]: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199065/ (DeepH-E3)
+
+[4]: https://proceedings.mlr.press/v202/yu23i (QHNet)
+
+[5]: https://neurips.cc/virtual/2025/poster/117646 (Neural Hamiltonian Diffusion)
+
+[6]: https://dx.doi.org/10.1016/j.cpc.2018.03.016 (DeePMD)
+
+[7]: https://dx.doi.org/10.1103/PhysRevB.109.144426 (SpinGNN)
+
+[8]: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.99.014104 (ACE Framework)
+
+[9]: https://arxiv.org/abs/2403.11287 (AI2DFT)
diff --git a/citation-keys.txt b/citation-keys.txt
new file mode 100644
index 0000000..01e555d
--- /dev/null
+++ b/citation-keys.txt
@@ -0,0 +1,18 @@
+2025 enn-review-kon25
+2026 enn-review-fc26
+2025 hamiltonian-review
+
+Equivariant Networks
+2022 nequip
+2023 mace
+2023 deephe3
+2023 qhnet
+
+Hamiltonian Networks
+2026 ham-diff
+2018 deepmd
+2024 spingnn
+
+Basis Expansion Networks
+2019 ace
+2024 ai2dft
diff --git a/citations.bib b/citations.bib
new file mode 100644
index 0000000..b8aa0b9
--- /dev/null
+++ b/citations.bib
@@ -0,0 +1,183 @@
+@article{enn-review-kon25,
+	title = {The principles behind equivariant neural networks for physics and chemistry},
+	volume = {122},
+	issn = {0027-8424},
+	url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12541325/},
+	doi = {10.1073/pnas.2415656122},
+	abstract = {A distinguishing feature of the neural network models used in Physics and Chemistry is that they must obey basic underlying symmetries, such as symmetry to translations, rotations, and the exchange of identical particles. Over the course of the last several years, the artificial neural networks community has developed a class of networks called group-equivariant neural nets that can efficiently “bake-in” such symmetries into the structure of the network itself. Equivariant neural nets leverage ideas from group representation theory and express all variables in the generalized Fourier space corresponding to the underlying group. In this article, we review this formalism and derive the general form of operations allowable in equivariant neural networks. Specifically, we discuss why the Clebsch–Gordan transform appears in such architectures, and how it can play the role of an equivariant nonlinearity.},
+	number = {41},
+	urldate = {2026-05-04},
+	year = {2025},
+	journal = {Proceedings of the National Academy of Sciences of the United States of America},
+	author = {Kondor, Risi},
+	pmid = {41052329},
+	pmcid = {PMC12541325},
+	pages = {e2415656122},
+}
+
+@misc{enn-review-fc26,
+	title = {Equivariant {Neural} {Networks} for {Force}-{Field} {Models} of {Lattice} {Systems}},
+	url = {http://arxiv.org/abs/2601.04104},
+	doi = {10.48550/arXiv.2601.04104},
+	abstract = {Machine-learning (ML) force fields enable large-scale simulations with near-first-principles accuracy at substantially reduced computational cost. Recent work has extended ML force-field approaches to adiabatic dynamical simulations of condensed-matter lattice models with coupled electronic and structural or magnetic degrees of freedom. However, most existing formulations rely on hand-crafted, symmetry-aware descriptors, whose construction is often system-specific and can hinder generality and transferability across different lattice Hamiltonians. Here we introduce a symmetry-preserving framework based on equivariant neural networks (ENNs) that provides a general, data-driven mapping from local configurations of dynamical variables to the associated on-site forces in a lattice Hamiltonian. In contrast to ENN architectures developed for molecular systems -- where continuous Euclidean symmetries dominate -- our approach aims to embed the discrete point-group and internal symmetries intrinsic to lattice models directly into the neural-network representation of the force field. As a proof of principle, we construct an ENN-based force-field model for the adiabatic dynamics of the Holstein Hamiltonian on a square lattice, a canonical system for electron-lattice physics. The resulting ML-enabled large-scale dynamical simulations faithfully capture mesoscale evolution of the symmetry-breaking phase, illustrating the utility of lattice-equivariant architectures for linking microscopic electronic processes to emergent dynamical behavior in condensed-matter lattice systems.},
+	urldate = {2026-05-04},
+	publisher = {arXiv},
+	author = {Fan, Yunhao and Chern, Gia-Wei},
+	month = jan,
+	year = {2026},
+	note = {arXiv:2601.04104 
+version: 1},
+	keywords = {Condensed Matter - Strongly Correlated Electrons, Computer Science - Machine Learning, Physics - Computational Physics},
+}
+
+@article{hamiltonian-review,
+	title = {A critical review of machine learning interatomic potentials and {Hamiltonian}},
+	volume = {5},
+	issn = {2770-372X},
+	url = {https://www.oaepublish.com/articles/jmi.2025.17},
+	doi = {10.20517/jmi.2025.17},
+	abstract = {Machine learning interatomic potentials (ML-IAPs) and machine learning Hamiltonian (ML-Ham) have revolutionized atomistic and electronic structure simulations by offering near ab initio accuracy across extended time and length scales. In this Review, we summarize recent progress in these two fields, with emphasis on algorithmic and architectural innovations, geometric equivariance, data efficiency strategies, model-data co-design, and interpretable AI techniques. In addition, we discuss key challenges, including data fidelity, model generalizability, computational scalability, and explainability. Finally, we outline promising future directions, such as active learning, multi-fidelity frameworks, scalable message-passing architectures, and methods for enhancing interpretability, which is particularly crucial for the field of AI for Science (AI4S). The integration of these advances is expected to accelerate materials discovery and provide deeper mechanistic insights into complex material and physical systems.},
+	language = {en},
+	number = {4},
+	urldate = {2026-05-04},
+	journal = {Journal of Materials Informatics},
+	author = {Li, Yifan and Zhang, Xiuying and Shen, Lei},
+	month = jul,
+	year = {2025},
+	pages = {N/A--N/A},
+}
+
+@article{nequip,
+	title = {E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials},
+	volume = {13},
+	copyright = {2022 The Author(s)},
+	issn = {2041-1723},
+	url = {https://www.nature.com/articles/s41467-022-29939-5},
+	doi = {10.1038/s41467-022-29939-5},
+	abstract = {This work presents Neural Equivariant Interatomic Potentials (NequIP), an E(3)-equivariant neural network approach for learning interatomic potentials from ab-initio calculations for molecular dynamics simulations. While most contemporary symmetry-aware models use invariant convolutions and only act on scalars, NequIP employs E(3)-equivariant convolutions for interactions of geometric tensors, resulting in a more information-rich and faithful representation of atomic environments. The method achieves state-of-the-art accuracy on a challenging and diverse set of molecules and materials while exhibiting remarkable data efficiency. NequIP outperforms existing models with up to three orders of magnitude fewer training data, challenging the widely held belief that deep neural networks require massive training sets. The high data efficiency of the method allows for the construction of accurate potentials using high-order quantum chemical level of theory as reference and enables high-fidelity molecular dynamics simulations over long time scales.},
+	language = {en},
+	number = {1},
+	urldate = {2026-05-04},
+	journal = {Nature Communications},
+	author = {Batzner, Simon and Musaelian, Albert and Sun, Lixin and Geiger, Mario and Mailoa, Jonathan P. and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E. and Kozinsky, Boris},
+	month = may,
+	year = {2022},
+	keywords = {Atomistic models, Computational chemistry, Computational methods, Computer science, Molecular dynamics},
+	pages = {2453},
+}
+
+@misc{mace,
+	title = {{MACE}: {Higher} {Order} {Equivariant} {Message} {Passing} {Neural} {Networks} for {Fast} and {Accurate} {Force} {Fields}},
+	shorttitle = {{MACE}},
+	url = {http://arxiv.org/abs/2206.07697},
+	doi = {10.48550/arXiv.2206.07697},
+	abstract = {Creating fast and accurate force fields is a long-standing challenge in computational chemistry and materials science. Recently, several equivariant message passing neural networks (MPNNs) have been shown to outperform models built using other approaches in terms of accuracy. However, most MPNNs suffer from high computational cost and poor scalability. We propose that these limitations arise because MPNNs only pass two-body messages leading to a direct relationship between the number of layers and the expressivity of the network. In this work, we introduce MACE, a new equivariant MPNN model that uses higher body order messages. In particular, we show that using four-body messages reduces the required number of message passing iterations to just two, resulting in a fast and highly parallelizable model, reaching or exceeding state-of-the-art accuracy on the rMD17, 3BPA, and AcAc benchmark tasks. We also demonstrate that using higher order messages leads to an improved steepness of the learning curves.},
+	urldate = {2026-05-04},
+	publisher = {arXiv},
+	author = {Batatia, Ilyes and Kovács, Dávid Péter and Simm, Gregor N. C. and Ortner, Christoph and Csányi, Gábor},
+	month = jan,
+	year = {2023},
+	note = {arXiv:2206.07697},
+	keywords = {Statistics - Machine Learning, Condensed Matter - Materials Science, Computer Science - Machine Learning, Physics - Chemical Physics},
+}
+
+@article{deephe3,
+	title = {General framework for {E}(3)-equivariant neural network representation of density functional theory {Hamiltonian}},
+	volume = {14},
+	issn = {2041-1723},
+	url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199065/},
+	doi = {10.1038/s41467-023-38468-8},
+	abstract = {The combination of deep learning and ab initio calculation has shown great promise in revolutionizing future scientific research, but how to design neural network models incorporating a priori knowledge and symmetry requirements is a key challenging subject. Here we propose an E(3)-equivariant deep-learning framework to represent density functional theory (DFT) Hamiltonian as a function of material structure, which can naturally preserve the Euclidean symmetry even in the presence of spin–orbit coupling. Our DeepH-E3 method enables efficient electronic structure calculation at ab initio accuracy by learning from DFT data of small-sized structures, making the routine study of large-scale supercells ({\textgreater}104 atoms) feasible. The method can reach sub-meV prediction accuracy at high training efficiency, showing state-of-the-art performance in our experiments. The work is not only of general significance to deep-learning method development but also creates opportunities for materials research, such as building a Moiré-twisted material database., Fundamental symmetries are crucial to the deep-learning modeling of physical systems. Here the authors use equivariant neural networks preserving the Euclidean symmetries to accelerate electronic structure calculations by orders of magnitude keeping sub-meV accuracy.},
+	urldate = {2026-05-04},
+	journal = {Nature Communications},
+	author = {Gong, Xiaoxun and Li, He and Zou, Nianlong and Xu, Runzhang and Duan, Wenhui and Xu, Yong},
+	month = may,
+	year = {2023},
+	pmid = {37208320},
+	pmcid = {PMC10199065},
+	pages = {2848},
+}
+
+@inproceedings{qhnet,
+	title = {Efficient and {Equivariant} {Graph} {Networks} for {Predicting} {Quantum} {Hamiltonian}},
+	url = {https://proceedings.mlr.press/v202/yu23i.html},
+	abstract = {We consider the prediction of the Hamiltonian matrix, which finds use in quantum chemistry and condensed matter physics. Efficiency and equivariance are two important, but conflicting factors. In this work, we propose a SE(3)-equivariant network, named QHNet, that achieves efficiency and equivariance. Our key advance lies at the innovative design of QHNet architecture, which not only obeys the underlying symmetries, but also enables the reduction of number of tensor products by 92\%. In addition, QHNet prevents the exponential growth of channel dimension when more atom types are involved. We perform experiments on MD17 datasets, including four molecular systems. Experimental results show that our QHNet can achieve comparable performance to the state of the art methods at a significantly faster speed. Besides, our QHNet consumes 50\% less memory due to its streamlined architecture. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).},
+	language = {en},
+	urldate = {2026-05-04},
+	booktitle = {Proceedings of the 40th {International} {Conference} on {Machine} {Learning}},
+	publisher = {PMLR},
+	author = {Yu, Haiyang and Xu, Zhao and Qian, Xiaofeng and Qian, Xiaoning and Ji, Shuiwang},
+	month = jul,
+	year = {2023},
+	pages = {40412--40424},
+}
+
+@inproceedings{ham-diff,
+    title={Neural Hamiltonian Diffusions for Modeling Structured Geometric Dynamics},
+    author={Sungwoo Park},
+    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
+    year={2026},
+    url={https://openreview.net/forum?id=VswQY0peMr}
+}
+
+@article{deepmd,
+	title = {{DeePMD}-kit: {A} deep learning package for many-body potential energy representation and molecular dynamics},
+	volume = {228},
+	issn = {00104655},
+	shorttitle = {{DeePMD}-kit},
+	url = {https://linkinghub.elsevier.com/retrieve/pii/S0010465518300882},
+	doi = {10.1016/j.cpc.2018.03.016},
+	language = {en},
+	urldate = {2026-05-04},
+	journal = {Computer Physics Communications},
+	author = {Wang, Han and Zhang, Linfeng and Han, Jiequn and E, Weinan},
+	month = jul,
+	year = {2018},
+	pages = {178--184},
+}
+
+@article{spingnn,
+	title = {Spin-dependent graph neural network potential for magnetic materials},
+	volume = {109},
+	issn = {2469-9950, 2469-9969},
+	url = {https://link.aps.org/doi/10.1103/PhysRevB.109.144426},
+	doi = {10.1103/PhysRevB.109.144426},
+	language = {en},
+	number = {14},
+	urldate = {2026-05-04},
+	journal = {Physical Review B},
+	author = {Yu, Hongyu and Zhong, Yang and Hong, Liangliang and Xu, Changsong and Ren, Wei and Gong, Xingao and Xiang, Hongjun},
+	month = apr,
+	year = {2024},
+	pages = {144426},
+}
+
+@article{ace,
+	title = {Atomic cluster expansion for accurate and transferable interatomic potentials},
+	volume = {99},
+	issn = {2469-9950, 2469-9969},
+	url = {https://link.aps.org/doi/10.1103/PhysRevB.99.014104},
+	doi = {10.1103/PhysRevB.99.014104},
+	language = {en},
+	number = {1},
+	urldate = {2026-05-04},
+	journal = {Physical Review B},
+	author = {Drautz, Ralf},
+	month = jan,
+	year = {2019},
+	pages = {014104},
+}
+
+@misc{ai2dft,
+	title = {Neural-network {Density} {Functional} {Theory} {Based} on {Variational} {Energy} {Minimization}},
+	url = {http://arxiv.org/abs/2403.11287},
+	doi = {10.48550/arXiv.2403.11287},
+	abstract = {Deep-learning density functional theory (DFT) shows great promise to significantly accelerate material discovery and potentially revolutionize materials research. However, current research in this field primarily relies on data-driven supervised learning, making the developments of neural networks and DFT isolated from each other. In this work, we present a theoretical framework of neural-network DFT, which unifies the optimization of neural networks with the variational computation of DFT, enabling physics-informed unsupervised learning. Moreover, we develop a differential DFT code incorporated with deep-learning DFT Hamiltonian, and introduce algorithms of automatic differentiation and backpropagation into DFT, demonstrating the capability of neural-network DFT. The physics-informed neural-network architecture not only surpasses conventional approaches in accuracy and efficiency, but also offers a new paradigm for developing deep-learning DFT methods.},
+	urldate = {2026-05-04},
+	publisher = {arXiv},
+	author = {Li, Yang and Tang, Zechen and Chen, Zezhou and Sun, Minghui and Zhao, Boheng and Li, He and Tao, Honggeng and Yuan, Zilong and Duan, Wenhui and Xu, Yong},
+	month = aug,
+	year = {2024},
+	note = {arXiv:2403.11287},
+	keywords = {Physics - Computational Physics, Condensed Matter - Materials Science},
+}
diff --git a/collated.tex b/collated.tex
new file mode 100644
index 0000000..71d6b6c
--- /dev/null
+++ b/collated.tex
@@ -0,0 +1,73 @@
+\subsection{Architecture-Embedded Physics: Hard Constraints and Geometric Deep Learning}
+
+While data- and loss-embedded physics (Section 3.1) offer a flexible means to encourage physical plausibility, they fundamentally rely on soft penalties. This reliance introduces severe optimization challenges, as the network must simultaneously balance data fitting with the minimization of PDE residuals. Architecture-embedded physics addresses these failure modes by transitioning from ``soft'' optimization penalties to ``hard'' structural constraints. Instead of relying on the loss landscape to steer the model toward physical reality, this paradigm directly bakes invariances, symmetries, and conservation laws into the network's internal topology {[}-{]}.
+
+\subsubsection{3.2.1. Diagnosis: Coordinate Bias and the Failure of Data Augmentation}
+
+In scientific domains, physical systems are defined by their geometric structure and inherent symmetries. For instance, the physical forces acting on a molecule must rotate exactly as the molecule rotates in 3D space. Standard Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) are fundamentally coordinate-dependent; they possess no structural awareness of Euclidean symmetries. Consequently, when mapping a spatial input to a physical property, standard architectures fail to commute with symmetry operators.
+
+To demonstrate this formally, let \(\mathcal T_g\) represent a spatial transformation operator corresponding to a continuous group element \(g \in SE(3)\) (such as a 3D rotation or translation). For a standard black-box neural network \(u_\theta\), applying the physical transformation to the input coordinates \(x\) prior to the forward pass does not yield the same result as applying the transformation to the network's predicted output. Mathematically, the operations do not commute:
+
+\[u_\theta(\mathcal T_g x, t) \ne \mathcal T_g u_\theta(x, t)\]
+
+Because this standard mapping is completely blind to the underlying symmetry group, the network is forced to learn the fundamental rules of geometric physics entirely from scratch. To compensate for this coordinate bias, classical Deep Learning relies heavily on data augmentation---training the model on thousands of artificially rotated or translated examples. However, this approach is computationally wasteful, limits data efficiency, and only approximates symmetry, leaving the model vulnerable to out-of-distribution geometric orientations.
+
+\paragraph{Remedy: Equivariant Tensor Networks}\label{remedy-equivariant-tensor-networks}
+
+Equivariant Tensor Networks resolve this representation failure by architecturally restricting the neural mapping such that its internal feature representations transform exactly according to the underlying symmetry group, such as \(E(3)\) (Euclidean) or \(SO(3)\) (Rotation) {[}-{]}.
+
+By treating the general physical operator \(\mathcal O\) as a symmetry operator \(\mathcal T_g\), the network \(u_\theta\) is structurally constrained to natively satisfy equivariance:
+
+\[u_\theta(\mathcal T_g x, t) = \mathcal T_g u_\theta (x, t)\]
+
+This inductive bias makes the model's predictions independent of the choice of coordinate frame, which improves data efficiency and robustness. State-of-the-art models in this category, such as DeepH-E3 and NequIP, leverage this exact equivariance to predict electronic Hamiltonians and interatomic potentials with sub-meV accuracy, at far lower cost than the first-principles calculations they approximate and without any need for geometric data augmentation {[}-{]}.
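+
+As a worked illustration of how such a constraint can be built in, consider a sketch under the assumption (not taken from any of the cited models) that the network predicts a scalar energy \(E_\theta\) that depends on the atomic positions \(x_i\) only through the pairwise distances \(\| x_i - x_j \|\). For a rigid motion \(\mathcal T_g x_i = R x_i + a\) with rotation \(R\) and translation \(a\), the distances are unchanged, so the energy is invariant:
+
+\[\| \mathcal T_g x_i - \mathcal T_g x_j \| = \| R (x_i - x_j) \| = \| x_i - x_j \| \quad \Longrightarrow \quad E_\theta(\mathcal T_g x) = E_\theta(x)\]
+
+Differentiating this identity with respect to \(x_i\) shows that the forces \(F_i = -\partial E_\theta / \partial x_i\) rotate with the system, which is the equivariance condition specialized to vector-valued outputs:
+
+\[F_i(\mathcal T_g x) = R \, F_i(x)\]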
+
+\subsubsection{3.2.2. Diagnosis: Energy Drift in Long-Horizon Rollouts}\label{diagnosis-energy-drift-in-long-horizon-rollouts}
+
+When standard autoregressive models (e.g., RNNs, standard Transformers) are used to simulate physical dynamics, they typically predict the vector field of the next state directly from the current state {[}-{]}. Because these black-box architectures possess no inherent concept of conservation laws (like energy, momentum, or mass), local approximation errors inevitably accumulate over sequential time steps.
+
+Let a physical state space be defined by its coordinates and momenta \(x = (q, p)\). A standard network attempts to learn the time derivative directly:
+
+\[u_\theta = f_\theta(x, t) = \frac{dx}{dt}\]
+
+When numerically integrated over \(N\) discrete time steps, the predicted trajectory becomes:
+
+\[ x_N = x_0 + \sum_{k=0}^{N-1} \left[ f_\theta(x_k, t_k) \, \delta t + \epsilon_k \right] \]
+
+Because the learned mapping is unconstrained, the Jacobian of its flow map is not guaranteed to be symplectic. As a result, the flow is in general not volume-preserving in phase space (\(\nabla \cdot f_\theta \ne 0\)), and the local errors \(\epsilon_k\) compound. This energy drift causes the simulated system to depart from the valid physical manifold, often resulting in non-physical behavior or numerical blow-up during long-duration simulations {[}-{]}.
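+
+A minimal worked example, assuming a unit-mass, unit-frequency harmonic oscillator and an explicit Euler update (both chosen here purely for illustration), shows how quickly a non-symplectic update map compounds. With \(H(q, p) = \tfrac{1}{2}(q^2 + p^2)\), one step of size \(\delta t\) reads
+
+\[q_{k+1} = q_k + \delta t \, p_k, \qquad p_{k+1} = p_k - \delta t \, q_k\]
+
+and the Jacobian of this map has determinant \(1 + \delta t^2 \ne 1\), so phase-space volume is not preserved. Substituting into the energy gives
+
+\[H(q_{k+1}, p_{k+1}) = \tfrac{1}{2}\left[(q_k + \delta t \, p_k)^2 + (p_k - \delta t \, q_k)^2\right] = (1 + \delta t^2) \, H(q_k, p_k)\]
+
+so that \(H_N = (1 + \delta t^2)^N H_0\). Even with an exact vector field (\(\epsilon_k = 0\)), the energy grows geometrically: for \(\delta t = 0.1\) and \(N = 1000\) the factor is \(1.01^{1000} \approx 2 \times 10^4\). A learned, unconstrained \(f_\theta\) adds its own approximation error on top of this structural drift.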
+
+\paragraph{Remedy: Hamiltonian Neural Networks (HNNs)}\label{remedy-hamiltonian-neural-networks-hnns}
+
+Hamiltonian Neural Networks restructure the learning problem so that the learned dynamics respect this structure. Instead of predicting the time derivative of the state directly, the architecture predicts a scalar Hamiltonian (the total energy) \(H_\theta(q, p)\) {[}-{]}. The time derivative of the state is then obtained analytically as the symplectic gradient of that predicted energy surface.
+
+The model governs the system's dynamics through Hamilton's equations:
+
+\[\frac{dq}{dt} = \frac{\partial H_\theta}{\partial p}, \quad \frac{dp}{dt} = -\frac{\partial H_\theta}{\partial q}\]
+
+Because the model's dynamical outputs are derived from the symplectic gradient of a single scalar field, the learned vector field conserves \(H_\theta\) exactly along its continuous-time flow. When paired with a symplectic integrator, this keeps the energy error bounded over long rollout horizons, which is difficult to achieve with soft-constrained, data- and loss-embedded PINN models {[}-{]}.
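+
+The conservation property can be checked in one line, assuming \(H_\theta\) has no explicit time dependence. Applying the chain rule along a trajectory and substituting Hamilton's equations gives
+
+\[\frac{dH_\theta}{dt} = \frac{\partial H_\theta}{\partial q} \frac{dq}{dt} + \frac{\partial H_\theta}{\partial p} \frac{dp}{dt} = \frac{\partial H_\theta}{\partial q} \frac{\partial H_\theta}{\partial p} - \frac{\partial H_\theta}{\partial p} \frac{\partial H_\theta}{\partial q} = 0\]
+
+Note that the conserved quantity is the learned \(H_\theta\) rather than the true energy; the two agree only to the extent that the network has fit the underlying system.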
+
+\subsubsection{3.2.3. Diagnosis: High Dimensionality of Multi-Body Problems}\label{diagnosis-high-dimensionality-of-multi-body-problems}
+
+Complexity in atomic and multi-scale physics often arises from the interaction of multiple particles, which scales combinatorially. Standard fully connected networks struggle to capture these complex, higher-order interaction patterns from raw positional data without requiring exponentially large parameter counts {[}-{]}.
+
+Consider a macroscopic physical property \(U\) of a system containing \(N\) particles at coordinates \(\{r_n\}\). A complete description requires a many-body expansion:
+
+\[U(r) = U_0 + \sum_{i} U_1(r_i) + \sum_{i < j} U_2(r_i, r_j) + \sum_{i < j < k} U_3(r_i, r_j, r_k) + \cdots\]
+
+where \(U_n\) represents the \(n\)-body interaction term. The number of distinct particle combinations needed to evaluate the \(n\)-body term scales combinatorially as \(\binom{N}{n} \sim O(N^n)\). For a standard MLP that flattens the input into a single \(3N\)-dimensional vector, implicitly learning these \(n\)-th-order spatial correlations requires dense weight matrices whose parameter counts grow rapidly with the system size and the interaction order.
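+
+To make the scaling concrete, take an illustrative system of \(N = 100\) particles (a size chosen here only as an example). The number of distinct terms at each order is
+
+\[\binom{100}{2} = 4950, \qquad \binom{100}{3} = 161\,700, \qquad \binom{100}{4} = 3\,921\,225\]
+
+Going from order \(n\) to order \(n + 1\) multiplies the number of terms by \((N - n)/(n + 1)\), here about 33 and then about 24.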
+
+Forcing a neural network to learn these complex interactions completely from scratch typically results in overparameterization, poor generalization, and a complete lack of physical interpretability {[}-{]}.
+
+\paragraph{Remedy: Basis-Expansion Networks}\label{remedy-basis-expansion-networks}
+
+Rather than relying on generic weight matrices to learn multi-body physics, Basis-Expansion Networks restrict the network's representation space to a fixed basis of physically motivated functions \(\phi_i\). By projecting the problem onto a mathematically complete basis set (such as the Atomic Cluster Expansion {[}-{]}), the neural network only needs to learn the coefficients of these basis functions.
+
+Treating \(\mathcal O\) as a Projection Operator, the network \(u_\theta\) acts as a weighted sum of physical basis functions:
+
+\[u_\theta = f_\theta(\mathcal O(x, t)) = \sum_i w_{i \theta} \phi_i (x, t)\]
+
+where \(f_\theta\) is the learnable neural mapping, \(w_{i \theta}\) are its learned coefficients, and \(\phi_i\) are the analytical basis functions {[}-{]}.
+
+\subsubsection{3.2.4. Implications and Relaxed Constraints}\label{implications-and-relaxed-constraints}
+
+A notable insight from the recent equivariant neural network (ENN) literature is that strict, hard-coded equivariance can be \emph{too} restrictive for certain physical systems, particularly those exhibiting ``broken symmetry'' {[}-{]}. This observation has led to the development of relaxed-symmetry models {[}-{]}. These architectures allow small, learnable deviations from exact equivariance, providing the structural flexibility required to model materials under extreme stress or in non-equilibrium states without abandoning the physical prior. Furthermore, the move toward unsupervised learning in differentiable solvers like AI2DFT suggests that the variational principles of physics can serve as both the loss function and the architectural constraint, potentially removing the need for labeled numerical data {[}-{]}.
diff --git a/part-1.bib b/part-1.bib
new file mode 100644
index 0000000..182ffab
--- /dev/null
+++ b/part-1.bib
@@ -0,0 +1,475 @@
+
+@article{raissi2019pinn,
+  title={Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations},
+  author={Raissi, Maziar and Perdikaris, Paris and Karniadakis, George E},
+  journal={Journal of Computational Physics},
+  volume={378},
+  pages={686--707},
+  year={2019},
+  publisher={Elsevier}
+}
+
+@article{yu2018deepritz,
+  title={The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems},
+  author={E, Weinan and Yu, Bing},
+  journal={Communications in Mathematics and Statistics},
+  volume={6},
+  number={1},
+  pages={1--12},
+  year={2018},
+  publisher={Springer}
+}
+
+@article{kharazmi2019vpinn,
+  title={Variational physics-informed neural networks for solving partial differential equations},
+  author={Kharazmi, Ehsan and Zhang, Zhongqiang and Karniadakis, George Em},
+  journal={arXiv preprint arXiv:1912.00873},
+  year={2019}
+}
+
+
+@article{kharazmi2021hpvpinn,
+  title={hp-VPINNs: Variational physics-informed neural networks with domain decomposition},
+  author={Kharazmi, Ehsan and Zhang, Zhongqiang and Karniadakis, George Em},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={374},
+  pages={113547},
+  year={2021},
+  publisher={Elsevier}
+}
+
+@article{jagtap2020conservative,
+  title={Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems},
+  author={Jagtap, Ameya D and Kharazmi, Ehsan and Karniadakis, George Em},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={365},
+  pages={113028},
+  year={2020},
+  publisher={Elsevier}
+}
+
+@article{wang2021gradientpathologies,
+  title={Understanding and mitigating gradient flow pathologies in physics-informed neural networks},
+  author={Wang, Sifan and Teng, Yujun and Perdikaris, Paris},
+  journal={SIAM Journal on Scientific Computing},
+  volume={43},
+  number={5},
+  pages={A3055--A3081},
+  year={2021},
+  publisher={SIAM}
+}
+
+@inproceedings{krishnapriyan2021failuremodes,
+  title={Characterizing possible failure modes in physics-informed neural networks},
+  author={Krishnapriyan, Aditi and Gholami, Amir and Zhe, Shandian and Kirby, Robert and Mahoney, Michael W},
+  booktitle={Advances in Neural Information Processing Systems},
+  volume={34},
+  pages={26548--26560},
+  year={2021}
+}
+
+@article{mcclenny2023sapinn,
+  title={Self-adaptive physics-informed neural networks},
+  author={McClenny, Levi D and Braga-Neto, Ulisses M},
+  journal={Journal of Computational Physics},
+  volume={474},
+  pages={111722},
+  year={2023},
+  publisher={Elsevier}
+}
+
+@article{wu2023adaptivesampling,
+  title={A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks},
+  author={Wu, Chenxi and Zhu, Min and Tan, Qinyang and Kartha, Yadhu and Lu, Lu},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={403},
+  pages={115671},
+  year={2023},
+  publisher={Elsevier}
+}
+
+@article{wang2024causality,
+  title={Respecting causality for training physics-informed neural networks},
+  author={Wang, Sifan and Sankaran, Shyam and Perdikaris, Paris},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={421},
+  pages={116813},
+  year={2024},
+  publisher={Elsevier}
+}
+
+@article{wang2025gradientalignment,
+  title={Gradient alignment in physics-informed neural networks: A second-order optimization perspective},
+  author={Wang, Sifan and Bhartari, Ananyae Kumar and Li, Bowen and Perdikaris, Paris},
+  journal={arXiv preprint arXiv:2502.00604},
+  year={2025}
+}
+
+@article{courant1994variational,
+  title={Variational methods for the solution of problems of equilibrium and vibrations},
+  author={Courant, Richard and others},
+  journal={Lecture notes in pure and applied mathematics},
+  pages={1--1},
+  year={1994},
+  publisher={MARCEL DEKKER AG}
+}
+
+@book{leveque2002finite,
+  title={Finite volume methods for hyperbolic problems},
+  author={LeVeque, Randall J},
+  volume={31},
+  year={2002},
+  publisher={Cambridge university press}
+}
+
+@book{patankar2018numerical,
+  title={Numerical heat transfer and fluid flow},
+  author={Patankar, Suhas},
+  year={2018},
+  publisher={CRC press}
+}
+
+@article{eshaghi2025variational,
+  title={Variational physics-informed neural operator (VINO) for solving partial differential equations},
+  author={Eshaghi, Mohammad Sadegh and Anitescu, Cosmin and Thombre, Manish and Wang, Yizheng and Zhuang, Xiaoying and Rabczuk, Timon},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={437},
+  pages={117785},
+  year={2025},
+  publisher={Elsevier}
+}
+
+@article{rojas2024robust,
+  title={Robust variational physics-informed neural networks},
+  author={Rojas, Sergio and Maczuga, Pawe{\l} and Mu{\~n}oz-Matute, Judit and Pardo, David and Paszy{\'n}ski, Maciej},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={425},
+  pages={116904},
+  year={2024},
+  publisher={Elsevier}
+}
+
+@article{zang2020weak,
+  title={Weak adversarial networks for high-dimensional partial differential equations},
+  author={Zang, Yaohua and Bao, Gang and Ye, Xiaojing and Zhou, Haomin},
+  journal={Journal of Computational Physics},
+  volume={411},
+  pages={109409},
+  year={2020},
+  publisher={Elsevier}
+}
+
+@inproceedings{baez2024guaranteeing,
+  title={Guaranteeing Conservation Laws with Projection in Physics-Informed Neural Networks},
+  author={Baez, Anthony and Zhang, Wang and Ma, Ziwen and Das, Subhro and Nguyen, Lam M and Daniel, Luca},
+  booktitle={NeurIPS 2024 Workshop on Data-driven and Differentiable Simulations, Surrogates, and Solvers},
+  year={2024}
+}
+
+@article{zhang2026musa,
+  title={MUSA-PINN: Multi-scale Weak-form Physics-Informed Neural Networks for Fluid Flow in Complex Geometries},
+  author={Zhang, Weizheng and Xie, Xunjie and Pan, Hao and Duan, Xiaowei and Sun, Bingteng and Du, Qiang and Lu, Lin},
+  journal={arXiv preprint arXiv:2603.08465},
+  year={2026}
+}
+
+@article{wang2025wf,
+  title={WF-PINNs: solving forward and inverse problems of burgers equation with steep gradients using weak-form physics-informed neural networks},
+  author={Wang, Xianke and Yi, Shichao and Gu, Huangliang and Xu, Jing and Xu, Wenjie},
+  journal={Scientific Reports},
+  volume={15},
+  number={1},
+  pages={40555},
+  year={2025},
+  publisher={Nature Publishing Group UK London}
+}
+
+@article{wang2022and,
+  title={When and why PINNs fail to train: A neural tangent kernel perspective},
+  author={Wang, Sifan and Yu, Xinling and Perdikaris, Paris},
+  journal={Journal of Computational Physics},
+  volume={449},
+  pages={110768},
+  year={2022},
+  publisher={Elsevier}
+}
+
+@inproceedings{rathore2024challenges,
+  title={Challenges in Training PINNs: A Loss Landscape Perspective},
+  author={Rathore, Pratik and Lei, Weimu and Frangella, Zachary and Lu, Lu and Udell, Madeleine},
+  booktitle={International Conference on Machine Learning},
+  pages={42159--42191},
+  year={2024},
+  organization={PMLR}
+}
+
+@article{wang2021eigenvector,
+  title={On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks},
+  author={Wang, Sifan and Wang, Hanwen and Perdikaris, Paris},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={384},
+  pages={113938},
+  year={2021},
+  publisher={Elsevier}
+}
+
+@article{wang2021understanding,
+  title={Understanding and mitigating gradient flow pathologies in physics-informed neural networks},
+  author={Wang, Sifan and Teng, Yujun and Perdikaris, Paris},
+  journal={SIAM Journal on Scientific Computing},
+  volume={43},
+  number={5},
+  pages={A3055--A3081},
+  year={2021},
+  publisher={SIAM}
+}
+
+@article{mcclenny2023self,
+  title={Self-adaptive physics-informed neural networks},
+  author={McClenny, Levi D and Braga-Neto, Ulisses M},
+  journal={Journal of Computational Physics},
+  volume={474},
+  pages={111722},
+  year={2023},
+  publisher={Elsevier}
+}
+
+@inproceedings{wu2026multi,
+  title={A Multi-Objective Optimization Framework for Adaptive Weighting in Physics-Informed Machine Learning},
+  author={Wu, Guoquan and Wu, Zhe},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  volume={40},
+  number={32},
+  pages={26885--26893},
+  year={2026}
+}
+
+@inproceedings{wanggradient,
+  title={Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective},
+  author={Wang, Sifan and Bhartari, Ananyae Kumar and Li, Bowen and Perdikaris, Paris},
+  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
+  year={2025}
+}
+
+@article{banderwaar2025fast,
+  title={Fast PINN Eigensolvers via Biconvex Reformulation},
+  author={Banderwaar, Akshay Sai and Gupta, Abhishek},
+  journal={arXiv preprint arXiv:2511.00792},
+  year={2025}
+}
+
+@article{wu2023comprehensive,
+  title={A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks},
+  author={Wu, Chenxi and Zhu, Min and Tan, Qinyang and Kartha, Yadhu and Lu, Lu},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={403},
+  pages={115671},
+  year={2023},
+  publisher={Elsevier BV}
+}
+
+@inproceedings{
+duan2025copinn,
+title={Co{PINN}: Cognitive Physics-Informed Neural Networks},
+author={Siyuan Duan and Wenyuan Wu and Peng Hu and Zhenwen Ren and Dezhong Peng and Yuan Sun},
+booktitle={Forty-second International Conference on Machine Learning},
+year={2025},
+url={https://openreview.net/forum?id=4vAa0A98xI}
+}
+
+
+@article{wang2024respecting,
+  title={Respecting causality for training physics-informed neural networks},
+  author={Wang, Sifan and Sankaran, Shyam and Perdikaris, Paris},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={421},
+  pages={116813},
+  year={2024},
+  publisher={Elsevier}
+}
+
+
+@article{cho2023separable,
+  title={Separable physics-informed neural networks},
+  author={Cho, Junwoo and Nam, Seungtae and Yang, Hyunmo and Yun, Seok-Bae and Hong, Youngjoon and Park, Eunbyung},
+  journal={Advances in Neural Information Processing Systems},
+  volume={36},
+  pages={23761--23788},
+  year={2023}
+}
+
+@inproceedings{
+zhao2024pinnsformer,
+title={{PINN}sFormer: A Transformer-Based Framework For Physics-Informed Neural Networks},
+author={Zhiyuan Zhao and Xueying Ding and B. Aditya Prakash},
+booktitle={The Twelfth International Conference on Learning Representations},
+year={2024},
+url={https://openreview.net/forum?id=DO2WFXU1Be}
+}
+
+@inproceedings{
+arni2025physicsinformed,
+title={Physics-Informed Neural Networks with Fourier Features and Attention-Driven Decoding},
+author={Rohan Arni and Carlos Blanco},
+booktitle={NeurIPS 2025 AI for Science Workshop},
+year={2025},
+url={https://openreview.net/forum?id=woq4ZAm1AH}
+}
+
+@article{tao2025xlstm,
+  title={xLSTM-PINN: Memory-Gated Spectral Remodeling for Physics-Informed Learning},
+  author={Tao, Ze and Zhao, Darui and Liu, Fujun and Xu, Ke and Hu, Xiangsheng},
+  journal={arXiv preprint arXiv:2511.12512},
+  year={2025}
+}
+
+@article{sitzmann2020implicit,
+  title={Implicit neural representations with periodic activation functions},
+  author={Sitzmann, Vincent and Martel, Julien and Bergman, Alexander and Lindell, David and Wetzstein, Gordon},
+  journal={Advances in neural information processing systems},
+  volume={33},
+  pages={7462--7473},
+  year={2020}
+}
+
+@article{jagtap2020extended,
+  title={Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations},
+  author={Jagtap, Ameya D and Karniadakis, George Em},
+  journal={Communications in Computational Physics},
+  volume={28},
+  number={5},
+  year={2020},
+  publisher={Brown Univ., Providence, RI (United States)}
+}
+
+@article{moseley2023finite,
+  title={Finite basis physics-informed neural networks (FBPINNs): a scalable domain decomposition approach for solving differential equations},
+  author={Moseley, Ben and Markham, Andrew and Nissen-Meyer, Tarje},
+  journal={Advances in Computational Mathematics},
+  volume={49},
+  number={4},
+  pages={62},
+  year={2023},
+  publisher={Springer}
+}
+
+@article{dolean2024multilevel,
+  title={Multilevel domain decomposition-based architectures for physics-informed neural networks},
+  author={Dolean, Victorita and Heinlein, Alexander and Mishra, Siddhartha and Moseley, Ben},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={429},
+  pages={117116},
+  year={2024},
+  publisher={Elsevier}
+}
+
+@article{botvinick2025ab,
+  title={AB-PINNs: Adaptive-Basis Physics-Informed Neural Networks for Residual-Driven Domain Decomposition},
+  author={Botvinick-Greenhouse, Jonah and Ali, Wael H and Benosman, Mouhacine and Mowlavi, Saviz},
+  journal={arXiv preprint arXiv:2510.08924},
+  year={2025}
+}
+
+@article{bischof2025hypino,
+  title={HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions},
+  author={Bischof, Rafael and Piovar{\v{c}}i, Michal and Kraus, Michael A and Mishra, Siddhartha and Bickel, Bernd},
+  journal={arXiv preprint arXiv:2509.05117},
+  year={2025}
+}
+
+@article{wang2025transfer,
+  title={Transfer learning in physics-informed neural networks: full fine-tuning, lightweight fine-tuning, and low-rank adaptation},
+  author={Wang, Yizheng and Bai, Jinshuai and Eshaghi, Mohammad Sadegh and Anitescu, Cosmin and Zhuang, Xiaoying and Rabczuk, Timon and Liu, Yinghua},
+  journal={International Journal of Mechanical System Dynamics},
+  volume={5},
+  number={2},
+  pages={212--235},
+  year={2025},
+  publisher={Wiley Online Library}
+}
+
+@article{chung2026hard,
+  title={Hard-constrained Physics-informed Neural Networks for Interface Problems},
+  author={Chung, Seung Whan and Castonguay, Stephen and Roy, Sumanta and Penwarden, Michael and Fu, Yucheng and Roy, Pratanu},
+  journal={arXiv preprint arXiv:2604.08453},
+  year={2026}
+}
+
+@article{li2024physical,
+  title={Physical informed neural networks with soft and hard boundary constraints for solving advection-diffusion equations using Fourier expansions},
+  author={Li, Xi'an and Deng, Jiaxin and Wu, Jinran and Zhang, Shaotong and Li, Weide and Wang, You-Gan},
+  journal={Computers \& Mathematics with Applications},
+  volume={159},
+  pages={60--75},
+  year={2024},
+  publisher={Elsevier}
+}
+
+@article{liu1989limited,
+  title={On the limited memory BFGS method for large scale optimization},
+  author={Liu, Dong C and Nocedal, Jorge},
+  journal={Mathematical programming},
+  volume={45},
+  number={1},
+  pages={503--528},
+  year={1989},
+  publisher={Springer}
+}
+
+@inproceedings{vyas2025soap,
+title={{SOAP}: Improving and Stabilizing Shampoo using Adam for Language Modeling},
+author={Nikhil Vyas and Depen Morwani and Rosie Zhao and Itai Shapira and David Brandfonbrener and Lucas Janson and Sham M. Kakade},
+booktitle={The Thirteenth International Conference on Learning Representations},
+year={2025},
+url={https://openreview.net/forum?id=IDxZhXrpNf}
+}
+
+@article{bischof2025multi,
+  title={Multi-objective loss balancing for physics-informed deep learning},
+  author={Bischof, Rafael and Kraus, Michael A},
+  journal={Computer Methods in Applied Mechanics and Engineering},
+  volume={439},
+  pages={117914},
+  year={2025},
+  publisher={Elsevier}
+}
+
+@article{wu2024ropinn,
+  title={{RoPINN}: Region optimized physics-informed neural networks},
+  author={Wu, Haixu and Luo, Huakun and Ma, Yuezhou and Wang, Jianmin and Long, Mingsheng},
+  journal={Advances in Neural Information Processing Systems},
+  volume={37},
+  pages={110494--110532},
+  year={2024}
+}
+
+@article{jagtap2020locally,
+  title={Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks},
+  author={Jagtap, Ameya D and Kawaguchi, Kenji and Karniadakis, George Em},
+  journal={Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences},
+  volume={476},
+  number={2239},
+  year={2020},
+  publisher={The Royal Society}
+}
+
+@article{lagaris1998artificial,
+  title={Artificial neural networks for solving ordinary and partial differential equations},
+  author={Lagaris, Isaac E and Likas, Aristidis and Fotiadis, Dimitrios I},
+  journal={IEEE transactions on neural networks},
+  volume={9},
+  number={5},
+  pages={987--1000},
+  year={1998},
+  publisher={IEEE}
+}
+
+@article{lu2021deepxde,
+  title={DeepXDE: A deep learning library for solving differential equations},
+  author={Lu, Lu and Meng, Xuhui and Mao, Zhiping and Karniadakis, George Em},
+  journal={SIAM review},
+  volume={63},
+  number={1},
+  pages={208--228},
+  year={2021},
+  publisher={SIAM}
+}
\ No newline at end of file
diff --git a/part-1.tex b/part-1.tex
new file mode 100644
index 0000000..42397dc
--- /dev/null
+++ b/part-1.tex
@@ -0,0 +1,824 @@
+% ---------------------------------------------------------------------------
+% Author guideline and sample document for EG publication using LaTeX2e input
+% D.Fellner, v1.13, Jul 31, 2008
+
+\documentclass{egpubl-eurovis-star}
+\usepackage{eurovis2014-star}
+
+% --- for EuroVis
+%\WsSubmission    % uncomment for submission to EuroVis
+\WsPaper         % uncomment for final version of EuroVis contribution
+
+\electronicVersion % can be used both for the printed and electronic version
+
+% !! *please* don't change anything above
+% !! unless you REALLY know what you are doing
+% ------------------------------------------------------------------------
+
+% for including postscript figures
+% mind: package option 'draft' will replace PS figure by a filename within a frame
+\ifpdf \usepackage[pdftex]{graphicx} \pdfcompresslevel=9
+\else \usepackage[dvips]{graphicx} \fi
+
+\PrintedOrElectronic
+
+% prepare for electronic version of your document
+\usepackage{t1enc,dfadobe}
+
+\usepackage{egweblnk}
+\usepackage{cite}
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\usepackage{xspace}
+\newcommand{\etal}{et al.\xspace}
+\usepackage{amsfonts}
+\usepackage{amsmath}
+\usepackage{multirow}
+\usepackage{booktabs}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+
+% For backwards compatibility to old LaTeX type font selection.
+% Uncomment if your document adheres to LaTeX2e recommendations.
+% \let\rm=\rmfamily    \let\sf=\sffamily    \let\tt=\ttfamily
+% \let\it=\itshape     \let\sl=\slshape     \let\sc=\scshape
+% \let\bf=\bfseries
+
+% end of prologue
+
+%\input{EGauthorGuidelines-body.inc} % commented by KK for ShareLaTeX use
+
+% ---------------------------------------------------------------------
+% EG author guidelines plus sample file for EG publication using LaTeX2e input
+% D.Fellner, v1.17, Sep 23, 2010
+
+
+\title[EG \LaTeX\ Author Guidelines]%
+      {Physics-Informed and Physics-Embedded Neural Methods for Visual Computing}
+
+% For anonymous conference submission, please enter your SUBMISSION ID.
+\author[submission ID]{Please provide your submission ID here}
+
+%% For the final version of your accepted paper, please enter the authors names and affiliations.
+%\author[D. Fellner \& S. Behnke]
+%       {D.\,W. Fellner\thanks{Chairman Eurographics Publications Board}$^{1,2}$
+%        and S. Behnke$^{2}$
+%        \\
+%         $^1$TU Darmstadt \& Fraunhofer IGD, Germany\\
+%         $^2$Institut f{\"u}r ComputerGraphik \& Wissensvisualisierung, TU Graz, Austria
+%       }
+
+% ------------------------------------------------------------------------
+
+% if the Editors-in-Chief have given you the data, you may uncomment
+% the following five lines and insert it here
+%
+% \volume{27}   % the volume in which the issue will be published;
+% \issue{1}     % the issue number of the publication
+% \pStartPage{1}      % set starting page
+
+
+%-------------------------------------------------------------------------
+\begin{document}
+
+% \teaser{
+%  \includegraphics[width=\linewidth]{eg_new}
+%  \centering
+%   \caption{New EG Logo}
+% \label{fig:teaser}
+% }
+
+\maketitle
+
+\begin{abstract}
+   The ABSTRACT is to be in fully-justified italicized text, 
+   between two horizontal lines,
+   in one-column format, 
+   below the author and affiliation information. 
+   Use the word ``Abstract'' as the title, in 9-point Times, boldface type, 
+   left-aligned to the text, initially capitalized. 
+   The abstract is to be in 9-point, single-spaced type.
+   The abstract may be up to 3 inches (7.62 cm) long. \\
+   Leave one blank line after the abstract, 
+   then add the subject categories according to the ACM Classification Index 
+   (see http://www.acm.org/class/1998/).
+
+\begin{classification} % according to http://www.acm.org/class/1998/
+\CCScat{Computer Graphics}{I.3.3}{Picture/Image Generation}{Line and curve generation}
+\end{classification}
+
+\end{abstract}
+
+
+
+
+
+%-------------------------------------------------------------------------
+\section{Writing Plan}
+
+1. Introduction
+
+2. Theoretical Foundations of Physics in Visual Computing
+
+3. Physics-Informed and Physics-Embedded Neural Methods
+\textcolor{red}{You can pick one of the following topics}
+\begin{itemize}
+    \item Data and Loss-embedded physics: PDE residual losses, initial-value and boundary-value constraints, other soft constraints, etc.\textcolor{red}{ --Han}
+    \item Architecture embedded physics: hard-coded invariances, physically parameterized layers, analytic kernels inside networks, etc.\textcolor{red}{-- David}
+    \item Operator embedded physics: differentiable renderers, wave propagation and light transport operators, neural operators and Fourier neural operators, etc. \textcolor{red}{ --Andrea}
+    \item System embedded physics: hardware in the loop, ONNs, etc. \textcolor{red}{ --zhen}
+    \item Applications of PINNs\textcolor{red}{ --Ana}
+\end{itemize}
+
+4. Failure Modes and Misconceptions (where these methods work and where they do not)
+
+5. Open Problems and Future Directions
+
+6. Discussion and Conclusion
+
+
+\clearpage
+\newpage
+\newpage
+%-------------------------------------------------------------------------
+
+
+
+
+\section{Data and Loss-embedded Physics}
+\label{sec:data_loss_embedded_physics}
+
+\subsection{Background}
+
+Physics-Informed Neural Networks (PINNs) combine traditional physics-based simulation with deep learning. Classical methods such as the Finite Element Method (FEM)~\cite{courant1994variational} and Finite Volume Method (FVM)~\cite{leveque2002finite,patankar2018numerical} solve physical equations by first dividing the physical domain into many small computational cells, like covering the space with a fine grid. These methods are accurate and reliable, but they can be expensive for complex geometries, moving boundaries, high-dimensional problems, or repeated simulations in design. Pure deep learning can be faster, but if it only learns from data, it may violate basic physical laws such as conservation of mass, momentum, or energy, making it unreliable when data are limited or test cases differ from training examples.
+
+PINNs address this by incorporating physical laws directly into neural network training. Instead of only fitting observed data, the model is also penalized when its predictions do not satisfy the governing differential equations. As a result, PINNs can learn solutions that are both data-efficient and physically meaningful. They also avoid the need for a predefined computational grid: PINNs learn a continuous function over space and time and use automatic differentiation to evaluate the governing equations at sampled points. This makes them useful for irregular shapes, changing domains, limited measurements, and inverse problems where hidden physical parameters need to be estimated.
+
+
+\subsection{Formulation: embedding physics into the objective}
+
+Data and loss-embedded physics is a foundational paradigm for incorporating physical knowledge into neural computation by encoding governing equations and physical constraints directly into the training objective. In this setting, a neural network $u_\theta(\mathbf{x},t)$ approximates the unknown physical field $u(\mathbf{x},t)$, where $\theta$ denotes the learnable parameters, $\mathbf{x}\in\Omega$ denotes the spatial coordinate in the domain $\Omega$, and $t\in[0,T]$ denotes time.
+
+
+Consider a general time-dependent nonlinear PDE of the form
+\begin{equation}
+\partial_t u(\mathbf{x},t) + \mathcal{N}[u](\mathbf{x},t) = 0,
+\quad \mathbf{x} \in \Omega,\quad t \in [0,T],
+\label{eq:generic_pde}
+\end{equation}
+where $\mathcal{N}[\cdot]$ denotes a possibly nonlinear spatial differential operator. PINNs~\cite{raissi2019pinn} define a physics residual by substituting the neural approximation $u_\theta$ into the governing equation:
+\begin{equation}
+r_\theta(\mathbf{x},t)
+:=
+\partial_t u_\theta(\mathbf{x},t)
++
+\mathcal{N}[u_\theta](\mathbf{x},t).
+\label{eq:pde_residual}
+\end{equation}
+The governing equation is satisfied at a point $(\mathbf{x},t)$ when $r_\theta(\mathbf{x},t)=0$. Therefore, the PDE residual is penalized over a set of collocation points $\{(\mathbf{x}_j,t_j)\}_{j=1}^{N_r}$:
+\begin{equation}
+\mathcal{L}_{\mathrm{PDE}}
+=
+\frac{1}{N_r}
+\sum_{j=1}^{N_r}
+\left\|r_\theta(\mathbf{x}_j,t_j)\right\|_2^2.
+\label{eq:pde_loss}
+\end{equation}
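+
+As an illustrative instance (chosen here for concreteness), consider the one-dimensional viscous Burgers equation $\partial_t u + u\,\partial_x u - \nu\,\partial_{xx} u = 0$ with viscosity $\nu>0$. In the notation of Eq.~\ref{eq:generic_pde}, $\mathcal{N}[u] = u\,\partial_x u - \nu\,\partial_{xx} u$, so the residual in Eq.~\ref{eq:pde_residual} becomes
+\begin{equation}
+r_\theta(x,t)
+=
+\partial_t u_\theta(x,t)
++
+u_\theta(x,t)\,\partial_x u_\theta(x,t)
+-
+\nu\,\partial_{xx} u_\theta(x,t),
+\end{equation}
+where every derivative of $u_\theta$ is obtained by automatic differentiation with respect to the network inputs.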
+
+The full training objective then combines the equation loss with supervision on the solution itself:
+\begin{equation}
+\mathcal{L}
+=
+\underbrace{\lambda_{r}\mathcal{L}_{\mathrm{PDE}}}_{\text{equation / physics loss}}
++\;
+\underbrace{\left(
+\lambda_{b}\mathcal{L}_{\mathrm{BC}}
++
+\lambda_{i}\mathcal{L}_{\mathrm{IC}}
++
+\lambda_{d}\mathcal{L}_{\mathrm{data}}
+\right)}_{\text{data / constraint loss}}.
+\label{eq:pinn_loss_compact}
+\end{equation}
+Here, $\mathcal{L}_{\mathrm{BC}}$, $\mathcal{L}_{\mathrm{IC}}$, and $\mathcal{L}_{\mathrm{data}}$ measure violations of boundary conditions, initial conditions, and observed data, respectively, while $\lambda_r,\lambda_b,\lambda_i,\lambda_d$ are scalar balancing weights.
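+
+One common instantiation of these terms is sketched below. It assumes a Dirichlet boundary condition $u=g$ on $\partial\Omega$, an initial condition $u(\mathbf{x},0)=u_0(\mathbf{x})$, and pointwise measurements $\hat{u}_j$; the symbols $g$, $u_0$, $\hat{u}_j$ and the sample counts $N_b$, $N_i$, $N_d$ are introduced only for this illustration:
+\begin{equation}
+\begin{aligned}
+\mathcal{L}_{\mathrm{BC}} &= \frac{1}{N_b}\sum_{j=1}^{N_b}\left\|u_\theta(\mathbf{x}_j^{\mathrm{b}},t_j^{\mathrm{b}})-g(\mathbf{x}_j^{\mathrm{b}},t_j^{\mathrm{b}})\right\|_2^2, \\
+\mathcal{L}_{\mathrm{IC}} &= \frac{1}{N_i}\sum_{j=1}^{N_i}\left\|u_\theta(\mathbf{x}_j^{\mathrm{i}},0)-u_0(\mathbf{x}_j^{\mathrm{i}})\right\|_2^2, \\
+\mathcal{L}_{\mathrm{data}} &= \frac{1}{N_d}\sum_{j=1}^{N_d}\left\|u_\theta(\mathbf{x}_j^{\mathrm{d}},t_j^{\mathrm{d}})-\hat{u}_j\right\|_2^2,
+\end{aligned}
+\end{equation}
+with the points $(\mathbf{x}_j^{\mathrm{b}},t_j^{\mathrm{b}})$ sampled on $\partial\Omega\times[0,T]$, the points $\mathbf{x}_j^{\mathrm{i}}$ sampled in $\Omega$, and $(\mathbf{x}_j^{\mathrm{d}},t_j^{\mathrm{d}})$ denoting the measurement locations. Other boundary types, such as Neumann conditions, would replace the first term by a penalty on the corresponding derivative.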
+
+
+\subsection{Alternative formulations}
+
+The original PINN formulation drives the \emph{strong-form} PDE residual $r_\theta(\mathbf{x},t)$ toward zero \emph{pointwise}. Many subsequent variants reformulate this objective to improve stability, reduce derivative requirements, or better match different classes of physical systems. For notation, we write $\mathbf{z}=(\mathbf{x},t)$ and let $\mathcal{D}=\Omega\times[0,T]$ denote the space-time domain. These alternatives can be roughly grouped into the following categories.
+
+
+\subsubsection{Variational/energy formulation}
+
+
+Some PDEs admit an energy or variational principle, where the true solution is 
+characterized as the minimizer of an integral functional, such as the Dirichlet 
+energy. For example, the Deep Ritz method~\cite{yu2018deepritz} considers PDEs 
+with variational formulations, in which the solution satisfies
+\begin{equation}
+u^{*}
+=
+\arg\min_{u\in\mathcal{V}}
+\mathcal{E}(u),
+\label{eq:deep_ritz_variational}
+\end{equation}
+where $\mathcal{V}$ denotes the admissible function space and $\mathcal{E}(\cdot)$ 
+denotes the corresponding energy functional. Instead of optimizing directly over 
+the infinite-dimensional space $\mathcal{V}$, Deep Ritz parameterizes the solution 
+with a neural network $u_\theta$ and solves
+\begin{equation}
+\theta^{*}
+=
+\arg\min_{\theta}
+\mathcal{E}(u_\theta).
+\label{eq:deep_ritz_nn}
+\end{equation}
+
+Similar to PINNs, energy-based methods still use a neural network to approximate 
+the solution field itself. However, the training signal comes from minimizing a 
+global energy functional $\mathcal{E}$ rather than penalizing pointwise PDE 
+residuals. In practice, the integral in $\mathcal{E}$ can be estimated by Monte 
+Carlo sampling over the physical domain, making the objective compatible with 
+standard stochastic gradient optimization. Compared with residual-based PINNs, 
+variational formulations often require lower-order derivatives of the network 
+output and can more naturally preserve physical structures encoded by the energy, 
+such as force balance, symmetry, or conservation-related constraints, without 
+introducing separate penalty losses.
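+
+As a concrete example, consider the Poisson problem $-\Delta u = f$ in $\Omega$ with $u=0$ on $\partial\Omega$ (the source term $f$ and the sample count $N$ are notation introduced for this illustration). Its solution minimizes the energy
+\begin{equation}
+\mathcal{E}(u)
+=
+\int_{\Omega}
+\left(\tfrac{1}{2}\left|\nabla u(\mathbf{x})\right|^2 - f(\mathbf{x})\,u(\mathbf{x})\right)d\mathbf{x}
+\end{equation}
+over functions that vanish on $\partial\Omega$, and a Monte Carlo estimate with points $\mathbf{x}_j$ drawn uniformly from $\Omega$ reads
+\begin{equation}
+\mathcal{E}(u_\theta)
+\approx
+\frac{|\Omega|}{N}\sum_{j=1}^{N}
+\left(\tfrac{1}{2}\left|\nabla u_\theta(\mathbf{x}_j)\right|^2 - f(\mathbf{x}_j)\,u_\theta(\mathbf{x}_j)\right).
+\end{equation}
+Only first derivatives of $u_\theta$ appear, whereas the strong-form residual $-\Delta u_\theta - f$ requires second derivatives. Because a generic network does not vanish on $\partial\Omega$, the boundary condition is typically handled separately, for example by adding a penalty on the boundary values of $u_\theta$.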
+
+Recent work has extended this idea to neural operators. For example, Variational 
+PINO (VINO)~\cite{eshaghi2025variational} trains a neural operator by minimizing 
+the PDE energy, achieving strong performance without labeled solution data.
+
+
+
+
+
+
+
+\subsubsection{Weak formulations}
+
+Another route is to enforce the PDE in an \emph{integrated} or \emph{weak} sense 
+rather than pointwise. Instead of requiring the strong-form residual to vanish at 
+individual collocation points, weak-form methods require the residual to vanish 
+when tested against a set of test functions. For a set of test functions 
+$\{v_k\}_{k=1}^{K}$, this can be written as
+\begin{equation}
+\mathcal{R}_{\theta}(v_k)
+:=
+\int_{\mathcal{D}}
+r_\theta(\mathbf{z})\,v_k(\mathbf{z})\,d\mathbf{z}
+\approx 0,
+\qquad k=1,\dots,K,
+\label{eq:weak_residual}
+\end{equation}
+where $\mathcal{R}_{\theta}(v_k)$ denotes the weak residual associated with the 
+test function $v_k$. In practice, weak formulations often integrate the PDE by 
+parts, which transfers derivatives from the neural solution $u_\theta$ to the 
+test functions. This reduces the derivative order required from the neural network 
+and can improve stability for irregular or non-smooth solutions.
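+
+To illustrate, take the Poisson residual $r_\theta = -\Delta u_\theta - f$ on a spatial domain $\Omega$ (a steady example with source term $f$, chosen here for concreteness) and test functions $v_k$ that vanish on $\partial\Omega$. Integration by parts gives
+\begin{equation}
+\mathcal{R}_{\theta}(v_k)
+=
+\int_{\Omega}\left(-\Delta u_\theta - f\right)v_k\,d\mathbf{x}
+=
+\int_{\Omega}\nabla u_\theta\cdot\nabla v_k\,d\mathbf{x}
+-
+\int_{\Omega} f\,v_k\,d\mathbf{x},
+\end{equation}
+so the weak residual involves only first derivatives of $u_\theta$, with the boundary term dropping out because $v_k=0$ on $\partial\Omega$.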
+
+Variational Physics-Informed Neural Networks (VPINNs)~\cite{kharazmi2019vpinn} 
+optimize a loss over such weak residuals:
+\begin{equation}
+\mathcal{L}_{\mathrm{weak}}
+=
+\frac{1}{K}
+\sum_{k=1}^{K}
+\left|
+\mathcal{R}_{\theta}(v_k)
+\right|^2.
+\label{eq:vpinn_loss}
+\end{equation}
+Relative to standard PINNs, the key difference is that the PDE is enforced in an 
+averaged integral sense rather than pointwise.
+
+hp-VPINNs~\cite{kharazmi2021hpvpinn} retain the same weak-form principle, but 
+apply it locally over a partition of the domain 
+$\mathcal{D}=\bigcup_{e=1}^{N_{\mathrm{sd}}}\mathcal{D}_e$. The corresponding 
+local weak-form loss can be written as
+\begin{equation}
+\mathcal{L}_{\mathrm{hp}}
+=
+\frac{1}{N_{\mathrm{sd}}K}
+\sum_{e=1}^{N_{\mathrm{sd}}}
+\sum_{k=1}^{K}
+\left|
+\mathcal{R}_{\theta}^{(e)}(v_k^{(e)})
+\right|^2,
+\label{eq:hpvpinn_loss}
+\end{equation}
+where
+\begin{equation}
+\mathcal{R}_{\theta}^{(e)}(v_k^{(e)})
+:=
+\int_{\mathcal{D}_e}
+r_\theta(\mathbf{z})\,v_k^{(e)}(\mathbf{z})\,d\mathbf{z}.
+\label{eq:local_weak_residual}
+\end{equation}
+Here, $\mathcal{D}_e$ denotes the $e$-th subdomain, $N_{\mathrm{sd}}$ is the 
+number of subdomains, and $v_k^{(e)}$ is a local test function on $\mathcal{D}_e$. 
+This local formulation makes refinement more flexible: $h$-refinement subdivides 
+the domain more finely, while $p$-refinement increases the polynomial order of the 
+local test space. As a result, hp-VPINNs can better resolve multi-scale or 
+spatially heterogeneous solutions.
+
+A related line of work studies the choice of test space and residual norm. For 
+example, Robust VPINNs~\cite{rojas2024robust} address the sensitivity of classical 
+VPINNs to the test basis by minimizing residuals in a dual norm, leading to 
+improved stability.
+
+
+
+
+\subsubsection{Adversarial/Minimax formulations}
+
+Weak formulations can also be cast as saddle-point problems. A representative 
+example is the Weak Adversarial Network (WAN)~\cite{zang2020weak}. Instead of 
+choosing a fixed set of test functions, WAN parameterizes both the solution and 
+the test function with neural networks: $u_\theta$ for the solution and 
+$\varphi_\eta$ for the test function. The method then solves a minimax problem of 
+the form
+\begin{equation}
+\min_{\theta}
+\max_{\eta}
+\;
+\mathcal{J}(\theta,\eta),
+\label{eq:wan_minimax}
+\end{equation}
+where $\mathcal{J}(\theta,\eta)$ measures the weak residual induced by the test 
+network $\varphi_\eta$. Intuitively, the solution network $u_\theta$ tries to 
+minimize the residual, while the test network $\varphi_\eta$ acts as an adversary 
+that searches for regions or directions where the current solution still violates 
+the PDE. Therefore, rather than enforcing the residual against a fixed test basis, 
+WAN adaptively learns test functions that expose the remaining error.
+
+This adversarial weak-form perspective is especially useful when hand-designed 
+test functions are insufficient or when the PDE is high-dimensional or non-smooth.
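+
+One way to make $\mathcal{J}$ concrete, given here as a sketch in the spirit of WAN rather than its exact training objective, is to normalize the squared weak residual of Eq.~\ref{eq:weak_residual} by the size of the test function:
+\begin{equation}
+\mathcal{J}(\theta,\eta)
+=
+\frac{\left|\mathcal{R}_{\theta}(\varphi_\eta)\right|^2}{\left\|\varphi_\eta\right\|_{L^2(\mathcal{D})}^2}.
+\end{equation}
+The normalization prevents the adversary from increasing the objective merely by rescaling $\varphi_\eta$, so the maximization over $\eta$ searches for the direction in which the weak residual is largest. In practice, boundary and initial conditions are enforced through additional loss terms.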
+
+
+
+
+% \subsubsection{Conservative/integral constraints}
+
+% Some methods focus on enforcing physical conservation laws explicitly. Instead of 
+% only minimizing a local PDE residual, these methods impose integral constraints 
+% that encode global or local conservation. For example, MUSA-PINN~\cite{zhang2026musa} 
+% enforces mass or momentum conservation over control volumes by using flux-balance 
+% integrals derived from the divergence theorem. Such constraints can be written 
+% abstractly as
+% \begin{equation}
+% \mathcal{C}_m(u_\theta)
+% =
+% c_m,
+% \qquad m=1,\dots,M,
+% \label{eq:conservation_constraints}
+% \end{equation}
+% where $\mathcal{C}_m(\cdot)$ denotes a conserved physical quantity, such as total 
+% mass or energy, and $c_m$ is its prescribed value.
+
+% Other approaches impose conservation through projection. For example, 
+% PINN-Proj~\cite{baez2024guaranteeing} projects the neural output onto a constraint 
+% manifold that satisfies the desired conservation laws:
+% \begin{equation}
+% \tilde{u}_\theta
+% =
+% \Pi_{\mathcal{C}}(u_\theta),
+% \label{eq:pinn_projection}
+% \end{equation}
+% where $\Pi_{\mathcal{C}}$ denotes projection onto the physically admissible set 
+% $\mathcal{C}$. By construction, the projected solution $\tilde{u}_\theta$ satisfies 
+% the chosen conservation constraints, reducing the drift in conserved quantities 
+% that can occur when conservation is enforced only through soft penalties.
+
+% These integral and projection-based constraints complement weak-form methods: weak 
+% forms enforce the PDE in an averaged sense, while conservative formulations ensure 
+% that selected physical invariants are respected more directly.
+
+
+
+% Some methods focus on enforcing integral conservation laws explicitly. For instance, MUSA-PINN~\cite{zhang2026musa} imposes mass/momentum conservation over control volumes by enforcing flux-balance integrals via the divergence theorem. Other approaches project the solution onto physically conserved manifolds. Baez et al. propose PINN-Proj~\cite{baez2024guaranteeing}, which project the solution onto physically conserved manifolds, and guarantees exact conservation of chosen integrals (like total mass or energy) by projecting the neural output onto a subspace satisfying the conservation law. This ``hard constraint'' approach eliminates drift in conserved quantities, whereas standard PINNs only enforce them in expectation. Such integral constraints complement weak-form ideas by ensuring global physical laws are honored exactly.
+
+
+\subsubsection{Summary}
+Together, these developments broaden the ``formulation'' stage of PINN research. A network can be taught to respect a PDE by driving a pointwise residual to zero (classical PINN~\cite{raissi2019pinn}), by minimizing an energy integral (Deep Ritz~\cite{yu2018deepritz}, VINO~\cite{eshaghi2025variational}), by enforcing weighted integral constraints (VPINN~\cite{kharazmi2019vpinn}, hp-VPINN~\cite{kharazmi2021hpvpinn}, WF-PINN~\cite{wang2025wf}, etc.), or by solving a minimax problem (WAN~\cite{zang2020weak}). Each alternative has its own advantages: variational forms lower the derivative order required of the network, weak forms improve stability for irregular or non-smooth solutions, and adversarial forms learn the test functions that expose the remaining error instead of fixing them in advance. These formulations extend physics-informed learning beyond the original PINN objective.
+
+
+
+\subsection{Diagnosis: why naive composite losses fail in PINNs}
+
+
+Subsequent work showed that the challenge of data/loss-embedded physics lies not 
+only in formulating the objective, but also in optimizing it reliably. For naive 
+composite PINN losses, failures can arise from several intertwined sources: 
+imbalanced gradients across loss terms, uneven convergence dynamics, ill-conditioned 
+residual optimization, and representation bias in the neural network itself.
+
+
+
+\subsubsection{Loss imbalance and uneven convergence}
+
+For the composite PINN loss in Eq.~\ref{eq:pinn_loss_compact}, Wang \etal{}~\cite{wang2021gradientpathologies} showed that the PDE, boundary, initial, and data terms can induce highly imbalanced gradients, so that some objectives 
+dominate training while others make little progress. To diagnose this imbalance, they compared the gradient magnitudes contributed by different loss terms and proposed adaptively balancing each non-PDE term $\mathcal{L}_i \in \{\mathcal{L}_{\mathrm{BC}}, \mathcal{L}_{\mathrm{IC}}, \mathcal{L}_{\mathrm{data}}\}$ against the PDE term:
+\begin{equation}
+\hat{\lambda}_i
+=
+\frac{
+\max \bigl|\nabla_{\theta}\mathcal{L}_{\mathrm{PDE}}\bigr|
+}{
+\operatorname{mean}\bigl|\nabla_{\theta}\mathcal{L}_{i}\bigr|
+},
+\qquad
+\lambda_i \leftarrow (1-\alpha)\lambda_i+\alpha \hat{\lambda}_i,
+\label{eq:grad_pathology}
+\end{equation}
+where $\alpha\in(0,1)$ is a smoothing factor, and the maximum and mean are taken over the entries of the respective gradient vectors. This observation shows that PINN training can fail even when each individual loss term is well defined, because the composite objective may provide poorly balanced optimization signals.
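+
+For a sense of scale, suppose (with purely hypothetical numbers) that $\max|\nabla_\theta\mathcal{L}_{\mathrm{PDE}}|=10$ and $\operatorname{mean}|\nabla_\theta\mathcal{L}_{\mathrm{BC}}|=0.1$ at some iteration, with current weight $\lambda_b=1$ and $\alpha=0.1$. Then
+\begin{equation}
+\hat{\lambda}_b = \frac{10}{0.1} = 100,
+\qquad
+\lambda_b \leftarrow 0.9\cdot 1 + 0.1\cdot 100 = 10.9,
+\end{equation}
+so the boundary term is up-weighted gradually rather than jumping to the instantaneous estimate.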
+
+A complementary perspective comes from the neural tangent kernel (NTK) analysis. Wang \etal{}~\cite{wang2022and} showed that different components of the PINN objective can converge at substantially different rates during training. This suggests that the imbalance is not only a matter of manually chosen scalar weights or instantaneous gradient magnitudes, but is also tied to the spectrum of the training dynamics induced by the PDE operator and the neural parameterization. In other words, gradient imbalance is a local symptom of a broader convergence-rate mismatch among the physics and data constraints.
+
+
+
+\subsubsection{Ill-conditioned residual optimization}
+Krishnapriyan \etal{}~\cite{krishnapriyan2021failuremodes} further showed that 
+failures on harder PDEs often arise not from limited expressivity, but from 
+optimization difficulty and the brittleness of strong-form residual minimization. 
+Their analysis can be viewed through objectives of the form
+\begin{equation}
+\min_{\theta}\;
+\mathcal{L}_{u}
++
+\lambda_{r}\mathcal{L}_{\mathrm{PDE}},
+\label{eq:failure_modes}
+\end{equation}
+where $\mathcal{L}_{u}$ is shorthand for the supervision terms on the solution, 
+including boundary, initial, and observed data terms. Their key observation is 
+that simply increasing the PDE weight $\lambda_r$ does not necessarily improve 
+training: while a larger $\lambda_r$ enforces physics more strongly, it can also 
+make the optimization problem more ill-conditioned. Related loss-landscape 
+analyses similarly show that differential operators in the residual term can 
+produce poorly conditioned objectives, making PINN training sensitive to optimizer 
+choice and hyperparameter settings~\cite{rathore2024challenges}.
+
+\subsubsection{Representation and frequency bias}
+Another diagnosis concerns the representation bias of the neural network itself. 
+Standard fully connected networks tend to learn smooth, low-frequency components 
+more easily than high-frequency or multi-scale structures. Wang \etal{}~\cite{wang2021eigenvector}
+connected this behavior to the eigenspectrum of the
+limiting NTK and showed that conventional PINNs can struggle when the target 
+solution contains sharp spatial or temporal variations. Thus, even when the PDE 
+residual is correctly specified, the neural parameterization and its optimization 
+dynamics may bias training away from the physically relevant solution.
+
+\subsubsection{Takeaway}
+Together, these diagnoses show that naive composite PINN losses can fail for 
+several intertwined reasons: different loss terms may generate imbalanced or 
+conflicting gradients, the residual objective may be ill-conditioned, and the 
+neural parameterization may favor smooth low-frequency solutions over the 
+multi-scale structures required by the PDE. These observations motivate the 
+remedy strategies discussed next, which aim to rebalance, resample, schedule, or 
+better optimize the physics-informed objective.
+
+
+
+
+
+
+
+
+
+
+
+\subsubsection{Curriculum regularization}
+Krishnapriyan \etal{}~\cite{krishnapriyan2021failuremodes} also explored curriculum regularization as a way to make the ill-conditioned objective in Eq.~\ref{eq:failure_modes} more manageable. Schematically, the target PDE loss is replaced by a sequence of progressively harder PDE losses,
+\begin{equation}
+\min_{\theta}\;
+\mathcal{L}_{u}
++
+\lambda_{r}\mathcal{L}_{\mathrm{PDE}}^{(s)},
+\qquad s=1,\dots,S,
+\label{eq:curriculum_pde}
+\end{equation}
+where $s$ indexes the curriculum stage and $S$ is the total number of stages. Intuitively, the curriculum does not change the overall formulation, but makes the PDE part of the objective easier to optimize in early stages.
+
+
+
+
+
+
+\subsection{Remedies}
+
+The above failure modes have motivated a broad family of remedies, summarized in Table~\ref{tab:pinn_remedies}. To connect these methods with the failure modes discussed above, we organize them according to which part of the PINN pipeline they modify: loss balancing and optimization, residual sampling and curriculum design, neural representation and architecture, constraint enforcement, and domain decomposition. This taxonomy also highlights a useful distinction: some methods directly reshape the composite optimization objective, while others improve the sampling strategy, the neural trial space, the enforcement of physics constraints, or the scalability of the solver.
+
+
+\subsubsection{Loss balancing and optimization}
+
+\textbf{Loss balancing.} A first class of remedies addresses the composite PINN objective itself. Because the PDE residual, boundary conditions, initial conditions, and data terms can have very different magnitudes and gradient scales, fixed loss weights may cause some objectives to dominate training while others make little progress. Gradient-flow analyses therefore proposed adaptive weighting rules based on the gradient statistics of different loss terms \cite{wang2021gradientpathologies}. Related NTK-based analyses further showed that different components of the PINN loss can converge at different rates, motivating dynamic weights that balance the training dynamics of multiple physics constraints \cite{wang2022and}. More recent loss-balancing methods such as ReLoBRaLo formulate this issue as a multi-objective balancing problem and adjust weights according to relative training progress \cite{bischof2025multi}.
+
+Self-Adaptive PINNs~\cite{mcclenny2023self} address the same general issue from a point-wise residual-weighting perspective. Instead of assigning a fixed penalty to each collocation point, they introduce trainable adaptive weights:
+\begin{equation}
+\mathcal{L}_{\mathrm{PDE}}
+=
+\sum_j w_j\, r_\theta(\mathbf{x}_j,t_j)^2,
+\label{eq:adaptive_weight_compact}
+\end{equation}
+where $w_j$ is a trainable importance weight for the residual at collocation point $(\mathbf{x}_j,t_j)$. The network parameters are updated to minimize this loss, while the weights $w_j$ are updated by gradient ascent to maximize it, so that they grow at hard points with persistently large residuals. As a result, the method automatically allocates more optimization effort to regions where the PDE is most strongly violated.
+
+\noindent\textbf{Optimization.} Beyond weighting, optimizer design is also central to PINN training. Recent loss-landscape studies show that PINN objectives can be highly ill-conditioned, partly because differential operators amplify certain directions in parameter space~\cite{rathore2024challenges}. This helps explain why second-order or quasi-second-order optimizers such as L-BFGS~\cite{liu1989limited}, NysNewton-CG~\cite{rathore2024challenges}, and SOAP-style preconditioning~\cite{wanggradient,vyas2025soap} can substantially improve training stability. Schematically, such methods precondition the gradient update as
+\begin{equation}
+\theta_{t+1}
+\approx
+\theta_t - \eta H^{-1} g_t,
+\label{eq:preconditioned_update}
+\end{equation}
+where $\theta_t$ denotes the model parameters, $g_t$ is the total gradient, $\eta$ is the learning rate, and $H$ denotes a curvature matrix or its approximation. Intuitively, curvature-aware preconditioning rescales poorly conditioned directions and can implicitly reduce conflicts among the gradients induced by different loss terms. These methods correspond to the first block of Table~\ref{tab:pinn_remedies}, which focuses on improving how the composite PINN objective is weighted and optimized.
+
+
+
+\subsubsection{Residual sampling and causal curricula}
+
+\noindent\textbf{Residual sampling.} A second class of remedies changes the distribution and order of physics supervision. In standard PINNs, collocation points are often sampled uniformly from the spatio-temporal domain. However, uniform sampling can waste many residual points in regions that are already well learned, while undersampling difficult regions with large PDE violations. Residual-based adaptive refinement methods, including RAR, RAD, and RAR-D, therefore update the sampling distribution according to the current residual \cite{wu2023comprehensive}:
+\begin{equation}
+p(\mathbf{x},t)
+\propto
+\phi\!\left(\left|r_\theta(\mathbf{x},t)\right|\right),
+\label{eq:rad_sampling}
+\end{equation}
+where $p(\mathbf{x},t)$ denotes the sampling density and $\phi(\cdot)$ is a monotone function of the residual magnitude. This shifts collocation points toward regions where the current PINN violates the governing equation most strongly. Region-optimized PINNs further refine this idea by optimizing the spatial allocation of residual points more explicitly \cite{wu2024ropinn}. In this sense, adaptive sampling improves where physics is enforced, rather than changing the PDE loss itself.
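+
+One concrete choice of $\phi$, following the form used for RAD as we understand it, is
+\begin{equation}
+\phi\!\left(|r_\theta(\mathbf{x},t)|\right) = \frac{|r_\theta(\mathbf{x},t)|^{k}}{\mathbb{E}\!\left[|r_\theta|^{k}\right]} + c,
+\end{equation}
+where the expectation is taken over the domain and $k\ge 0$, $c\ge 0$ are hyperparameters. Setting $k=0$ recovers uniform sampling, while larger $k$ or smaller $c$ concentrates points more aggressively in high-residual regions.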
+
+\noindent\textbf{Causality-aware sampling.} For time-dependent problems, another important issue is temporal ordering. If residuals at all times are minimized simultaneously, the network can reduce the residual at later times before the solution at earlier times has been resolved, so early-time errors propagate forward and degrade long-time predictions. Causality-aware training addresses this problem by decomposing the temporal domain into chunks and weighting later chunks according to the accuracy of earlier ones~\cite{wang2024respecting}:
+\begin{equation}
+\mathcal{L}_{\mathrm{PDE}}(\theta)
+=
+\frac{1}{N_t}
+\sum_{i=1}^{N_t}
+\omega_i\, \mathcal{L}_{\mathrm{PDE}}^{(i)}(\theta),
+\label{eq:causal_loss}
+\end{equation}
+where $N_t$ is the number of temporal chunks, $\mathcal{L}_{\mathrm{PDE}}^{(i)}$ is the residual loss on the $i$-th chunk, and $\omega_i$ is a causal weight. The weights are designed so that later chunks receive a significant weight only after the residuals on earlier chunks have been sufficiently reduced. Curriculum-based methods such as CoPINN extend this idea by explicitly organizing training from easier to harder residual constraints~\cite{duan2025copinn}. Together, the second block of Table~\ref{tab:pinn_remedies} summarizes methods that improve where, when, and in what order residual supervision is imposed.
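+
+In the causal formulation, a typical choice of weights is
+\begin{equation}
+\omega_i = \exp\!\left(-\epsilon \sum_{k=1}^{i-1}\mathcal{L}_{\mathrm{PDE}}^{(k)}(\theta)\right),
+\qquad \omega_1 = 1,
+\end{equation}
+where $\epsilon>0$ is a causality parameter (notation introduced here) that controls how strictly earlier residuals must be reduced. The weight $\omega_i$ stays close to zero while the accumulated loss on chunks $1,\dots,i-1$ is large and approaches one as that loss vanishes; the weights are usually treated as constants when differentiating with respect to $\theta$.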
+
+
+\subsubsection{Representation and architecture}
+A third class of remedies addresses PINN failures from the perspective of neural representation. Standard coordinate-based MLPs often suffer from spectral bias, which makes them learn low-frequency components more easily than high-frequency or multi-scale structures. This is problematic for PDEs with sharp gradients, oscillatory solutions, boundary layers, or multi-scale dynamics. Fourier feature embeddings directly target this limitation by reshaping the coordinate representation and mitigating eigenvector bias in multi-scale PDEs \cite{wang2021eigenvector}. Similarly, sinusoidal activations provide a neural representation better suited for high-frequency implicit functions \cite{sitzmann2020implicit}, while locally adaptive activation functions introduce learnable activation slopes to accelerate convergence \cite{jagtap2020locally}. These methods do not directly modify the physics loss, but they make the neural trial space better matched to the target solution.
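+
+A minimal sketch of a Fourier feature embedding, with notation introduced here for illustration, maps the input coordinates $\mathbf{z}=(\mathbf{x},t)\in\mathbb{R}^{d}$ to
+\begin{equation}
+\gamma(\mathbf{z})
+=
+\begin{bmatrix}
+\cos(\mathbf{B}\mathbf{z}) \\
+\sin(\mathbf{B}\mathbf{z})
+\end{bmatrix},
+\qquad
+\mathbf{B}\in\mathbb{R}^{m\times d},
+\end{equation}
+and feeds $\gamma(\mathbf{z})$ to the network in place of $\mathbf{z}$. The entries of $\mathbf{B}$ are drawn independently from a zero-mean Gaussian with standard deviation $\sigma$, sampled once and then kept fixed. The scale $\sigma$ sets the range of frequencies the network can represent easily: larger $\sigma$ favors higher frequencies, and several scales can be combined for multi-scale problems.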
+
+More recent architectural remedies redesign the PINN backbone itself. SPINN uses separable network structures to improve efficiency, particularly through more efficient forward-mode automatic differentiation \cite{cho2023separable}. PINNsformer instead introduces a Transformer-based architecture to model sequential dependencies in physics-informed learning \cite{zhao2024pinnsformer}. These methods correspond to the representation and architecture block of Table~\ref{tab:pinn_remedies}: they are most useful when the difficulty comes not only from loss imbalance or sampling, but also from a mismatch between a simple MLP and the structure of the PDE solution.
+
+
+
+
+\subsubsection{Constraint enforcement}
+A fourth class of remedies modifies how boundary, initial, and physical constraints are imposed. In standard PINNs, boundary and initial conditions are usually enforced as soft penalty terms in the loss. This introduces additional loss-balancing difficulty: if the penalty is too small, the constraints may be violated; if it is too large, the PDE residual may be under-optimized. Classical hard-constrained neural trial functions address this issue by constructing solutions that satisfy prescribed constraints by design \cite{lagaris1998artificial}. A typical form is
+\begin{equation}
+u_\theta(\mathbf{x},t)
+=
+g(\mathbf{x},t)
++
+d(\mathbf{x},t) N_\theta(\mathbf{x},t),
+\label{eq:hard_constraint_ansatz}
+\end{equation}
+where $g(\mathbf{x},t)$ satisfies the prescribed constraint, $d(\mathbf{x},t)$ vanishes on the constrained boundary, and $N_\theta$ is the trainable neural network. Since the constraint is built into the solution form, the optimizer no longer needs to enforce it only through a soft penalty weight. Modern PINN libraries and formulations further implement such hard constraints
+using approximate distance functions and geometry-aware output transformations \cite{lu2021deepxde}. Recent work also studies soft and hard boundary constraints for specific PDE families such as advection--diffusion equations \cite{li2024physical}.
+
+Variational and weak-form PINNs provide another way to improve constraint and residual enforcement. Instead of directly minimizing the point-wise strong-form PDE residual, these methods enforce the governing equation against test functions in an integral form. hp-VPINNs combine this variational formulation with hp-refinement and domain decomposition, improving the connection between PINNs and classical finite-element or Galerkin methods \cite{kharazmi2021hpvpinn}. Thus, the constraint-enforcement block of Table~\ref{tab:pinn_remedies} captures two related strategies: satisfying constraints by construction and replacing strong-form residuals with weak-form or variational objectives.
+
+
+
+\subsubsection{Domain decomposition and scalability}
+Finally, a fifth class of remedies improves PINNs by localizing the learning problem. Instead of fitting a single global network over the entire spatio-temporal domain, domain-decomposition methods divide the domain into subregions and train local networks coupled through interface, conservation, or partition-of-unity constraints. Conservative PINNs impose interface flux continuity for conservation laws \cite{jagtap2020conservative}, while XPINNs generalize this idea to flexible space-time domain decomposition for nonlinear PDEs \cite{jagtap2020extended}. FBPINNs further introduce overlapping subdomains and partition-of-unity weighting to make the decomposition more scalable and localized \cite{moseley2023finite}.
+
+Recent extensions improve the scalability and adaptivity of this decomposition view. Multilevel FBPINNs introduce hierarchical decompositions to improve global communication across subdomains \cite{dolean2024multilevel}, while AB-PINNs use residual-driven adaptive bases to dynamically allocate decomposition capacity \cite{botvinick2025ab}. These methods correspond to the final block of Table~\ref{tab:pinn_remedies}. They are especially useful for heterogeneous, multi-scale, or long-time problems where a single global PINN is difficult to optimize.
+
+
+\subsubsection{Summary}
+Overall, the remedies in Table~\ref{tab:pinn_remedies} show that PINN performance is determined not only by whether the correct physical equations are included in the objective, but also by whether the resulting learning problem is numerically trainable. Loss balancing and second-order optimization improve how competing objectives are minimized; adaptive sampling and causal curricula improve where and when residuals are enforced; representation and architectural methods improve what functions the network can express; hard constraints and weak forms improve how physics is encoded; and domain decomposition improves scalability to complex physical systems.
+
+
+
+\begin{table*}[t]
+\centering
+\small
+\setlength{\tabcolsep}{6pt}
+\renewcommand{\arraystretch}{1.08}
+\caption{Representative remedies for PINN failure modes, organized by whether
+they modify the loss and optimizer, residual supervision, neural representation,
+constraint enforcement, or domain decomposition.}
+\label{tab:pinn_remedies}
+\resizebox{\textwidth}{!}{%
+\begin{tabular}{p{0.11\textwidth} p{0.32\textwidth} p{0.15\textwidth} p{0.72\textwidth}}
+\toprule
+\textbf{Type} & \textbf{Method} & \textbf{Venue / Year} & \textbf{Key Contribution} \\
+\midrule
+
+\multirow{6}{=}{\centering Loss balancing and optimization}
+& Gradient-flow weighting~\cite{wang2021gradientpathologies}
+& SISC 2021
+& Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms. \\
+
+& NTK-based weighting~\cite{wang2022and}
+& JCP 2022
+& Balances different physics constraints through neural tangent kernel training dynamics. \\
+
+& SA-PINNs~\cite{mcclenny2023self}
+& JCP 2023
+& Learns adaptive residual weights to emphasize difficult collocation points. \\
+
+& Loss-landscape / NysNewton-CG~\cite{rathore2024challenges}
+& ICML 2024
+& Studies PINN ill-conditioning and improves training with second-order optimization. \\
+
+& ReLoBRaLo~\cite{bischof2025multi}
+& CMAME 2025
+& Relative loss balancing with random lookback for multi-objective PINN training. \\
+
+& SOAP / gradient alignment~\cite{wanggradient}
+& NeurIPS 2025
+& Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives. \\
+
+\midrule
+
+\multirow{4}{=}{\centering Residual sampling and curriculum}
+& RAR / RAD / RAR-D~\cite{wu2023comprehensive}
+& CMAME 2023
+& Residual-based adaptive refinement and distribution-based collocation sampling. \\
+
+& RoPINN~\cite{wu2024ropinn}
+& NeurIPS 2024
+& Region-optimized residual sampling for more efficient collocation point selection. \\
+
+& Causal PINN training~\cite{wang2024respecting}
+& CMAME 2024
+& Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs. \\
+
+& CoPINN~\cite{duan2025copinn}
+& ICML 2025
+& Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals. \\
+
+\midrule
+
+\multirow{5}{=}{\centering Representation and architecture}
+& Fourier features / eigenvector bias~\cite{wang2021eigenvector}
+& CMAME 2021
+& Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias. \\
+
+& Adaptive activation functions~\cite{jagtap2020locally}
+& Proc. R. Soc. A 2020
+& Learnable activation slopes and slope-recovery terms for faster convergence. \\
+
+& SIREN~\cite{sitzmann2020implicit}
+& NeurIPS 2020
+& Sinusoidal activations for representing high-frequency implicit functions. \\
+
+& SPINN~\cite{cho2023separable}
+& NeurIPS 2023
+& Separable network structure for efficient forward-mode automatic differentiation. \\
+
+& PINNsformer~\cite{zhao2024pinnsformer}
+& ICLR 2024
+& Transformer-based architecture for modeling sequential dependencies in PINNs. \\
+
+\midrule
+
+\multirow{3}{=}{\centering Constraint enforcement}
+% & Hard boundary ansatz~\cite{lagaris1998artificial}
+% & IEEE TNN 1998
+% & Constructs neural trial solutions that satisfy boundary conditions by design. \\
+
+& Approximate distance functions~\cite{lu2021deepxde}
+& SIAM Review 2021
+& Implements hard constraints using distance functions and geometry-aware output transformations. \\
+
+& hp-VPINN~\cite{kharazmi2021hpvpinn}
+& CMAME 2021
+& Variational weak-form PINNs with hp-refinement and domain decomposition. \\
+
+& Hard initial/boundary constraints~\cite{li2024physical}
+& CMA 2024
+& Enforces prescribed initial and boundary conditions through constrained solution forms. \\
+
+\midrule
+
+\multirow{5}{=}{\centering Domain decomposition and scalability}
+& cPINN~\cite{jagtap2020conservative}
+& CMAME 2020
+& Conservative domain decomposition with interface flux continuity for conservation laws. \\
+
+& XPINN~\cite{jagtap2020extended}
+& CiCP 2020
+& General space-time domain decomposition for heterogeneous PDE problems. \\
+
+& FBPINN~\cite{moseley2023finite}
+& ACOM 2023
+& Overlapping subdomains with partition-of-unity weighting for localized training. \\
+
+& Multilevel FBPINN~\cite{dolean2024multilevel}
+& CMAME 2024
+& Hierarchical domain decomposition for improved global communication and scalability. \\
+
+& AB-PINN~\cite{botvinick2025ab}
+& arXiv 2025
+& Adaptive residual-driven decomposition for dynamically allocating subdomains. \\
+
+\bottomrule
+\end{tabular}%
+}
+\end{table*}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+% \newpage
+% \newpage
+
+
+
+% \subsubsection{Remedies I: weighting and optimization}
+% A first line of remedies aims to repair conflicts within the composite loss. Self-Adaptive PINNs~\cite{mcclenny2023sapinn} introduce trainable weights over collocation points and optimize them jointly with the network parameters:
+% \begin{equation}
+% \mathcal{L}_{\mathrm{PDE}}
+% =
+% \sum_j w_j\, r_\theta(\mathbf{x}_j,t_j)^2,
+% \label{eq:adaptive_weight_compact}
+% \end{equation}
+% where $w_j$ is a trainable adaptive importance weight for the residual at collocation point $(\mathbf{x}_j,t_j)$. Unlike fixed reweighting schemes, SA-PINNs train the network parameters to minimize the loss while driving the weights to increase on hard points, effectively seeking a saddle point. As a result, the method is \emph{self-adaptive}: regions with persistently large residuals automatically receive larger penalties and attract more optimization effort.
+
+% More recently, Gradient Alignment in PINNs~\cite{wang2025gradientalignment} argued that optimizer choice is itself central to PINN training. They showed that first-order methods often struggle with the composite objective because gradients from the PDE and data/constraint terms can point in conflicting directions. Their main insight is that (quasi) second-order optimizers are better suited to this setting because curvature-based preconditioning updates
+% \begin{equation}
+% w_{t+1}\approx w_t-\eta H^{-1}g_t,
+% \label{eq:preconditioned_update}
+% \end{equation}
+% where $w_t$ denotes the model parameters at iteration $t$, $g_t$ is the total gradient, $\eta$ is the learning rate, and $H$ denotes the Hessian. Intuitively, such updates can implicitly align competing gradients through curvature information, making the composite PINN objective easier to optimize. In particular, they identified SOAP as a practical quasi-Newton method that consistently outperforms standard first-order training on challenging PINN benchmarks.
+
+% \subsubsection{Remedies II: sampling and causality}
+% A second remedy line reorganizes \emph{physics supervision} itself, namely where and when the PDE residual is enforced. Wu \etal{}~\cite{wu2023adaptivesampling} showed that the residual points used in $\mathcal{L}_{\mathrm{PDE}}$ are not merely implementation details, but a central part of the learning problem. Their key idea is to replace uniform residual sampling by residual-informed sampling, schematically
+% \begin{equation}
+% p(\mathbf{x},t)\propto \phi\!\left(\left|r_\theta(\mathbf{x},t)\right|\right),
+% \label{eq:rad_sampling}
+% \end{equation}
+% where $p(\mathbf{x},t)$ denotes the sampling density of residual points and $\phi(\cdot)$ is a nonlinear function of the PDE residual. Intuitively, this shifts collocation points toward regions where the current PINN violates the equation most strongly. In this sense, adaptive sampling improves \emph{where} physics is enforced, rather than changing the loss itself.
+
+% For time-dependent problems, causality-aware training~\cite{wang2024causality} argues that residual losses should also respect temporal order. Instead of penalizing all times uniformly, they reformulate the residual objective as a weighted sum over temporal chunks,
+% \begin{equation}
+% \mathcal{L}_{\mathrm{PDE}}(\theta)
+% =
+% \frac{1}{N_t}\sum_{i=1}^{N_t} \omega_i\, \mathcal{L}_{\mathrm{PDE}}^{(i)}(\theta),
+% \label{eq:causal_loss}
+% \end{equation}
+% where $N_t$ is the number of temporal chunks, $\mathcal{L}_{\mathrm{PDE}}^{(i)}(\theta)$ is the residual loss associated with the $i$-th time slab, and $\omega_i$ is a temporal weight. The weights are designed so that later times receive large weight only after earlier-time residuals have been sufficiently reduced. Thus, this method improves \emph{when} physics is enforced: it keeps the same residual objective, but schedules it in a way that respects causal temporal evolution.
+
+% \noindent\textbf{Summary.}
+% Taken together, these works define a clear progression from \emph{formulation}, to \emph{diagnosis}, to \emph{remedy} in data and loss-embedded physics. The field began by asking how physical laws should enter the loss, then showed that naive composite objectives can be numerically brittle, and finally developed methods that improve how these objectives are balanced, sampled, and optimized. This progression highlights that successful physics-informed learning depends not only on writing the right equations into the loss, but also on making the resulting objective trainable in practice.
+
+
+
+
+
+%-------------------------------------------------------------------------
+
+%\bibliographystyle{eg-alpha}
+\bibliographystyle{eg-alpha-doi}
+
+\bibliography{egbibsample}
+
+%-------------------------------------------------------------------------
+\newpage
+
+
+\end{document}
+
diff --git a/part-1.typ b/part-1.typ
new file mode 100644
index 0000000..8ffb418
--- /dev/null
+++ b/part-1.typ
@@ -0,0 +1,646 @@
+#set math.equation(numbering: "1")
+
+= Writing Plan
+<writing-plan>
+\1. Introduction
+
+\2. Theoretical Foundations of Physics in Visual Computing
+
+\3. Physics-Informed and Physics-Embedded Neural Methods. You can pick
+one of the following topics:
+
+- Data and Loss-embedded physics: PDE residual losses, initial-value and
+  boundary-value constraints, other soft constraints, etc. --Han
+
+- Architecture embedded physics: hard-coded invariances, physically
+  parameterized layers, analytic kernels inside networks, etc. --David
+
+- Operator embedded physics: differentiable renderers, wave propagation
+  and light transport operators, neural and Fourier neural operators,
+  etc. --Andrea
+
+- System embedded physics: hardware in the loop, ONNs, etc. --Zhen
+
+- Applications of PINNs --Ana
+
+\4. Failure Modes and Misconceptions (where these methods work and where
+they do not)
+
+\5. Open Problems and Future Directions
+
+\6. Discussion and Conclusion
+
+= Data and Loss-embedded Physics
+<sec:data_loss_embedded_physics>
+== Background
+<background>
+Physics-Informed Neural Networks (PINNs) combine traditional
+physics-based simulation with deep learning. Classical methods such as
+the Finite Element Method (FEM)~@courant1994variational and Finite
+Volume Method (FVM)~@leveque2002finite@patankar2018numerical solve
+physical equations by first dividing the physical domain into many small
+computational cells, like covering the space with a fine grid. These
+methods are accurate and reliable, but they can be expensive for complex
+geometries, moving boundaries, high-dimensional problems, or repeated
+simulations in design. Pure deep learning can be faster, but if it only
+learns from data, it may violate basic physical laws such as
+conservation of mass, momentum, or energy, making it unreliable when
+data are limited or test cases differ from training examples.
+
+PINNs address this by incorporating physical laws directly into neural
+network training. Instead of only fitting observed data, the model is
+also penalized when its predictions do not satisfy the governing
+differential equations. As a result, PINNs can learn solutions that are
+both data-efficient and physically meaningful. They also avoid the need
+for a predefined computational grid: rather than solving only on a fixed
+grid, PINNs learn a continuous function over space and time and use
+automatic differentiation to check the physical equations at sampled
+points. This makes them useful for irregular shapes, changing domains,
+limited measurements, and inverse problems where hidden physical
+parameters need to be estimated.
+
+== Formulation: embedding physics into the objective
+<formulation-embedding-physics-into-the-objective>
+Data and loss-embedded physics is a foundational paradigm for
+incorporating physical knowledge into neural computation by encoding
+governing equations and physical constraints directly into the training
+objective. In this setting, a neural network
+$u_theta\(upright(bold(x))\,t\)$ approximates the unknown physical field
+$u\(upright(bold(x))\,t\)$, where $theta$ denotes the learnable
+parameters, $upright(bold(x)) in Omega$ denotes the spatial coordinate
+in the domain $Omega$, and $t in\[0\,T\]$ denotes time.
+
+Consider a general time-dependent nonlinear PDE of the form
+$ partial_t u\(upright(bold(x))\,t\)+ cal(N)\[u\]\(upright(bold(x))\,t\)= 0\,quad upright(bold(x)) in Omega\,quad t in\[0\,T\]\, $<eq:generic_pde>
+where $cal(N)\[dot.op\]$ denotes a possibly nonlinear spatial
+differential operator. PINNs~@raissi2019pinn define a physics residual
+by substituting the neural approximation $u_theta$ into the governing
+equation:
+$ r_theta\(upright(bold(x))\,t\):= partial_t u_theta\(upright(bold(x))\,t\)+ cal(N)\[u_theta\]\(upright(bold(x))\,t\). $<eq:pde_residual>
+The governing equation is satisfied at a point $\(upright(bold(x))\,t\)$
+when $r_theta\(upright(bold(x))\,t\)= 0$. Therefore, the PDE residual is
+penalized over a set of collocation points
+${\(upright(bold(x))_j\,t_j\)}_(j = 1)^(N_r)$:
+$ cal(L)_(upright(P D E)) = 1 / N_r sum_(j = 1)^(N_r) ∥r_theta \( upright(bold(x))_j \, t_j \)∥_2^2 . $<eq:pde_loss>
+
+The full training objective then combines the equation loss with
+supervision on the solution itself:
+$ cal(L) = underbrace(lambda_r cal(L)_(upright(P D E)), upright("equation / physics loss")) + #h(0em) underbrace((lambda_b cal(L)_(upright(B C)) + lambda_i cal(L)_(upright(I C)) + lambda_d cal(L)_(upright(d a t a))), upright("data / constraint loss")) . $<eq:pinn_loss_compact>
+Here, $cal(L)_(upright(B C))$, $cal(L)_(upright(I C))$, and
+$cal(L)_(upright(d a t a))$ measure violations of boundary conditions,
+initial conditions, and observed data, respectively, while
+$lambda_r\,lambda_b\,lambda_i\,lambda_d$ are scalar balancing weights.
+
+== Alternative formulations
+<alternative-formulations>
+The original PINN formulation enforces the #emph[strong-form] PDE
+residual $r_theta\(upright(bold(x))\,t\)$ toward zero #emph[pointwise].
+Many subsequent variants reformulate this objective to improve
+stability, reduce derivative requirements, or better match different
+classes of physical systems. For notation, we write
+$upright(bold(z)) =\(upright(bold(x))\,t\)$ and let
+$cal(D) = Omega times\[0\,T\]$ denote the space-time domain. These
+alternatives can be roughly grouped into the following categories.
+
+=== Variational/energy formulation
+<variationalenergy-formulation>
+Some PDEs admit an energy or variational principle, where the true
+solution is characterized as the minimizer of an integral functional,
+such as the Dirichlet energy. For example, the Deep Ritz
+method~@yu2018deepritz considers PDEs with variational formulations, in
+which the solution satisfies
+$ u^(*) = arg min_(u in cal(V)) cal(E)\(u\)\, $<eq:deep_ritz_variational>
+where $cal(V)$ denotes the admissible function space and
+$cal(E)\(dot.op\)$ denotes the corresponding energy functional. Instead
+of optimizing directly over the infinite-dimensional space $cal(V)$,
+Deep Ritz parameterizes the solution with a neural network $u_theta$ and
+solves $ theta^(*) = arg min_theta cal(E)\(u_theta\). $<eq:deep_ritz_nn>
+
+Similar to PINNs, energy-based methods still use a neural network to
+approximate the solution field itself. However, the training signal
+comes from minimizing a global energy functional $cal(E)$ rather than
+penalizing pointwise PDE residuals. In practice, the integral in
+$cal(E)$ can be estimated by Monte Carlo sampling over the physical
+domain, making the objective compatible with standard stochastic
+gradient optimization. Compared with residual-based PINNs, variational
+formulations often require lower-order derivatives of the network output
+and can more naturally preserve physical structures encoded by the
+energy, such as force balance, symmetry, or conservation-related
+constraints, without introducing separate penalty losses.
+
+Recent work has extended this idea to neural operators. For example,
+Variational PINO (VINO)~@eshaghi2025variational trains a neural operator
+by minimizing the PDE energy, achieving strong performance without
+labeled solution data.
+
+=== Weak formulations
+<weak-formulations>
+Another route is to enforce the PDE in an #emph[integrated] or
+#emph[weak] sense rather than pointwise. Instead of requiring the
+strong-form residual to vanish at individual collocation points,
+weak-form methods require the residual to vanish when tested against a
+set of test functions. For a set of test functions ${ v_k }_(k = 1)^K$,
+this can be written as
+$ cal(R)_theta\(v_k\):= integral_(cal(D)) r_theta\(upright(bold(z))\)thin v_k\(upright(bold(z))\)thin d upright(bold(z)) approx 0\,#h(2em) k = 1\,dots.h\,K\, $<eq:weak_residual>
+where $cal(R)_theta\(v_k\)$ denotes the weak residual associated with
+the test function $v_k$. In practice, weak formulations often integrate
+the PDE by parts, which transfers derivatives from the neural solution
+$u_theta$ to the test functions. This reduces the derivative order
+required from the neural network and can improve stability for irregular
+or non-smooth solutions.
+
+Variational Physics-Informed Neural Networks (VPINNs)~@kharazmi2019vpinn
+optimize a loss over such weak residuals:
+$ cal(L)_(upright(w e a k)) = 1 / K sum_(k = 1)^K lr(|cal(R)_theta \( v_k \)|)^2 . $<eq:vpinn_loss>
+Relative to standard PINNs, the key difference is that the PDE is
+enforced in an averaged integral sense rather than pointwise.
+
+hp-VPINNs~@kharazmi2021hpvpinn retain the same weak-form principle, but
+apply it locally over a partition of the domain
+$cal(D) = union.big_(e = 1)^(N_(upright(s d))) cal(D)_e$. The
+corresponding local weak-form loss can be written as
+$ cal(L)_(upright(h p)) = frac(1, N_(upright(s d)) K) sum_(e = 1)^(N_(upright(s d))) sum_(k = 1)^K lr(|cal(R)_theta^(\(e\)) \( v_k^(\(e\)) \)|)^2\, $<eq:hpvpinn_loss>
+where
+$ cal(R)_theta^(\(e\))\(v_k^(\(e\))\):= integral_(cal(D)_e) r_theta\(upright(bold(z))\)thin v_k^(\(e\))\(upright(bold(z))\)thin d upright(bold(z)) . $<eq:local_weak_residual>
+Here, $cal(D)_e$ denotes the $e$-th subdomain, $N_(upright(s d))$ is the
+number of subdomains, and $v_k^(\(e\))$ is a local test function on
+$cal(D)_e$. This local formulation makes refinement more flexible:
+$h$-refinement subdivides the domain more finely, while $p$-refinement
+increases the polynomial order of the local test space. As a result,
+hp-VPINNs can better resolve multi-scale or spatially heterogeneous
+solutions.
+
+A related line of work studies the choice of test space and residual
+norm. For example, Robust VPINNs~@rojas2024robust address the
+sensitivity of classical VPINNs to the test basis by minimizing
+residuals in a dual norm, leading to improved stability.
+
+=== Adversarial/Minimax formulations
+<adversarialminimax-formulations>
+Weak formulations can also be cast as saddle-point problems. A
+representative example is the Weak Adversarial Network
+(WAN)~@zang2020weak. Instead of choosing a fixed set of test functions,
+WAN parameterizes both the solution and the test function with neural
+networks: $u_theta$ for the solution and $phi_eta$ for the test
+function. The method then solves a minimax problem of the form
+$ min_theta max_eta #h(0em) cal(J)\(theta\,eta\)\, $<eq:wan_minimax>
+where $cal(J)\(theta\,eta\)$ measures the weak residual induced by the
+test network $phi_eta$. Intuitively, the solution network $u_theta$
+tries to minimize the residual, while the test network $phi_eta$ acts as
+an adversary that searches for regions or directions where the current
+solution still violates the PDE. Therefore, rather than enforcing the
+residual against a fixed test basis, WAN adaptively learns test
+functions that expose the remaining error.
+
+This adversarial weak-form perspective is especially useful when
+hand-designed test functions are insufficient or when the PDE is
+high-dimensional or non-smooth.
+
+=== Summary
+<summary>
+Together, these developments broaden the "formulation" stage of PINN
+research. They demonstrate that one can teach a network to respect a PDE
+either by driving a pointwise residual to zero (classical
+PINN~@raissi2019pinn), by minimizing an energy integral (Deep
+Ritz~@yu2018deepritz, VINO~@eshaghi2025variational), by enforcing
+weighted integral constraints (VPINN~@kharazmi2019vpinn,
+hp-VPINN~@kharazmi2021hpvpinn, WF-PINN~@wang2025wf, etc.), or even by
+solving a minimax problem (WAN~@zang2020weak). Each alternative has its
+own advantages: variational forms lower the derivative order required of
+the network, weak forms improve stability on irregular or non-smooth
+solutions, and adversarial forms learn test functions that expose the
+remaining error. The literature continues to evolve these ideas, offering
+a rich toolkit for physics-informed learning beyond the original PINN objective.
+
+== Diagnosis: why naive composite losses fail in PINNs
+<diagnosis-why-naive-composite-losses-fail-in-pinn>
+Subsequent work showed that the challenge of data/loss-embedded physics
+lies not only in formulating the objective, but also in optimizing it
+reliably. For naive composite PINN losses, failures can arise from
+several intertwined sources: imbalanced gradients across loss terms,
+uneven convergence dynamics, ill-conditioned residual optimization, and
+representation bias in the neural network itself.
+
+=== Loss imbalance and uneven convergence
+<loss-imbalance-and-uneven-convergence>
+For the composite PINN loss in Eq.~@eq:pinn_loss_compact, Wang et
+al.~@wang2021gradientpathologies showed that the PDE, boundary, initial,
+and data terms can induce highly imbalanced gradients, so that some
+objectives dominate training while others make little progress. To
+diagnose this imbalance, they compared the gradient magnitudes
+contributed by different loss terms and proposed adaptively balancing
+each non-PDE term
+$cal(L)_i in { cal(L)_(upright(B C))\,cal(L)_(upright(I C))\,cal(L)_(upright(d a t a)) }$
+against the PDE term:
+$ hat(lambda)_i = frac(max #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_(upright(P D E)) #scale(x: 120%, y: 120%)[\|], "mean" #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_i #scale(x: 120%, y: 120%)[\|])\,#h(2em) lambda_i arrow.l\(1 - alpha\)lambda_i + alpha hat(lambda)_i\, $<eq:grad_pathology>
+where $alpha in\(0\,1\)$ is a smoothing factor. This observation reveals
+that PINN training can fail even when each individual loss term is well
+defined, because the composite objective may provide poorly balanced
+optimization signals.
+
+A complementary perspective comes from the neural tangent kernel (NTK)
+analysis. Wang et al.~@wang2022and showed that different components of
+the PINN objective can converge at substantially different rates during
+training. This suggests that the imbalance is not only a matter of
+manually chosen scalar weights or instantaneous gradient magnitudes, but
+is also tied to the spectrum of the training dynamics induced by the PDE
+operator and the neural parameterization. In other words, gradient
+imbalance is a local symptom of a broader convergence-rate mismatch
+among the physics and data constraints.
+
+=== Ill-conditioned residual optimization
+<ill-conditioned-residual-optimization>
+Krishnapriyan et al.~@krishnapriyan2021failuremodes further showed that
+failures on harder PDEs often arise not from limited expressivity, but
+from optimization difficulty and the brittleness of strong-form residual
+minimization. Their analysis can be viewed through objectives of the
+form
+$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))\, $<eq:failure_modes>
+where $cal(L)_u$ is shorthand for the supervision terms on the solution,
+including boundary, initial, and observed data terms. Their key
+observation is that simply increasing the PDE weight $lambda_r$ does not
+necessarily improve training: while a larger $lambda_r$ enforces physics
+more strongly, it can also make the optimization problem more
+ill-conditioned. Related loss-landscape analyses similarly show that
+differential operators in the residual term can produce poorly
+conditioned objectives, making PINN training sensitive to optimizer
+choice and hyperparameter settings~@rathore2024challenges.
+
+=== Representation and frequency bias
+<representation-and-frequency-bias>
+Another diagnosis concerns the representation bias of the neural network
+itself. Standard fully connected networks tend to learn smooth,
+low-frequency components more easily than high-frequency or multi-scale
+structures. Wang et al.~@wang2021eigenvector connected this behavior to
+the eigenspectrum of the limiting NTK and showed that conventional PINNs
+can struggle when the target solution contains sharp spatial or temporal
+variations. Thus, even when the PDE residual is correctly specified, the
+neural parameterization and its optimization dynamics may bias training
+away from the physically relevant solution.
+
+=== Takeaway
+<takeaway>
+Together, these diagnoses show that naive composite PINN losses can fail
+for several intertwined reasons: different loss terms may generate
+imbalanced or conflicting gradients, the residual objective may be
+ill-conditioned, and the neural parameterization may favor smooth
+low-frequency solutions over the multi-scale structures required by the
+PDE. These observations motivate the remedy strategies discussed next,
+which aim to rebalance, resample, schedule, or better optimize the
+physics-informed objective.
+
+== Diagnosis: why naive composite losses fail
+<diagnosis-why-naive-composite-losses-fail>
+Subsequent work showed that the challenge of data/loss-embedded physics
+lies not only in formulating the objective, but also in optimizing it
+reliably. For the composite PINN loss in Eq.~@eq:pinn_loss_compact, Wang
+et al.~@wang2021gradientpathologies showed that the PDE, boundary,
+initial, and data terms can induce highly imbalanced gradients, so that
+some objectives dominate training while others make little progress. To
+diagnose and mitigate this issue, they proposed adaptively balancing
+each non-PDE term
+$cal(L)_i in { cal(L)_(upright(B C))\,cal(L)_(upright(I C))\,cal(L)_(upright(d a t a)) }$
+against the PDE term:
+$ hat(lambda)_i = frac(max #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_(upright(P D E)) #scale(x: 120%, y: 120%)[\|], "mean" #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_i #scale(x: 120%, y: 120%)[\|])\,#h(2em) lambda_i arrow.l\(1 - alpha\)lambda_i + alpha hat(lambda)_i\, $<eq:grad_pathology>
+where $alpha in\(0\,1\)$ is a smoothing factor. Krishnapriyan et
+al.~@krishnapriyan2021failuremodes further showed that failures on
+harder PDEs often arise not from limited expressivity, but from
+optimization difficulty and the brittleness of strong-form residual
+minimization itself. Their analysis can be viewed through objectives of
+the form
+$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))\, $<eq:failure_modes>
+where $cal(L)_u$ is shorthand for the supervision terms on the solution,
+including boundary, initial, and observed data terms. Their key
+observation is that simply increasing the PDE weight $lambda_r$ does not
+necessarily improve training: while a larger $lambda_r$ enforces physics
+more strongly, it can also make the optimization landscape more
+ill-conditioned. To make this more manageable, they explored curriculum
+regularization, which schematically replaces the target PDE loss by a
+sequence of progressively harder PDE losses,
+$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))^(\(s\))\,#h(2em) s = 1\,dots.h\,S\, $<eq:curriculum_pde>
+where $s$ indexes the curriculum stage and $S$ is the total number of
+stages. Intuitively, the curriculum does not change the overall
+formulation, but makes the PDE part of the objective easier to optimize
+in early stages.
+
+== Remedies
+<remedies>
+The above failure modes have motivated a broad family of remedies,
+summarized in Table~@tab:pinn_remedies. To connect these methods with
+the failure modes discussed above, we organize them according to which
+part of the PINN pipeline they modify: loss balancing and optimization,
+residual sampling and curriculum design, neural representation and
+architecture, constraint enforcement, and domain decomposition. This
+taxonomy also highlights a useful distinction: some methods directly
+reshape the composite optimization objective, while others improve the
+sampling strategy, the neural trial space, the enforcement of physics
+constraints, or the scalability of the solver.
+
+=== Loss balancing and optimization
+<loss-balancing-and-optimization>
+#strong[Loss balancing.] A first class of remedies addresses the
+composite PINN objective itself. Because the PDE residual, boundary
+conditions, initial conditions, and data terms can have very different
+magnitudes and gradient scales, fixed loss weights may cause some
+objectives to dominate training while others make little progress.
+Gradient-flow analyses therefore proposed adaptive weighting rules based
+on the gradient statistics of different loss terms
+@wang2021gradientpathologies. Related NTK-based analyses further showed
+that different components of the PINN loss can converge at different
+rates, motivating dynamic weights that balance the training dynamics of
+multiple physics constraints @wang2022and. More recent loss-balancing
+methods such as ReLoBRaLo formulate this issue as a multi-objective
+balancing problem and adjust weights according to relative training
+progress @bischof2025multi.
+
+Self-Adaptive PINNs~@mcclenny2023self address the same general issue
+from a point-wise residual-weighting perspective. Instead of assigning a
+fixed penalty to each collocation point, they introduce trainable
+adaptive weights:
+$ cal(L)_(upright(P D E)) = sum_j w_j thin r_theta\(upright(bold(x))_j\,t_j\)^2\, $<eq:adaptive_weight_compact>
+where $w_j$ is the adaptive importance weight for the residual at
+collocation point $\(upright(bold(x))_j\,t_j\)$. The network parameters
+are updated by gradient descent on the loss, while the weights are
+updated by gradient ascent, so they increase at hard points with large
+residuals. As a result, the method
+automatically allocates more optimization effort to regions where the
+PDE is most strongly violated.
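+
+As a brief sketch of this mechanism, restricted to the residual term in
+@eq:adaptive_weight_compact and written without any mask function that
+a particular implementation may apply to the weights (the step size
+$rho$ is a symbol assumed here for illustration), training can be read
+as a saddle-point problem:
+$ min_theta max_w cal(L)_(upright(P D E))\(theta\,w\)\,#h(2em) w_j arrow.l w_j + rho thin frac(partial cal(L)_(upright(P D E)), partial w_j) = w_j + rho thin r_theta\(upright(bold(x))_j\,t_j\)^2\, $
+Because the ascent step is proportional to the squared residual, the
+weights grow fastest at the collocation points where the PDE is
+currently most violated.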
+
+#strong[Optimization.] Beyond weighting, optimizer design is also
+central to PINN training. Recent loss-landscape studies show that PINN
+objectives can be highly ill-conditioned, partly because differential
+operators amplify certain directions in parameter
+space~@rathore2024challenges. This explains why second-order or
+quasi-second-order optimizers such as L-BFGS~@liu1989limited,
+NysNewton-CG~@rathore2024challenges, and SOAP-style preconditioning
+@wanggradient@vyas2025soap can substantially improve training stability.
+Schematically, such methods precondition the gradient update as
+$ theta_(t + 1) approx theta_t - eta H^(- 1) g_t\, $<eq:preconditioned_update>
+where $theta_t$ denotes the model parameters, $g_t$ is the total
+gradient, $eta$ is the learning rate, and $H$ denotes a curvature matrix
+or its approximation. Intuitively, curvature-aware preconditioning
+rescales poorly conditioned directions and can implicitly reduce
+conflicts among the gradients induced by different loss terms. These
+methods correspond to the first block of Table~@tab:pinn_remedies, which
+focuses on improving how the composite PINN objective is weighted and
+optimized.
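+
+A two-parameter toy problem, introduced here only for illustration,
+shows why curvature information helps. Consider the quadratic loss
+$ cal(L)\(theta\)= 1 / 2 \(theta_1^2 + kappa thin theta_2^2\)\,#h(2em) g = \(theta_1\,kappa thin theta_2\)\,#h(2em) H = "diag"\(1\,kappa\)\, $
+with condition number $kappa >> 1$. Plain gradient descent updates
+$theta_2 arrow.l \(1 - eta kappa\)theta_2$ and is therefore stable only
+for $eta < 2 \/ kappa$, so the $theta_1$ direction contracts by a factor
+$1 - eta > 1 - 2 \/ kappa$ per step and needs on the order of $kappa$
+iterations. The preconditioned update in @eq:preconditioned_update with
+$eta = 1$ gives $theta - H^(- 1) g = 0$, the exact minimizer, in a
+single step.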
+
+=== Residual sampling and causal curricula
+<residual-sampling-and-causal-curricula>
+#strong[Residual sampling.] A second class of remedies changes the
+distribution and order of physics supervision. In standard PINNs,
+collocation points are often sampled uniformly from the spatio-temporal
+domain. However, uniform sampling can waste many residual points in
+regions that are already well learned, while undersampling difficult
+regions with large PDE violations. Residual-based adaptive refinement
+methods, including RAR, RAD, and RAR-D, therefore update the sampling
+distribution according to the current residual @wu2023comprehensive:
+$ p\(upright(bold(x))\,t\)prop phi.alt #h(-1em) (lr(|r_theta \( upright(bold(x)) \, t \)|))\, $<eq:rad_sampling>
+where $p\(upright(bold(x))\,t\)$ denotes the sampling density and
+$phi.alt\(dot.op\)$ is a monotone function of the residual magnitude.
+This shifts collocation points toward regions where the current PINN
+violates the governing equation most strongly. Region-optimized PINNs
+further refine this idea by optimizing the spatial allocation of
+residual points more explicitly @wu2024ropinn. In this sense, adaptive
+sampling improves where physics is enforced, rather than changing the
+PDE loss itself.
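+
+As an illustration of @eq:rad_sampling, one simple family of such
+functions (the exponent $k >= 0$, the offset $c >= 0$, and the
+normalizer $Z_k$ are symbols named here for the example) is
+$ p\(upright(bold(x))\,t\)prop frac(lr(|r_theta\(upright(bold(x))\,t\)|)^k, Z_k) + c\, $
+where $Z_k$ is the mean of $lr(|r_theta|)^k$ over the candidate points.
+Setting $k = 0$ or letting $c arrow.r infinity$ recovers uniform
+sampling, while a large $k$ with $c = 0$ concentrates almost all points
+near the largest residuals.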
+
+#strong[Causality-aware sampling.] For time-dependent problems, another
+important issue is temporal ordering. If residuals from all time steps
+are optimized simultaneously, errors from early times can propagate
+forward and make long-time prediction difficult. Causality-aware
+training addresses this problem by decomposing the temporal domain into
+chunks and weighting later chunks according to the accuracy of earlier
+ones @wang2024respecting:
+$ cal(L)_(upright(P D E))\(theta\)= 1 / N_t sum_(i = 1)^(N_t) omega_i thin cal(L)_(upright(P D E))^(\(i\))\(theta\)\, $<eq:causal_loss>
+where $N_t$ is the number of temporal chunks,
+$cal(L)_(upright(P D E))^(\(i\))$ is the residual loss on the $i$-th
+time slab, and $omega_i$ is a causal weight. The weights are designed so
+that later times receive significant penalty only after earlier-time
+residuals have been sufficiently reduced. Curriculum-based methods such
+as CoPINN extend this idea by explicitly organizing training from easier
+to harder residual constraints @duan2025copinn. Together, the second
+block of Table~@tab:pinn_remedies summarizes methods that improve where,
+when, and in what order residual supervision is imposed.
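+
+In the causal training scheme of @wang2024respecting, the weights in
+@eq:causal_loss take an exponential form, sketched here with $epsilon$
+denoting the causality parameter:
+$ omega_1 = 1\,#h(2em) omega_i = exp lr((- epsilon sum_(k = 1)^(i - 1) cal(L)_(upright(P D E))^(\(k\))\(theta\)))\,#h(2em) i = 2\,dots.h\,N_t\, $
+For instance, with $epsilon = 1$ and a residual loss of $2$ on the
+first slab, $omega_2 = e^(- 2) approx 0.14$, so the second slab is
+largely ignored. Once that loss falls to $0.01$,
+$omega_2 = e^(- 0.01) approx 0.99$ and the second slab is penalized
+almost fully.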
+
+=== Representation and architecture
+<representation-and-architecture>
+A third class of remedies addresses PINN failures from the perspective
+of neural representation. Standard coordinate-based MLPs often suffer
+from spectral bias, which makes them learn low-frequency components more
+easily than high-frequency or multi-scale structures. This is
+problematic for PDEs with sharp gradients, oscillatory solutions,
+boundary layers, or multi-scale dynamics. Fourier feature embeddings
+directly target this limitation by reshaping the coordinate
+representation and mitigating eigenvector bias in multi-scale PDEs
+@wang2021eigenvector. Similarly, sinusoidal activations provide a neural
+representation better suited for high-frequency implicit functions
+@sitzmann2020implicit, while locally adaptive activation functions
+introduce learnable activation slopes to accelerate convergence
+@jagtap2020locally. These methods do not directly modify the physics
+loss, but they make the neural trial space better matched to the target
+solution.
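+
+A common form of such an embedding, sketched here with the matrix $B$
+and scale $sigma$ as illustrative symbols, passes the input coordinates
+through random sinusoids before the first layer:
+$ gamma\(upright(bold(x))\)= vec(cos\(B upright(bold(x))\), sin\(B upright(bold(x))\))\,#h(2em) B_(i j) tilde.op cal(N)\(0\,sigma^2\)\, $
+A larger $sigma$ biases the network toward higher frequencies, so
+multi-scale problems may combine several embeddings with different
+values of $sigma$.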
+
+More recent architectural remedies redesign the PINN backbone itself.
+SPINN uses separable network structures to improve efficiency,
+particularly through more efficient forward-mode automatic
+differentiation @cho2023separable. PINNsformer instead introduces a
+Transformer-based architecture to model sequential dependencies in
+physics-informed learning @zhao2024pinnsformer. These methods correspond
+to the representation and architecture block of
+Table~@tab:pinn_remedies: they are most useful when the difficulty comes
+not only from loss imbalance or sampling, but also from a mismatch
+between a simple MLP and the structure of the PDE solution.
+
+=== Constraint enforcement
+<constraint-enforcement>
+A fourth class of remedies modifies how boundary, initial, and physical
+constraints are imposed. In standard PINNs, boundary and initial
+conditions are usually enforced as soft penalty terms in the loss. This
+introduces additional loss-balancing difficulty: if the penalty is too
+small, the constraints may be violated; if it is too large, the PDE
+residual may be under-optimized. Classical hard-constrained neural trial
+functions address this issue by constructing solutions that satisfy
+prescribed constraints by design @lagaris1998artificial. A typical form
+is
+$ u_theta\(upright(bold(x))\,t\)= g\(upright(bold(x))\,t\)+ d\(upright(bold(x))\,t\)N_theta\(upright(bold(x))\,t\)\, $<eq:hard_constraint_ansatz>
+where $g\(upright(bold(x))\,t\)$ satisfies the prescribed constraint,
+$d\(upright(bold(x))\,t\)$ vanishes on the constrained boundary, and
+$N_theta$ is the trainable neural network. Since the constraint is built
+into the solution form, the optimizer no longer needs to enforce it only
+through a soft penalty weight. Modern PINN libraries and formulations
+further implement such hard constraints using approximate distance
+functions and geometry-aware output transformations @lu2021deepxde.
+Recent work also studies soft and hard boundary constraints for specific
+PDE families such as advection--diffusion equations @li2024physical.
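+
+As a one-dimensional illustration of @eq:hard_constraint_ansatz (the
+interval and the boundary values $a$ and $b$ are chosen only for this
+example), take Dirichlet conditions $u\(0\)= a$ and $u\(1\)= b$ on
+$x in \[0\,1\]$. One valid choice is
+$ g\(x\)= a\(1 - x\)+ b x\,#h(2em) d\(x\)= x\(1 - x\)\, $
+so that $u_theta\(0\)= a$ and $u_theta\(1\)= b$ hold for any network
+output $N_theta$, and only the PDE residual remains in the loss.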
+
+Variational and weak-form PINNs provide another way to improve
+constraint and residual enforcement. Instead of directly minimizing the
+point-wise strong-form PDE residual, these methods enforce the governing
+equation against test functions in an integral form. hp-VPINNs combine
+this variational formulation with hp-refinement and domain
+decomposition, improving the connection between PINNs and classical
+finite-element or Galerkin methods @kharazmi2021hpvpinn. Thus, the
+constraint-enforcement block of Table~@tab:pinn_remedies captures two
+related strategies: satisfying constraints by construction and replacing
+strong-form residuals with weak-form or variational objectives.
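+
+Schematically, with $v_k$ denoting a chosen set of $K$ test functions
+on the domain $Omega$ (notation introduced here for illustration), the
+point-wise residual loss is replaced by
+$ cal(L)_"weak" = sum_(k = 1)^K lr(|integral_Omega r_theta\(upright(bold(x))\)thin v_k\(upright(bold(x))\)dif upright(bold(x))|)^2\, $
+Integrating by parts moves derivatives from $u_theta$ onto $v_k$, which
+lowers the order of differentiation required of the network.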
+
+=== Domain decomposition and scalability
+<domain-decomposition-and-scalability>
+Finally, a fifth class of remedies improves PINNs by localizing the
+learning problem. Instead of fitting a single global network over the
+entire spatio-temporal domain, domain-decomposition methods divide the
+domain into subregions and train local networks coupled through
+interface, conservation, or partition-of-unity constraints. Conservative
+PINNs impose interface flux continuity for conservation laws
+@jagtap2020conservative, while XPINNs generalize this idea to flexible
+space-time domain decomposition for nonlinear PDEs @jagtap2020extended.
+FBPINNs further introduce overlapping subdomains and partition-of-unity
+weighting to make the decomposition more scalable and localized
+@moseley2023finite.
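+
+Schematically, and with the window notation $chi_j$ assumed here for
+illustration, such a partition-of-unity decomposition over $J$
+overlapping subdomains writes the global approximation as
+$ u_theta\(upright(bold(x))\)= sum_(j = 1)^J chi_j\(upright(bold(x))\)thin N_(theta_j)\(upright(bold(x))\)\,#h(2em) sum_(j = 1)^J chi_j\(upright(bold(x))\)= 1\, $
+where each window $chi_j$ is smooth and vanishes outside the $j$-th
+subdomain, so every local network $N_(theta_j)$ only has to fit the
+solution on its own patch.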
+
+Recent extensions improve the scalability and adaptivity of this
+decomposition view. Multilevel FBPINNs introduce hierarchical
+decompositions to improve global communication across subdomains
+@dolean2024multilevel, while AB-PINNs use residual-driven adaptive bases
+to dynamically allocate decomposition capacity @botvinick2025ab. These
+methods correspond to the final block of Table~@tab:pinn_remedies. They
+are especially useful for heterogeneous, multi-scale, or long-time
+problems where a single global PINN is difficult to optimize.
+
+=== Summary
+<summary-1>
+Overall, the remedies in Table~@tab:pinn_remedies show that PINN
+performance is determined not only by whether the correct physical
+equations are included in the objective, but also by whether the
+resulting learning problem is numerically trainable. Loss balancing and
+second-order optimization improve how competing objectives are
+minimized; adaptive sampling and causal curricula improve where and when
+residuals are enforced; representation and architectural methods improve
+what functions the network can express; hard constraints and weak forms
+improve how physics is encoded; and domain decomposition improves
+scalability to complex physical systems.
+
+#figure(
+[
+  #show table.cell: set text(size: 6pt)
+  #set table.hline(stroke: (dash: "solid", thickness: 0.5pt))
+
+  #table(
+  columns: (1fr, auto, auto, auto),
+  stroke: none,
+  align: left + horizon,
+  inset: 2pt,
+
+  table.header[*Type*][*Method*][*Venue / Year*][*Keyword-style Contribution*],
+
+  table.hline(),
+  table.cell(rowspan: 6)[*Loss balancing and optimization*],
+
+  [Gradient-flow weighting~],
+  [SISC 2021],
+  [Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms.],
+
+  [NTK-based weighting~],
+  [JCP 2022],
+  [Balances different physics constraints through neural tangent kernel training dynamics.],
+
+  [SA-PINNs~],
+  [JCP 2023],
+  [Learns adaptive residual weights to emphasize difficult collocation points.],
+
+  [Loss-landscape / NysNewton-CG~],
+  [ICML 2024],
+  [Studies PINN ill-conditioning and improves training with second-order optimization.],
+
+  [ReLoBRaLo~],
+  [CMAME 2025],
+  [Relative loss balancing with random lookback for multi-objective PINN training.],
+
+  [SOAP / gradient alignment~],
+  [NeurIPS 2025],
+  [Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives.],
+
+  table.hline(),
+  table.cell(rowspan: 4)[*Residual sampling and curriculum*],
+
+  [RAR / RAD / RAR-D~],
+  [CMAME 2023],
+  [Residual-based adaptive refinement and distribution-based collocation sampling.],
+
+  [RoPINN~],
+  [NeurIPS 2024],
+  [Region-optimized residual sampling for more efficient collocation point selection.],
+
+  [Causal PINN training~],
+  [CMAME 2024],
+  [Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs.],
+
+  [CoPINN~],
+  [ICML 2025],
+  [Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals.],
+
+  table.hline(),
+  table.cell(rowspan: 5)[*Representation and architecture*],
+
+  [Fourier features / eigenvector bias~],
+  [CMAME 2021],
+  [Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias.],
+
+  [Adaptive activation functions~],
+  [Proc. R. Soc. A 2020],
+  [Learnable activation slopes and slope-recovery terms for faster convergence.],
+
+  [SIREN~],
+  [NeurIPS 2020],
+  [Sinusoidal activations for representing high-frequency implicit functions.],
+
+  [SPINN~],
+  [NeurIPS 2023],
+  [Separable network structure for efficient forward-mode automatic differentiation.],
+
+  [PINNsformer~],
+  [ICLR 2024],
+  [Transformer-based architecture for modeling sequential dependencies in PINNs.],
+
+  table.hline(),
+  table.cell(rowspan: 3)[*Constraint enforcement*],
+
+  [Approximate distance functions~],
+  [SIAM Review 2021],
+  [Implements hard constraints using distance functions and geometry-aware output transformations.],
+
+  [hp-VPINN~],
+  [CMAME 2021],
+  [Variational weak-form PINNs with hp-refinement and domain decomposition.],
+
+  [Hard initial/boundary constraints~],
+  [CMA 2024],
+  [Enforces prescribed initial and boundary conditions through constrained solution forms.],
+
+  table.hline(),
+  table.cell(rowspan: 5)[*Domain decomposition and scalability*],
+
+  [cPINN~],
+  [CMAME 2020],
+  [Conservative domain decomposition with interface flux continuity for conservation laws.],
+
+  [XPINN~],
+  [CCP 2020],
+  [General space-time domain decomposition for heterogeneous PDE problems.],
+
+  [FBPINN~],
+  [ACOM 2023],
+  [Overlapping subdomains with partition-of-unity weighting for localized training.],
+
+  [Multilevel FBPINN~],
+  [CMAME 2024],
+  [Hierarchical domain decomposition for improved global communication and scalability.],
+
+  [AB-PINN~],
+  [arXiv 2025],
+  [Adaptive residual-driven decomposition for dynamically allocating subdomains.],
+)
+]
+) <tab:pinn_remedies>
+
+#bibliography("part-1.bib")
diff --git a/prompt-00-outline.md b/prompt-00-outline.md
new file mode 100644
index 0000000..83dfe59
--- /dev/null
+++ b/prompt-00-outline.md
@@ -0,0 +1,99 @@
+I have reviewed a number of papers relating to physics-informed machine
+learning principles. 
+
+- Deep Tensor ADMM-Net for Snapshot Compressive Imaging
+- End-to-End Optimization of Optics and Image Processing
+- NeRF Basics
+- Implicit Surfaces via Volume Rendering
+- Continuum-aware NeRF (PAC-NeRF)
+- NeRF in Scattering Media
+- Lens Design with Differentiable Ray Tracing 
+- Hybrid Lens Design with Differentiable Wave Optics
+- Diffractive Deep Neural Networks
+- Spatially Varying Nanophotonic Neural Networks
+- 3D Gaussian Splatting
+- Physics Integrated Gaussians
+- Intro to Graph Neural Networks
+- Interaction Networks for Learning Physics
+- GNNs as Learnable Physics Engines
+- Graph-based Physics Simulators
+- Deep Image Prior
+- GNNs and Generative Priors for Solving Inverse Problems
+- Invertible Generative Models
+- Diffusion Posterior Sampling
+
+Broadly, I have categorized these papers into four groups:
+
+- Data and Loss embedded physics. Physical constraints are applied in the
+  training data and/or loss functions; any physical accuracy in the results is
+  implicitly learned from these.
+- Architecture embedded physics. Physical constraints are applied in the model
+  architecture. Typically this involves some sub-stage of the model which
+  decodes a latent vector, applies some analytical physical formulae to it,
+  then re-encodes it to a new latent vector for downstream processing. For
+  example PAC-NeRF.
+- Operator embedded physics. The machine learning model does not directly
+  produce outputs, but rather it produces some state representation which is
+  then processed by an analytical physical model. For example differentiable
+  implicit renderers as in NeRF or Gaussian Splatting.
+- System embedded physics. Some stage of the model involves a hardware physical
+  step, such as optical neural networks or robotic feedback mechanisms.
+
+Synthesize a rough outline for a literature review paper which explores,
+generalizes, and unifies these four categories of Physics-Informed Neural
+Networks (PINNs).
+
+In particular, for each category, provide:
+
+- Brief introduction of the category
+- Unified formulation of architectures within the category
+- List of criteria to search for papers which describe models of the category
+- A coarse outline of that section of the review paper.
+
+We have already built some notions for unified formulation and outline for the
+first category, data- and loss-embedded physics, listed below.
+
+(note that we have not yet expressed robotic feedback mechanisms in this common
+formulation)
+
+---
+
+The driving formulation for all these models is that there is some general
+time-dependent nonlinear PDE of the form:
+
+d_t u(x, t) + N_u(x, t) = 0
+
+where N denotes a possibly nonlinear spatial differential operator, and u is
+the unknown physical field. The neural network u_theta approximates u, and from
+this we can solve inverse problems or forward inference.
+
+PINNs define a physics residual by substituting the neural approximation u_theta into the governing equation:
+
+r_theta(x, t) = d_t u_theta(x, t) + N_u_theta(x, t)
+
+The governing equation is satisfied at a point (x, t) when r_theta(x, t) = 0,
+so the physics loss penalizes based on this residual at various observations
+(x_j, t_j).
+
+That is, we optimize for the model parameters theta by
+
+argmin_theta | N_theta(x, t) - u_theta(x, t) |^2_2
+
+In architecture-embedded physics, the model approximation `u_theta`
+incorporates some physics-based, possibly nonlinear differential operator
+`partial` which informs the result.
+
+argmin_theta | N_theta(partial, x, t) - u_theta(partial, x, t) |^2_2
+
+In operator-embedded physics, the differential operator is applied to
+the model result.
+
+argmin_theta | partial(N_theta, x, t) - partial(u_theta, x, t) |^2_2
+
+Finally, in system-embedded physics, the model parameters and model are
+expressed in such a way that inference can be realized by a physical system.
+For example, in optical neural networks, the parameters theta are expressed in
+terms of diffraction gratings or nanophotonics. In this sense, the parameters
+are optimized by operator-embedded physics where partial is a free-space wave
+propagation operator.
+
diff --git a/prompt-01-coarse-research.md b/prompt-01-coarse-research.md
new file mode 100644
index 0000000..46ac09d
--- /dev/null
+++ b/prompt-01-coarse-research.md
@@ -0,0 +1,124 @@
+I have reviewed a number of papers relating to physics-informed machine
+learning principles. 
+
+- Deep Tensor ADMM-Net for Snapshot Compressive Imaging
+- End-to-End Optimization of Optics and Image Processing
+- NeRF Basics
+- Implicit Surfaces via Volume Rendering
+- Continuum-aware NeRF (PAC-NeRF)
+- NeRF in Scattering Media
+- Lens Design with Differentiable Ray Tracing 
+- Hybrid Lens Design with Differentiable Wave Optics
+- Diffractive Deep Neural Networks
+- Spatially Varying Nanophotonic Neural Networks
+- 3D Gaussian Splatting
+- Physics Integrated Gaussians
+- Intro to Graph Neural Networks
+- Interaction Networks for Learning Physics
+- GNNs as Learnable Physics Engines
+- Graph-based Physics Simulators
+- Deep Image Prior
+- GNNs and Generative Priors for Solving Inverse Problems
+- Invertible Generative Models
+- Diffusion Posterior Sampling
+
+Broadly, I have categorized these papers into four groups, listed below.
+
+Review the literature and identify, for each category, a dozen or so papers.
+These papers must be current (or expository) and be representative of the
+strengths and weaknesses of the state-of-the-art in each of these categories.
+
+---
+
+## I. Data- and Loss-Embedded Physics
+### Brief Introduction
+This category represents the most common paradigm of Physics-Informed Neural Networks (PINNs). In these models, the neural network architecture itself remains a standard black-box (e.g., an MLP), but physical laws are introduced as "soft constraints." The model implicitly learns physical accuracy because violations of governing analytical equations (like PDEs) are heavily penalized during training via the loss function, or because the training data itself is heavily curated by physical simulators.
+
+### Unified Formulation
+The driving formulation is that there is some general time-dependent nonlinear PDE:
+$$\partial_t u(x, t) + \mathcal{N}[u](x, t) = 0$$
+where $\mathcal{N}$ denotes a spatial differential operator and $u$ is the unknown physical field. The neural network $u_\theta$ approximates $u$. We define a physics residual by substituting $u_\theta$ into the governing equation:
+$$r_\theta(x, t) = \partial_t u_\theta(x, t) + \mathcal{N}[u_\theta](x, t)$$
+Because the governing equation is satisfied when $r_\theta(x, t) = 0$, the physics loss penalizes this residual alongside data observations. Utilizing your framework's notation, we optimize the model parameters $\theta$ by minimizing the discrepancy:
+$$\arg\min_\theta \| \mathcal{N}_\theta(x, t) - u_\theta(x, t) \|_2^2$$
+
+### Search Criteria
+To find literature in this category, filter for papers discussing:
+* **Keywords:** Physics-Informed Neural Networks (PINNs), soft constraints, PDE residual loss, physics-guided machine learning, deep image prior, regularization via physics.
+* **Methodology:** Models that use standard MLPs or CNNs but modify the training regime or loss landscape (e.g., adding a term to Mean Squared Error that computes gradients using `Autograd` to enforce physical laws).
+
+### Coarse Outline
+1.  **Introduction to PINNs:** The shift from purely data-driven to physics-guided learning.
+2.  **Formulating the Physics Loss:** Calculating derivatives and residuals via automatic differentiation.
+3.  **Forward vs. Inverse Problems:** Using data/loss-embedded models to discover unknown PDE parameters versus simulating known PDEs.
+4.  **Generative Priors:** Deep Image Prior and Diffusion Posterior Sampling as implicit data-driven physical regularizers.
+5.  **Limitations:** The "soft constraint" problem (models can still output physically impossible results if the loss isn't perfectly balanced) and optimization failures (stiff PDEs).
+
+---
+
+## II. Architecture-Embedded Physics
+### Brief Introduction
+Architecture-embedded physics moves from "soft constraints" to "hard constraints." Instead of relying on the loss function to penalize non-physical behavior, the physical constraints are baked directly into the neural network's topology or internal operations. The model decodes latent vectors, applies analytical physics formulations internally, and re-encodes them, ensuring the output natively respects invariances, symmetries, or conservation laws.
+
+### Unified Formulation
+The model approximation $u_\theta$ incorporates some physics-based, possibly nonlinear, differential operator $\partial$ that structurally dictates the result. The internal forward pass is strictly bound by this operator:
+$$\arg\min_\theta \| \mathcal{N}_\theta(\partial, x, t) - u_\theta(\partial, x, t) \|_2^2$$
+
+### Search Criteria
+To find literature in this category, filter for papers discussing:
+* **Keywords:** Hard constraints, physics-encoded architecture, invariant/equivariant neural networks, Graph Neural Networks (GNNs) for physics, Hamiltonian Neural Networks, PAC-NeRF.
+* **Methodology:** Models containing custom layers, inductive biases, message-passing topologies matching physical interactions (like GNNs modeling particle systems), or internal latent physics solvers.
+
+### Coarse Outline
+1.  **Transitioning to Hard Constraints:** Overcoming the optimization challenges of standard PINNs.
+2.  **Graph-based Physics Simulators:** How GNNs naturally mirror $N$-body interactions and physical meshes (e.g., Interaction Networks).
+3.  **Latent Physics Solvers:** Embedding physical formulae in the latent space (e.g., PAC-NeRF incorporating continuum mechanics into the NeRF MLP).
+4.  **Symmetry and Equivariance:** Ensuring physical laws (like rotation or translation invariance) are structurally guaranteed.
+5.  **Trade-offs:** The balance between expressivity (neural capacity) and strict adherence to physical priors.
+
+---
+
+## III. Operator-Embedded Physics
+### Brief Introduction
+In operator-embedded physics, the neural network acts as a state generator rather than a direct solution generator. The ML model outputs a continuous or discrete state representation (like a radiance field, volume density, or Gaussian splat parameters). This state is then passed through an external, fixed analytical physical operator (like a differentiable renderer) to produce the final output. The physical simulator is the lens through which the neural network learns.
+
+### Unified Formulation
+The differential or physical operator $\partial$ (e.g., volume rendering integral, ray tracing) is applied *after* the model result, to the model's state representation. The loss is computed on the output of this operator:
+$$\arg\min_\theta \| \partial(\mathcal{N}_\theta, x, t) - \partial(u_\theta, x, t) \|_2^2$$
+
+### Search Criteria
+To find literature in this category, filter for papers discussing:
+* **Keywords:** Differentiable rendering, differentiable physics engines, implicit neural representations (INRs), NeRF, 3D Gaussian Splatting, inverse graphics, differentiable wave optics.
+* **Methodology:** Systems where an MLP predicts properties (color, density, index of refraction) at coordinates, which are subsequently integrated or propagated via classical physics equations.
+
+### Coarse Outline
+1.  **The State-Representation Paradigm:** Decoupling the neural representation from the physical observation.
+2.  **Differentiable Simulators:** How standard physics engines (ray tracers, wave propagators) are made differentiable to allow backpropagation to the neural state.
+3.  **Implicit Neural Representations:** NeRFs, Gaussian Splatting, and volume rendering as prime examples of operator-embedded physics.
+4.  **Optical Inverse Problems:** Lens design and scattering media models utilizing differentiable wave optics.
+
+---
+
+## IV. System-Embedded Physics
+### Brief Introduction
+System-embedded physics takes the computational graph off the traditional silicon chip and places it into the physical world. Here, inference is realized by physical hardware. The trainable parameters are physical objects (e.g., the thickness of a nanophotonic lens, the diffraction grating, or robotic actuators). The physical world itself computes the forward pass.
+
+### Unified Formulation
+The parameters $\theta$ represent physical properties, optimized via operator-embedded principles (where $\partial$ is the natural physical environment, such as free-space wave propagation or Newtonian mechanics).
+For optical networks, optimization mirrors Category III. For robotic feedback mechanisms (which you noted needed formulation), the operator $\partial$ acts as the real-world state transition dynamics given a physically embedded action $a_\theta$:
+$$\text{State}_{t+1} = \partial(\text{State}_t, a_\theta)$$
+The optimization minimizes the discrepancy between the physical system's true hardware performance and the desired objective:
+$$\arg\min_\theta \| \partial_{\text{hardware}}(\mathcal{N}_\theta) - \text{Target} \|_2^2$$
+
+### Search Criteria
+To find literature in this category, filter for papers discussing:
+* **Keywords:** Optical neural networks, diffractive deep neural networks, nanophotonic neural networks, hardware-in-the-loop optimization, embodied AI, neuromorphic photonics, physical neural networks.
+* **Methodology:** End-to-end optimization of physical sensors, meta-optics, analog physical computing, or robots using differentiable reality or real-world feedback to update hardware-level parameters.
+
+### Coarse Outline
+1.  **From Software to Hardware:** The physical world as the ultimate physics engine.
+2.  **Optical and Photonic Neural Networks:** Diffractive layers and spatially varying nanophotonics functioning as neural weights.
+3.  **Robotic and Embodied Feedback:** Hardware-in-the-loop training and optimizing physical control parameters.
+4.  **End-to-End Co-design:** Simultaneously optimizing the physical hardware ($\theta$) alongside downstream digital image processing.
+5.  **The Future of PIML:** Analog computing and overcoming the memory/power bottlenecks of traditional digital GPUs.
+
diff --git a/prompt-02-detail-research.md b/prompt-02-detail-research.md
new file mode 100644
index 0000000..6ab856e
--- /dev/null
+++ b/prompt-02-detail-research.md
@@ -0,0 +1,47 @@
+Next, I want to expand on the second category, **II. Architecture-Embedded
+Physics: Hard Constraints and Geometric Deep Learning**. I want to examine each
+of the referenced papers and perform further subcategorization.
+
+- DeepH-E3 (2023) 16 | Equivariant DFT Hamiltonian
+- NequIP / MACE (2021/2024) 15 | E(3)-Equivariant Potentials
+- QHNet (2023) 19 | Efficient SE(3)-Equivariance
+- Neural Hamiltonian Diffusion (2025) 20 | Manifold Hamiltonian Learning
+- Timrov et al. (2025) 21 | Hubbard Parameter ENN
+- SpinGNN (2025) 17 | Heisenberg/Spin-Lattice GNN
+- ACE Framework (2024) 15 | Atomic Cluster Expansion
+- AI2DFT (2024) 22 | Differential DFT Neural Code
+- Deep Potentials (2021) 17 | Density-based Descriptors
+- Heisenberg Edge GNN 17 | Equivariant Message Passing
+- Atomic-site ENN (2024) 15 | Lattice Symmetry-Aware
+- Group-Equivariant Survey 18 | Group Representation Theory
+
+For each of these papers, identify the core architectural insight. Try to find
+common subcategories, and for each subcategory, express it in terms of our
+unified formulation.
+
+The driving formulation for all these models is that there is some general
+time-dependent nonlinear PDE of the form:
+
+d_t u(x, t) + N_u(x, t) = 0
+
+where N denotes a possibly nonlinear spatial differential operator, and u is
+the unknown physical field. The neural network u_theta approximates u, and from
+this we can solve inverse problems or forward inference.
+
+PINNs define a physics residual by substituting the neural approximation u_theta into the governing equation:
+
+r_theta(x, t) = d_t u_theta(x, t) + N_u_theta(x, t)
+
+The governing equation is satisfied at a point $(x, t)$ when $r_theta(x, t) = 0$,
+so the physics loss penalizes based on this residual at various observations
+(x_j, t_j).
+
+That is, we optimize for the model parameters theta by
+
+argmin_theta | N_theta(x, t) - u_theta(x, t) |^2_2
+
+In architecture-embedded physics, the model approximation `u_theta`
+incorporates some physics-based, possibly nonlinear differential operator
+`partial` which informs the result.
+
+argmin_theta | N_theta(partial, x, t) - u_theta(partial, x, t) |^2_2