Spring 2026

2026-05-25 11:31:33 -04:00
commit 93bfee7eef
13 changed files with 3309 additions and 0 deletions
--- a/part-1.typ
+++ b/part-1.typ
@@ -0,0 +1,646 @@
+#set math.equation(numbering: "1")
+
+= Writing Plan
+<writing-plan>
+\1. Introduction
+
+\2. Theoretical Foundations of Physics in Visual Computing
+
+\3. Physics-Informed and Physics-Embedded Neural Methods. Pick one of
+the following topics:
+
+- Data and Loss-embedded physics: PDE residual losses, initial-value and
+  boundary-value constraints, other soft constraints, etc. (Han)
+
+- Architecture-embedded physics: hard-coded invariances, physically
+  parameterized layers, analytic kernels inside networks, etc. (David)
+
+- Operator-embedded physics: differentiable renderers, wave propagation
+  and light transport operators, neural and Fourier neural operators,
+  etc. (Andrea)
+
+- System-embedded physics: hardware in the loop, ONNs, etc. (Zhen)
+
+- Applications of PINNs (Ana)
+
+\4. Failure Modes and Misconceptions (Where do these methods work and do
+not work)
+
+\5. Open Problems and Future Directions
+
+\6. Discussion and Conclusion
+
+= Data and Loss-embedded Physics
+<sec:data_loss_embedded_physics>
+== Background
+<background>
+Physics-Informed Neural Networks (PINNs) combine traditional
+physics-based simulation with deep learning. Classical methods such as
+the Finite Element Method (FEM)~@courant1994variational and Finite
+Volume Method (FVM)~@leveque2002finite@patankar2018numerical solve
+physical equations by first dividing the physical domain into many small
+computational cells, like covering the space with a fine grid. These
+methods are accurate and reliable, but they can be expensive for complex
+geometries, moving boundaries, high-dimensional problems, or repeated
+simulations in design. Pure deep learning can be faster, but if it only
+learns from data, it may violate basic physical laws such as
+conservation of mass, momentum, or energy, making it unreliable when
+data are limited or test cases differ from training examples.
+
+PINNs address this by incorporating physical laws directly into neural
+network training. Instead of only fitting observed data, the model is
+also penalized when its predictions do not satisfy the governing
+differential equations. As a result, PINNs can learn physically
+meaningful solutions from limited data. They also avoid the need for a
+predefined computational grid: a PINN represents the solution as a
+continuous function over space and time and uses automatic
+differentiation to evaluate the governing equations at sampled points.
+This makes them useful for irregular shapes, changing domains, limited
+measurements, and inverse problems where hidden physical parameters need
+to be estimated.
+
+== Formulation: embedding physics into the objective
+<formulation-embedding-physics-into-the-objective>
+Data and loss-embedded physics is a foundational paradigm for
+incorporating physical knowledge into neural computation by encoding
+governing equations and physical constraints directly into the training
+objective. In this setting, a neural network
+$u_theta\(upright(bold(x))\,t\)$ approximates the unknown physical field
+$u\(upright(bold(x))\,t\)$, where $theta$ denotes the learnable
+parameters, $upright(bold(x)) in Omega$ denotes the spatial coordinate
+in the domain $Omega$, and $t in\[0\,T\]$ denotes time.
+
+Consider a general time-dependent nonlinear PDE of the form
+$ partial_t u\(upright(bold(x))\,t\)+ cal(N)\[u\]\(upright(bold(x))\,t\)= 0\,quad upright(bold(x)) in Omega\,quad t in\[0\,T\]\, $<eq:generic_pde>
+where $cal(N)\[dot.op\]$ denotes a possibly nonlinear spatial
+differential operator. PINNs~@raissi2019pinn define a physics residual
+by substituting the neural approximation $u_theta$ into the governing
+equation:
+$ r_theta\(upright(bold(x))\,t\):= partial_t u_theta\(upright(bold(x))\,t\)+ cal(N)\[u_theta\]\(upright(bold(x))\,t\). $<eq:pde_residual>
+The governing equation is satisfied at a point $\(upright(bold(x))\,t\)$
+when $r_theta\(upright(bold(x))\,t\)= 0$. Therefore, the PDE residual is
+penalized over a set of collocation points
+${\(upright(bold(x))_j\,t_j\)}_(j = 1)^(N_r)$:
+$ cal(L)_(upright(P D E)) = 1 / N_r sum_(j = 1)^(N_r) ∥r_theta \( upright(bold(x))_j \, t_j \)∥_2^2 . $<eq:pde_loss>
+
+The full training objective then combines the equation loss with
+supervision on the solution itself:
+$ cal(L) = underbrace(lambda_r cal(L)_(upright(P D E)), upright("equation / physics loss")) + #h(0em) underbrace((lambda_b cal(L)_(upright(B C)) + lambda_i cal(L)_(upright(I C)) + lambda_d cal(L)_(upright(d a t a))), upright("data / constraint loss")) . $<eq:pinn_loss_compact>
+Here, $cal(L)_(upright(B C))$, $cal(L)_(upright(I C))$, and
+$cal(L)_(upright(d a t a))$ measure violations of boundary conditions,
+initial conditions, and observed data, respectively, while
+$lambda_r\,lambda_b\,lambda_i\,lambda_d$ are scalar balancing weights.
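+
+As an illustration, consider the one-dimensional viscous Burgers
+equation on $x in\[- 1\,1\]$; this example, the viscosity $nu > 0$, the
+initial profile $u_0$, and the sample counts $N_b$ and $N_i$ are
+assumptions introduced here for concreteness. In this case
+$cal(N)\[u\]= u thin partial_x u - nu thin partial_(x x) u$, so the
+residual in Eq.~@eq:pde_residual becomes
+$ r_theta\(x\,t\)= partial_t u_theta + u_theta thin partial_x u_theta - nu thin partial_(x x) u_theta . $
+With homogeneous Dirichlet boundaries $u\(plus.minus 1\,t\)= 0$ and
+initial condition $u\(x\,0\)= u_0\(x\)$, the boundary and initial terms
+in Eq.~@eq:pinn_loss_compact can be written as
+$ cal(L)_(upright(B C)) = 1 / N_b sum_(j = 1)^(N_b) (lr(|u_theta\(- 1\,t_j\)|)^2 + lr(|u_theta\(1\,t_j\)|)^2)\,#h(2em) cal(L)_(upright(I C)) = 1 / N_i sum_(j = 1)^(N_i) lr(|u_theta\(x_j\,0\)- u_0\(x_j\)|)^2 . $
+All derivatives of $u_theta$ are obtained by automatic differentiation
+with respect to the network inputs.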
+
+== Alternative formulations
+<alternative-formulations>
+The original PINN formulation drives the #emph[strong-form] PDE
+residual $r_theta\(upright(bold(x))\,t\)$ toward zero #emph[pointwise].
+Many subsequent variants reformulate this objective to improve
+stability, reduce derivative requirements, or better match different
+classes of physical systems. For notation, we write
+$upright(bold(z)) =\(upright(bold(x))\,t\)$ and let
+$cal(D) = Omega times\[0\,T\]$ denote the space-time domain. These
+alternatives can be roughly grouped into the following categories.
+
+=== Variational/energy formulation
+<variationalenergy-formulation>
+Some PDEs admit an energy or variational principle, where the true
+solution is characterized as the minimizer of an integral functional,
+such as the Dirichlet energy. For example, the Deep Ritz
+method~@yu2018deepritz considers PDEs with variational formulations, in
+which the solution satisfies
+$ u^(*) = arg min_(u in cal(V)) cal(E)\(u\)\, $<eq:deep_ritz_variational>
+where $cal(V)$ denotes the admissible function space and
+$cal(E)\(dot.op\)$ denotes the corresponding energy functional. Instead
+of optimizing directly over the infinite-dimensional space $cal(V)$,
+Deep Ritz parameterizes the solution with a neural network $u_theta$ and
+solves $ theta^(*) = arg min_theta cal(E)\(u_theta\). $<eq:deep_ritz_nn>
+
+Similar to PINNs, energy-based methods still use a neural network to
+approximate the solution field itself. However, the training signal
+comes from minimizing a global energy functional $cal(E)$ rather than
+penalizing pointwise PDE residuals. In practice, the integral in
+$cal(E)$ can be estimated by Monte Carlo sampling over the physical
+domain, making the objective compatible with standard stochastic
+gradient optimization. Compared with residual-based PINNs, variational
+formulations often require lower-order derivatives of the network output
+and can more naturally preserve physical structures encoded by the
+energy, such as force balance, symmetry, or conservation-related
+constraints, without introducing separate penalty losses.
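+
+As a sketch under assumed notation, take the Poisson problem
+$- Delta u = f$ in $Omega$ with $u = 0$ on $partial Omega$, where the
+source term $f$ and the sample count $N$ are introduced here only for
+illustration. Its solution minimizes the energy
+$ cal(E)\(u\)= integral_Omega (1 / 2 lr(|nabla u\(upright(bold(x))\)|)^2 - f\(upright(bold(x))\)thin u\(upright(bold(x))\)) thin d upright(bold(x))\, $
+over functions that vanish on the boundary. Assuming $u_theta$ satisfies
+the boundary condition, for example through a hard-constrained output,
+sampling $upright(bold(x))_j$ uniformly from $Omega$ gives the Monte
+Carlo estimate
+$ cal(E)\(u_theta\)approx frac(lr(|Omega|), N) sum_(j = 1)^N (1 / 2 lr(|nabla u_theta\(upright(bold(x))_j\)|)^2 - f\(upright(bold(x))_j\)thin u_theta\(upright(bold(x))_j\)) . $
+Only first derivatives of $u_theta$ appear here, whereas the strong-form
+residual $- Delta u_theta - f$ requires second derivatives.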
+
+Recent work has extended this idea to neural operators. For example,
+Variational PINO (VINO)~@eshaghi2025variational trains a neural operator
+by minimizing the PDE energy, achieving strong performance without
+labeled solution data.
+
+=== Weak formulations
+<weak-formulations>
+Another route is to enforce the PDE in an #emph[integrated] or
+#emph[weak] sense rather than pointwise. Instead of requiring the
+strong-form residual to vanish at individual collocation points,
+weak-form methods require the residual to vanish when tested against a
+set of test functions. For a set of test functions ${ v_k }_(k = 1)^K$,
+this can be written as
+$ cal(R)_theta\(v_k\):= integral_(cal(D)) r_theta\(upright(bold(z))\)thin v_k\(upright(bold(z))\)thin d upright(bold(z)) approx 0\,#h(2em) k = 1\,dots.h\,K\, $<eq:weak_residual>
+where $cal(R)_theta\(v_k\)$ denotes the weak residual associated with
+the test function $v_k$. In practice, weak formulations often integrate
+the PDE by parts, which transfers derivatives from the neural solution
+$u_theta$ to the test functions. This reduces the derivative order
+required from the neural network and can improve stability for irregular
+or non-smooth solutions.
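+
+For instance, for the steady Poisson problem $- Delta u = f$ on $Omega$
+(an assumed example, so that $cal(D) = Omega$ and
+$r_theta = - Delta u_theta - f$) and test functions $v_k$ that vanish
+on $partial Omega$, integration by parts gives
+$ cal(R)_theta\(v_k\)= integral_Omega (- Delta u_theta - f) thin v_k thin d upright(bold(x)) = integral_Omega nabla u_theta dot.op nabla v_k thin d upright(bold(x)) - integral_Omega f thin v_k thin d upright(bold(x)) . $
+The boundary term vanishes because $v_k = 0$ on $partial Omega$, and the
+right-hand side involves only first derivatives of $u_theta$.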
+
+Variational Physics-Informed Neural Networks (VPINNs)~@kharazmi2019vpinn
+optimize a loss over such weak residuals:
+$ cal(L)_(upright(w e a k)) = 1 / K sum_(k = 1)^K lr(|cal(R)_theta \( v_k \)|)^2 . $<eq:vpinn_loss>
+Relative to standard PINNs, the key difference is that the PDE is
+enforced in an averaged integral sense rather than pointwise.
+
+hp-VPINNs~@kharazmi2021hpvpinn retain the same weak-form principle, but
+apply it locally over a partition of the domain
+$cal(D) = union.big_(e = 1)^(N_(upright(s d))) cal(D)_e$. The
+corresponding local weak-form loss can be written as
+$ cal(L)_(upright(h p)) = frac(1, N_(upright(s d)) K) sum_(e = 1)^(N_(upright(s d))) sum_(k = 1)^K lr(|cal(R)_theta^(\(e\)) \( v_k^(\(e\)) \)|)^2\, $<eq:hpvpinn_loss>
+where
+$ cal(R)_theta^(\(e\))\(v_k^(\(e\))\):= integral_(cal(D)_e) r_theta\(upright(bold(z))\)thin v_k^(\(e\))\(upright(bold(z))\)thin d upright(bold(z)) . $<eq:local_weak_residual>
+Here, $cal(D)_e$ denotes the $e$-th subdomain, $N_(upright(s d))$ is the
+number of subdomains, and $v_k^(\(e\))$ is a local test function on
+$cal(D)_e$. This local formulation makes refinement more flexible:
+$h$-refinement subdivides the domain more finely, while $p$-refinement
+increases the polynomial order of the local test space. As a result,
+hp-VPINNs can better resolve multi-scale or spatially heterogeneous
+solutions.
+
+A related line of work studies the choice of test space and residual
+norm. For example, Robust VPINNs~@rojas2024robust address the
+sensitivity of classical VPINNs to the test basis by minimizing
+residuals in a dual norm, leading to improved stability.
+
+=== Adversarial/Minimax formulations
+<adversarialminimax-formulations>
+Weak formulations can also be cast as saddle-point problems. A
+representative example is the Weak Adversarial Network
+(WAN)~@zang2020weak. Instead of choosing a fixed set of test functions,
+WAN parameterizes both the solution and the test function with neural
+networks: $u_theta$ for the solution and $phi_eta$ for the test
+function. The method then solves a minimax problem of the form
+$ min_theta max_eta #h(0em) cal(J)\(theta\,eta\)\, $<eq:wan_minimax>
+where $cal(J)\(theta\,eta\)$ measures the weak residual induced by the
+test network $phi_eta$. Intuitively, the solution network $u_theta$
+tries to minimize the residual, while the test network $phi_eta$ acts as
+an adversary that searches for regions or directions where the current
+solution still violates the PDE. Therefore, rather than enforcing the
+residual against a fixed test basis, WAN adaptively learns test
+functions that expose the remaining error.
+
+This adversarial weak-form perspective is especially useful when
+hand-designed test functions are insufficient or when the PDE is
+high-dimensional or non-smooth.
+
+=== Summary
+<summary>
+Together, these developments broaden the "formulation" stage of PINN
+research. They demonstrate that one can teach a network to respect a PDE
+either by driving a pointwise residual to zero (classical
+PINN~@raissi2019pinn), by minimizing an energy integral (Deep
+Ritz~@yu2018deepritz, VINO~@eshaghi2025variational), by enforcing
+weighted integral constraints (VPINN~@kharazmi2019vpinn,
+hp-VPINN~@kharazmi2021hpvpinn, WF-PINN~@wang2025wf, etc.), or even by
+solving a minimax problem (WAN~@zang2020weak). Each alternative has its
+own advantages: energy formulations lower the derivative order required
+of the network, weak forms improve stability on irregular solutions, and
+adversarial forms learn their test functions adaptively. The literature
+continues to evolve
+these ideas, offering a rich toolkit for physics-informed learning
+beyond the original PINN objective.
+
+== Diagnosis: why naive composite losses fail in PINNs
+<diagnosis-why-naive-composite-losses-fail-in-pinn>
+Subsequent work showed that the challenge of data/loss-embedded physics
+lies not only in formulating the objective, but also in optimizing it
+reliably. For naive composite PINN losses, failures can arise from
+several intertwined sources: imbalanced gradients across loss terms,
+uneven convergence dynamics, ill-conditioned residual optimization, and
+representation bias in the neural network itself.
+
+=== Loss imbalance and uneven convergence
+<loss-imbalance-and-uneven-convergence>
+For the composite PINN loss in Eq.~@eq:pinn_loss_compact, Wang et
+al.~@wang2021gradientpathologies showed that the PDE, boundary, initial,
+and data terms can induce highly imbalanced gradients, so that some
+objectives dominate training while others make little progress. To
+diagnose this imbalance, they compared the gradient magnitudes
+contributed by different loss terms and proposed adaptively balancing
+each non-PDE term
+$cal(L)_i in { cal(L)_(upright(B C))\,cal(L)_(upright(I C))\,cal(L)_(upright(d a t a)) }$
+against the PDE term:
+$ hat(lambda)_i = frac(max_theta lr(|nabla_theta cal(L)_(upright(P D E))|), "mean"_theta lr(|nabla_theta cal(L)_i|))\,#h(2em) lambda_i arrow.l\(1 - alpha\)lambda_i + alpha hat(lambda)_i\, $<eq:grad_pathology>
+where the maximum and mean are taken over the entries of the respective
+gradient vectors and $alpha in\(0\,1\)$ is a smoothing factor. This
+observation reveals
+that PINN training can fail even when each individual loss term is well
+defined, because the composite objective may provide poorly balanced
+optimization signals.
+
+A complementary perspective comes from the neural tangent kernel (NTK)
+analysis. Wang et al.~@wang2022and showed that different components of
+the PINN objective can converge at substantially different rates during
+training. This suggests that the imbalance is not only a matter of
+manually chosen scalar weights or instantaneous gradient magnitudes, but
+is also tied to the spectrum of the training dynamics induced by the PDE
+operator and the neural parameterization. In other words, gradient
+imbalance is a local symptom of a broader convergence-rate mismatch
+among the physics and data constraints.
+
+=== Ill-conditioned residual optimization
+<ill-conditioned-residual-optimization>
+Krishnapriyan et al.~@krishnapriyan2021failuremodes further showed that
+failures on harder PDEs often arise not from limited expressivity, but
+from optimization difficulty and the brittleness of strong-form residual
+minimization. Their analysis can be viewed through objectives of the
+form
+$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))\, $<eq:failure_modes>
+where $cal(L)_u$ is shorthand for the supervision terms on the solution,
+including boundary, initial, and observed data terms. Their key
+observation is that simply increasing the PDE weight $lambda_r$ does not
+necessarily improve training: while a larger $lambda_r$ enforces physics
+more strongly, it can also make the optimization problem more
+ill-conditioned. Related loss-landscape analyses similarly show that
+differential operators in the residual term can produce poorly
+conditioned objectives, making PINN training sensitive to optimizer
+choice and hyperparameter settings~@rathore2024challenges.
+
+=== Representation and frequency bias
+<representation-and-frequency-bias>
+Another diagnosis concerns the representation bias of the neural network
+itself. Standard fully connected networks tend to learn smooth,
+low-frequency components more easily than high-frequency or multi-scale
+structures. Wang et al.~@wang2021eigenvector connected this behavior to
+the eigenspectrum of the limiting NTK and showed that conventional PINNs
+can struggle when the target solution contains sharp spatial or temporal
+variations. Thus, even when the PDE residual is correctly specified, the
+neural parameterization and its optimization dynamics may bias training
+away from the physically relevant solution.
+
+=== Takeaway
+<takeaway>
+Together, these diagnoses show that naive composite PINN losses can fail
+for several intertwined reasons: different loss terms may generate
+imbalanced or conflicting gradients, the residual objective may be
+ill-conditioned, and the neural parameterization may favor smooth
+low-frequency solutions over the multi-scale structures required by the
+PDE. These observations motivate the remedy strategies discussed next,
+which aim to rebalance, resample, schedule, or better optimize the
+physics-informed objective.
+
+=== An early remedy: curriculum regularization
+<curriculum-regularization>
+Krishnapriyan et al.~@krishnapriyan2021failuremodes also explored
+curriculum regularization as a way to make the objective in
+Eq.~@eq:failure_modes more manageable. Schematically, the target PDE
+loss is replaced by a sequence of progressively harder PDE losses, for
+example by ramping a PDE coefficient from an easy value up to its target
+value:
+$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))^(\(s\))\,#h(2em) s = 1\,dots.h\,S\, $<eq:curriculum_pde>
+where $s$ indexes the curriculum stage and $S$ is the total number of
+stages. Intuitively, the curriculum does not change the overall
+formulation, but makes the PDE part of the objective easier to optimize
+in early stages.
+
+== Remedies
+<remedies>
+The above failure modes have motivated a broad family of remedies,
+summarized in Table~@tab:pinn_remedies. To connect these methods with
+the failure modes discussed above, we organize them according to which
+part of the PINN pipeline they modify: loss balancing and optimization,
+residual sampling and curriculum design, neural representation and
+architecture, constraint enforcement, and domain decomposition. This
+taxonomy also highlights a useful distinction: some methods directly
+reshape the composite optimization objective, while others improve the
+sampling strategy, the neural trial space, the enforcement of physics
+constraints, or the scalability of the solver.
+
+=== Loss balancing and optimization
+<loss-balancing-and-optimization>
+#strong[Loss balancing.] A first class of remedies addresses the
+composite PINN objective itself. Because the PDE residual, boundary
+conditions, initial conditions, and data terms can have very different
+magnitudes and gradient scales, fixed loss weights may cause some
+objectives to dominate training while others make little progress.
+Gradient-flow analyses therefore proposed adaptive weighting rules based
+on the gradient statistics of different loss terms
+@wang2021gradientpathologies. Related NTK-based analyses further showed
+that different components of the PINN loss can converge at different
+rates, motivating dynamic weights that balance the training dynamics of
+multiple physics constraints @wang2022and. More recent loss-balancing
+methods such as ReLoBRaLo formulate this issue as a multi-objective
+balancing problem and adjust weights according to relative training
+progress @bischof2025multi.
+
+Self-Adaptive PINNs~@mcclenny2023self address the same general issue
+from a point-wise residual-weighting perspective. Instead of assigning a
+fixed penalty to each collocation point, they introduce trainable
+adaptive weights:
+$ cal(L)_(upright(P D E)) = sum_j w_j thin r_theta\(upright(bold(x))_j\,t_j\)^2\, $<eq:adaptive_weight_compact>
+where $w_j$ is the adaptive importance weight for the residual at
+collocation point $\(upright(bold(x))_j\,t_j\)$. The network parameters
+are optimized to minimize the loss, while the weights are updated by
+gradient ascent on the same loss, so they grow at hard points with large
+residuals. As a result, the method
+automatically allocates more optimization effort to regions where the
+PDE is most strongly violated.
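+
+A minimal sketch of the resulting updates, restricted to the PDE term in
+Eq.~@eq:adaptive_weight_compact and assuming plain gradient steps with
+step sizes $eta_theta$ and $eta_w$ (both introduced here for
+illustration, with $k$ the iteration index), is
+$ theta^(\(k + 1\)) = theta^(\(k\)) - eta_theta thin nabla_theta cal(L)_(upright(P D E))\,#h(2em) w_j^(\(k + 1\)) = w_j^(\(k\)) + eta_w thin frac(partial cal(L)_(upright(P D E)), partial w_j) = w_j^(\(k\)) + eta_w thin r_theta\(upright(bold(x))_j\,t_j\)^2 . $
+Since the increment is nonnegative and proportional to the squared
+residual, weights grow fastest at points where the PDE is violated most.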
+
+#strong[Optimization.] Beyond weighting, optimizer design is also
+central to PINN training. Recent loss-landscape studies show that PINN
+objectives can be highly ill-conditioned, partly because differential
+operators amplify certain directions in parameter
+space~@rathore2024challenges. This explains why second-order or
+quasi-second-order optimizers such as L-BFGS~@liu1989limited,
+NysNewton-CG~@rathore2024challenges, and SOAP-style preconditioning
+@wanggradient@vyas2025soap can substantially improve training stability.
+Schematically, such methods precondition the gradient update as
+$ theta_(t + 1) approx theta_t - eta H^(- 1) g_t\, $<eq:preconditioned_update>
+where $theta_t$ denotes the model parameters, $g_t$ is the total
+gradient, $eta$ is the learning rate, and $H$ denotes a curvature matrix
+or its approximation. Intuitively, curvature-aware preconditioning
+rescales poorly conditioned directions and can implicitly reduce
+conflicts among the gradients induced by different loss terms. These
+methods correspond to the first block of Table~@tab:pinn_remedies, which
+focuses on improving how the composite PINN objective is weighted and
+optimized.
+
+=== Residual sampling and causal curricula
+<residual-sampling-and-causal-curricula>
+#strong[Residual sampling.] A second class of remedies changes the
+distribution and order of physics supervision. In standard PINNs,
+collocation points are often sampled uniformly from the spatio-temporal
+domain. However, uniform sampling can waste many residual points in
+regions that are already well learned, while undersampling difficult
+regions with large PDE violations. Residual-based adaptive sampling
+methods, including RAR (adaptive refinement), RAD (adaptive
+distribution), and their hybrid RAR-D, therefore update the sampling
+distribution according to the current residual @wu2023comprehensive:
+$ p\(upright(bold(x))\,t\)prop phi.alt\(lr(|r_theta\(upright(bold(x))\,t\)|)\)\, $<eq:rad_sampling>
+where $p\(upright(bold(x))\,t\)$ denotes the sampling density and
+$phi.alt\(dot.op\)$ is a monotone function of the residual magnitude.
+This shifts collocation points toward regions where the current PINN
+violates the governing equation most strongly. Region-optimized PINNs
+further refine this idea by optimizing the spatial allocation of
+residual points more explicitly @wu2024ropinn. In this sense, adaptive
+sampling improves where physics is enforced, rather than changing the
+PDE loss itself.
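+
+One simple monotone choice, in the spirit of the RAD scheme and with
+hyperparameters $k >= 0$ and $c >= 0$ that are assumptions of this
+sketch, is
+$ phi.alt\(lr(|r_theta|)\)= frac(lr(|r_theta|)^k, bb(E)\[lr(|r_theta|)^k\]) + c\, $
+where the expectation is taken over uniformly sampled points in the
+domain. Setting $k = 0$ recovers uniform sampling, while larger $k$
+concentrates collocation points near the largest residuals and larger
+$c$ pulls the density back toward uniform.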
+
+#strong[Causality-aware sampling.] For time-dependent problems, another
+important issue is temporal ordering. If residuals at all times are
+minimized simultaneously, the network can reduce later-time residuals
+before the earlier-time solution is resolved, so early errors propagate
+forward and degrade long-time predictions. Causality-aware
+training addresses this problem by decomposing the temporal domain into
+chunks and weighting later chunks according to the accuracy of earlier
+ones @wang2024respecting:
+$ cal(L)_(upright(P D E))\(theta\)= 1 / N_t sum_(i = 1)^(N_t) omega_i thin cal(L)_(upright(P D E))^(\(i\))\(theta\)\, $<eq:causal_loss>
+where $N_t$ is the number of temporal chunks,
+$cal(L)_(upright(P D E))^(\(i\))$ is the residual loss on the $i$-th
+time slab, and $omega_i$ is a causal weight. The weights are designed so
+that later times receive significant penalty only after earlier-time
+residuals have been sufficiently reduced. Curriculum-based methods such
+as CoPINN extend this idea by explicitly organizing training from easier
+to harder residual constraints @duan2025copinn. Together, the second
+block of Table~@tab:pinn_remedies summarizes methods that improve where,
+when, and in what order residual supervision is imposed.
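+
+As a sketch of how the weights in Eq.~@eq:causal_loss can be built, the
+causal scheme of~@wang2024respecting uses weights of the form
+$ omega_1 = 1\,#h(2em) omega_i = exp lr((- epsilon sum_(k = 1)^(i - 1) cal(L)_(upright(P D E))^(\(k\))\(theta\)))\,quad i = 2\,dots.h\,N_t\, $
+where $epsilon > 0$ is a causality parameter (the symbol is chosen here
+for illustration). The weight $omega_i$ stays close to zero while the
+accumulated residual on earlier slabs is large and approaches one as
+those residuals vanish; the weights are typically treated as constants
+when differentiating the loss.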
+
+=== Representation and architecture
+<representation-and-architecture>
+A third class of remedies addresses PINN failures from the perspective
+of neural representation. Standard coordinate-based MLPs often suffer
+from spectral bias, which makes them learn low-frequency components more
+easily than high-frequency or multi-scale structures. This is
+problematic for PDEs with sharp gradients, oscillatory solutions,
+boundary layers, or multi-scale dynamics. Fourier feature embeddings
+directly target this limitation by reshaping the coordinate
+representation and mitigating eigenvector bias in multi-scale PDEs
+@wang2021eigenvector. Similarly, sinusoidal activations provide a neural
+representation better suited for high-frequency implicit functions
+@sitzmann2020implicit, while locally adaptive activation functions
+introduce learnable activation slopes to accelerate convergence
+@jagtap2020locally. These methods do not directly modify the physics
+loss, but they make the neural trial space better matched to the target
+solution.
+
+More recent architectural remedies redesign the PINN backbone itself.
+SPINN uses separable network structures to improve efficiency,
+particularly through more efficient forward-mode automatic
+differentiation @cho2023separable. PINNsformer instead introduces a
+Transformer-based architecture to model sequential dependencies in
+physics-informed learning @zhao2024pinnsformer. These methods correspond
+to the representation and architecture block of
+Table~@tab:pinn_remedies: they are most useful when the difficulty comes
+not only from loss imbalance or sampling, but also from a mismatch
+between a simple MLP and the structure of the PDE solution.
+
+=== Constraint enforcement
+<constraint-enforcement>
+A fourth class of remedies modifies how boundary, initial, and physical
+constraints are imposed. In standard PINNs, boundary and initial
+conditions are usually enforced as soft penalty terms in the loss. This
+introduces additional loss-balancing difficulty: if the penalty is too
+small, the constraints may be violated; if it is too large, the PDE
+residual may be under-optimized. Classical hard-constrained neural trial
+functions address this issue by constructing solutions that satisfy
+prescribed constraints by design @lagaris1998artificial. A typical form
+is
+$ u_theta\(upright(bold(x))\,t\)= g\(upright(bold(x))\,t\)+ d\(upright(bold(x))\,t\)N_theta\(upright(bold(x))\,t\)\, $<eq:hard_constraint_ansatz>
+where $g\(upright(bold(x))\,t\)$ satisfies the prescribed constraint,
+$d\(upright(bold(x))\,t\)$ vanishes on the constrained boundary, and
+$N_theta$ is the trainable neural network. Since the constraint is built
+into the solution form, the optimizer no longer needs to enforce it only
+through a soft penalty weight. Modern PINN libraries and formulations
+further implement such hard constraints using approximate distance
+functions and geometry-aware output transformations @lu2021deepxde.
+Recent work also studies soft and hard boundary constraints for specific
+PDE families such as advection--diffusion equations @li2024physical.
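+
+As a concrete one-dimensional illustration of the ansatz in
+Eq.~@eq:hard_constraint_ansatz (the interval $\[0\,1\]$ and the boundary
+values $a$ and $b$ are assumptions of this sketch), the Dirichlet
+conditions $u\(0\,t\)= a$ and $u\(1\,t\)= b$ are satisfied exactly by
+choosing
+$ g\(x\,t\)= a\(1 - x\)+ b x\,#h(2em) d\(x\,t\)= x\(1 - x\)\, $
+since $d$ vanishes at $x = 0$ and $x = 1$, so that
+$u_theta\(0\,t\)= a$ and $u_theta\(1\,t\)= b$ for every $theta$.
+Similarly, an initial condition $u\(x\,0\)= u_0\(x\)$ can be built in
+through $u_theta\(x\,t\)= u_0\(x\)+ t thin N_theta\(x\,t\)$.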
+
+Variational and weak-form PINNs provide another way to improve
+constraint and residual enforcement. Instead of directly minimizing the
+point-wise strong-form PDE residual, these methods enforce the governing
+equation against test functions in an integral form. hp-VPINNs combine
+this variational formulation with hp-refinement and domain
+decomposition, improving the connection between PINNs and classical
+finite-element or Galerkin methods @kharazmi2021hpvpinn. Thus, the
+constraint-enforcement block of Table~@tab:pinn_remedies captures two
+related strategies: satisfying constraints by construction and replacing
+strong-form residuals with weak-form or variational objectives.
+
+=== Domain decomposition and scalability
+<domain-decomposition-and-scalability>
+Finally, a fifth class of remedies improves PINNs by localizing the
+learning problem. Instead of fitting a single global network over the
+entire spatio-temporal domain, domain-decomposition methods divide the
+domain into subregions and train local networks coupled through
+interface, conservation, or partition-of-unity constraints. Conservative
+PINNs impose interface flux continuity for conservation laws
+@jagtap2020conservative, while XPINNs generalize this idea to flexible
+space-time domain decomposition for nonlinear PDEs @jagtap2020extended.
+FBPINNs further introduce overlapping subdomains and partition-of-unity
+weighting to make the decomposition more scalable and localized
+@moseley2023finite.
+
+Recent extensions improve the scalability and adaptivity of this
+decomposition view. Multilevel FBPINNs introduce hierarchical
+decompositions to improve global communication across subdomains
+@dolean2024multilevel, while AB-PINNs use residual-driven adaptive bases
+to dynamically allocate decomposition capacity @botvinick2025ab. These
+methods correspond to the final block of Table~@tab:pinn_remedies. They
+are especially useful for heterogeneous, multi-scale, or long-time
+problems where a single global PINN is difficult to optimize.
+
+=== Summary
+<summary-1>
+Overall, the remedies in Table~@tab:pinn_remedies show that PINN
+performance is determined not only by whether the correct physical
+equations are included in the objective, but also by whether the
+resulting learning problem is numerically trainable. Loss balancing and
+second-order optimization improve how competing objectives are
+minimized; adaptive sampling and causal curricula improve where and when
+residuals are enforced; representation and architectural methods improve
+what functions the network can express; hard constraints and weak forms
+improve how physics is encoded; and domain decomposition improves
+scalability to complex physical systems.
+
+#figure(
+[
+  #show table.cell: set text(size: 6pt)
+  #set table.hline(stroke: (dash: "solid", thickness: 0.5pt))
+
+  #table(
+  columns: (1fr, auto, auto, auto),
+  stroke: none,
+  align: left + horizon,
+  inset: 2pt,
+
+  table.header[*Type*][*Method*][*Venue / Year*][*Keyword-style Contribution*],
+
+  table.hline(),
+  table.cell(rowspan: 6)[*Loss balancing and optimization*],
+
+  [Gradient-flow weighting~@wang2021gradientpathologies],
+  [SISC 2021],
+  [Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms.],
+
+  [NTK-based weighting~@wang2022and],
+  [JCP 2022],
+  [Balances different physics constraints through neural tangent kernel training dynamics.],
+
+  [SA-PINNs~@mcclenny2023self],
+  [JCP 2023],
+  [Learns adaptive residual weights to emphasize difficult collocation points.],
+
+  [Loss-landscape / NysNewton-CG~@rathore2024challenges],
+  [ICML 2024],
+  [Studies PINN ill-conditioning and improves training with second-order optimization.],
+
+  [ReLoBRaLo~@bischof2025multi],
+  [CMAME 2025],
+  [Relative loss balancing with random lookback for multi-objective PINN training.],
+
+  [SOAP / gradient alignment~@wanggradient],
+  [NeurIPS 2025],
+  [Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives.],
+
+  table.hline(),
+  table.cell(rowspan: 4)[*Residual sampling and curriculum*],
+
+  [RAR / RAD / RAR-D~@wu2023comprehensive],
+  [CMAME 2023],
+  [Residual-based adaptive refinement and distribution-based collocation sampling.],
+
+  [RoPINN~@wu2024ropinn],
+  [NeurIPS 2024],
+  [Region-optimized residual sampling for more efficient collocation point selection.],
+
+  [Causal PINN training~@wang2024respecting],
+  [CMAME 2024],
+  [Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs.],
+
+  [CoPINN~@duan2025copinn],
+  [ICML 2025],
+  [Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals.],
+
+  table.hline(),
+  table.cell(rowspan: 5)[*Representation and architecture*],
+
+  [Fourier features / eigenvector bias~@wang2021eigenvector],
+  [CMAME 2021],
+  [Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias.],
+
+  [Adaptive activation functions~@jagtap2020locally],
+  [Proc. R. Soc. A 2020],
+  [Learnable activation slopes and slope-recovery terms for faster convergence.],
+
+  [SIREN~@sitzmann2020implicit],
+  [NeurIPS 2020],
+  [Sinusoidal activations for representing high-frequency implicit functions.],
+
+  [SPINN~@cho2023separable],
+  [NeurIPS 2023],
+  [Separable network structure for efficient forward-mode automatic differentiation.],
+
+  [PINNsformer~@zhao2024pinnsformer],
+  [ICLR 2024],
+  [Transformer-based architecture for modeling sequential dependencies in PINNs.],
+
+  table.hline(),
+  table.cell( rowspan: 3)[*Constraint enforcement*],
+
+  [Approximate distance functions~@lu2021deepxde],
+  [SIAM Review 2021],
+  [Implements hard constraints using distance functions and geometry-aware output transformations.],
+
+  [hp-VPINN~@kharazmi2021hpvpinn],
+  [CMAME 2021],
+  [Variational weak-form PINNs with hp-refinement and domain decomposition.],
+
+  [Hard initial/boundary constraints~@li2024physical],
+  [CMA 2024],
+  [Enforces prescribed initial and boundary conditions through constrained solution forms.],
+
+  table.hline(),
+  table.cell( rowspan: 5)[*Domain decomposition and scalability*],
+
+  [cPINN~@jagtap2020conservative],
+  [CMAME 2020],
+  [Conservative domain decomposition with interface flux continuity for conservation laws.],
+
+  [XPINN~@jagtap2020extended],
+  [CCP 2020],
+  [General space-time domain decomposition for heterogeneous PDE problems.],
+
+  [FBPINN~@moseley2023finite],
+  [ACOM 2023],
+  [Overlapping subdomains with partition-of-unity weighting for localized training.],
+
+  [Multilevel FBPINN~@dolean2024multilevel],
+  [CMAME 2024],
+  [Hierarchical domain decomposition for improved global communication and scalability.],
+
+  [AB-PINN~@botvinick2025ab],
+  [arXiv 2025],
+  [Adaptive residual-driven decomposition for dynamically allocating subdomains.],
+)
+]
+) <tab:pinn_remedies>
+
+#bibliography("part-1.bib")