#set math.equation(numbering: "1")

= Writing Plan
<writing-plan>
\1. Introduction

\2. Theoretical Foundations of Physics in Visual Computing

\3. Physics-Informed and Physics-Embedded Neural Methods (each
contributor picks one of the following topics)

- Data- and loss-embedded physics: PDE residual losses, initial-value and
  boundary-value constraints, other soft constraints, etc. -- Han

- Architecture-embedded physics: hard-coded invariances, physically
  parameterized layers, analytic kernels inside networks, etc. -- David

- Operator-embedded physics: differentiable renderers, wave propagation
  and light transport operators, neural and Fourier neural operators,
  etc. -- Andrea

- System-embedded physics: hardware in the loop, ONNs, etc. -- Zhen

- Applications of PINNs -- Ana

\4. Failure Modes and Misconceptions (Where do these methods work and do
not work)

\5. Open Problems and Future Directions

\6. Discussion and Conclusion

= Data and Loss-embedded Physics
<sec:data_loss_embedded_physics>
== Background
<background>
Physics-Informed Neural Networks (PINNs) combine traditional
physics-based simulation with deep learning. Classical methods such as
the Finite Element Method (FEM)~@courant1994variational and Finite
Volume Method (FVM)~@leveque2002finite@patankar2018numerical solve
physical equations by first dividing the physical domain into many small
computational cells, like covering the space with a fine grid. These
methods are accurate and reliable, but they can be expensive for complex
geometries, moving boundaries, high-dimensional problems, or repeated
simulations in design. Purely data-driven deep learning can be faster at
inference time, but a model that learns only from data may violate basic
physical laws such as conservation of mass, momentum, or energy, which
makes it unreliable when data are limited or test cases differ from the
training examples.

PINNs address this by incorporating physical laws directly into neural
network training. Instead of only fitting observed data, the model is
also penalized when its predictions do not satisfy the governing
differential equations. As a result, PINNs can learn solutions that are
both data-efficient and physically meaningful. They also avoid the need
for a predefined computational grid: rather than solving only on a fixed
grid, PINNs learn a continuous function over space and time and use
automatic differentiation to check the physical equations at sampled
points. This makes them useful for irregular shapes, changing domains,
limited measurements, and inverse problems where hidden physical
parameters need to be estimated.

== Formulation: embedding physics into the objective
<formulation-embedding-physics-into-the-objective>
Data and loss-embedded physics is a foundational paradigm for
incorporating physical knowledge into neural computation by encoding
governing equations and physical constraints directly into the training
objective. In this setting, a neural network
$u_theta\(upright(bold(x))\,t\)$ approximates the unknown physical field
$u\(upright(bold(x))\,t\)$, where $theta$ denotes the learnable
parameters, $upright(bold(x)) in Omega$ denotes the spatial coordinate
in the domain $Omega$, and $t in\[0\,T\]$ denotes time.

Consider a general time-dependent nonlinear PDE of the form
$ partial_t u\(upright(bold(x))\,t\)+ cal(N)\[u\]\(upright(bold(x))\,t\)= 0\,quad upright(bold(x)) in Omega\,quad t in\[0\,T\]\, $<eq:generic_pde>
where $cal(N)\[dot.op\]$ denotes a possibly nonlinear spatial
differential operator. PINNs~@raissi2019pinn define a physics residual
by substituting the neural approximation $u_theta$ into the governing
equation:
$ r_theta\(upright(bold(x))\,t\):= partial_t u_theta\(upright(bold(x))\,t\)+ cal(N)\[u_theta\]\(upright(bold(x))\,t\). $<eq:pde_residual>
The governing equation is satisfied at a point $\(upright(bold(x))\,t\)$
when $r_theta\(upright(bold(x))\,t\)= 0$. Therefore, the PDE residual is
penalized over a set of collocation points
${\(upright(bold(x))_j\,t_j\)}_(j = 1)^(N_r)$:
$ cal(L)_(upright(P D E)) = 1 / N_r sum_(j = 1)^(N_r) ∥r_theta \( upright(bold(x))_j \, t_j \)∥_2^2 . $<eq:pde_loss>

The full training objective then combines the equation loss with
supervision on the solution itself:
$ cal(L) = underbrace(lambda_r cal(L)_(upright(P D E)), upright("equation / physics loss")) + #h(0em) underbrace((lambda_b cal(L)_(upright(B C)) + lambda_i cal(L)_(upright(I C)) + lambda_d cal(L)_(upright(d a t a))), upright("data / constraint loss")) . $<eq:pinn_loss_compact>
Here, $cal(L)_(upright(B C))$, $cal(L)_(upright(I C))$, and
$cal(L)_(upright(d a t a))$ measure violations of boundary conditions,
initial conditions, and observed data, respectively, while
$lambda_r\,lambda_b\,lambda_i\,lambda_d$ are scalar balancing weights.
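
As a minimal sketch of how Eqs.~@eq:pde_residual, @eq:pde_loss, and
@eq:pinn_loss_compact fit together, the Python snippet below evaluates
the composite loss for a toy 1D heat equation
$partial_t u = nu thin partial_x^2 u$ on the unit square. A two-parameter
closed-form trial function stands in for the network so that derivatives
can be written by hand; an actual PINN would obtain them by automatic
differentiation. The diffusivity `NU`, the trial family, the sample
sizes, and all function names are illustrative assumptions, and the data
term is omitted for brevity.

```python
import math
import random

NU = 0.1  # assumed diffusivity of the toy heat equation u_t - NU * u_xx = 0


def u(theta, x, t):
    """Trial function a * exp(-b t) * sin(pi x), standing in for a network."""
    a, b = theta
    return a * math.exp(-b * t) * math.sin(math.pi * x)


def residual(theta, x, t):
    """r_theta = d_t u_theta + N[u_theta] with N[u] = -NU * u_xx (hand-derived)."""
    _, b = theta
    u_t = -b * u(theta, x, t)
    u_xx = -(math.pi ** 2) * u(theta, x, t)
    return u_t - NU * u_xx


def mean_sq(values):
    return sum(v * v for v in values) / len(values)


def pinn_loss(theta, colloc, ic_x, bc_t, lam_r=1.0, lam_b=1.0, lam_i=1.0):
    """Weighted sum of PDE, boundary, and initial-condition losses."""
    l_pde = mean_sq([residual(theta, x, t) for x, t in colloc])
    l_ic = mean_sq([u(theta, x, 0.0) - math.sin(math.pi * x) for x in ic_x])
    l_bc = mean_sq([u(theta, xb, t) for t in bc_t for xb in (0.0, 1.0)])
    return lam_r * l_pde + lam_b * l_bc + lam_i * l_ic


rng = random.Random(0)
colloc = [(rng.random(), rng.random()) for _ in range(256)]  # collocation points
ic_x = [rng.random() for _ in range(64)]                     # points on t = 0
bc_t = [rng.random() for _ in range(64)]                     # times on x = 0 and x = 1

exact = (1.0, NU * math.pi ** 2)  # parameters of the analytic solution
loss_exact = pinn_loss(exact, colloc, ic_x, bc_t)
loss_wrong = pinn_loss((1.0, 2.0), colloc, ic_x, bc_t)
```

Under these assumptions the loss vanishes up to rounding at the exact
parameters and is strictly positive for a wrong decay rate, which is the
signal that gradient-based training exploits.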

== Alternative formulations
<alternative-formulations>
The original PINN formulation enforces the #emph[strong-form] PDE
residual $r_theta\(upright(bold(x))\,t\)$ toward zero #emph[pointwise].
Many subsequent variants reformulate this objective to improve
stability, reduce derivative requirements, or better match different
classes of physical systems. For notation, we write
$upright(bold(z)) =\(upright(bold(x))\,t\)$ and let
$cal(D) = Omega times\[0\,T\]$ denote the space-time domain. These
alternatives can be roughly grouped into the following categories.

=== Variational/energy formulation
<variationalenergy-formulation>
Some PDEs admit an energy or variational principle, where the true
solution is characterized as the minimizer of an integral functional,
such as the Dirichlet energy. For example, the Deep Ritz
method~@yu2018deepritz considers PDEs with variational formulations, in
which the solution satisfies
$ u^(*) = arg min_(u in cal(V)) cal(E)\(u\)\, $<eq:deep_ritz_variational>
where $cal(V)$ denotes the admissible function space and
$cal(E)\(dot.op\)$ denotes the corresponding energy functional. Instead
of optimizing directly over the infinite-dimensional space $cal(V)$,
Deep Ritz parameterizes the solution with a neural network $u_theta$ and
solves $ theta^(*) = arg min_theta cal(E)\(u_theta\). $<eq:deep_ritz_nn>

Similar to PINNs, energy-based methods still use a neural network to
approximate the solution field itself. However, the training signal
comes from minimizing a global energy functional $cal(E)$ rather than
penalizing pointwise PDE residuals. In practice, the integral in
$cal(E)$ can be estimated by Monte Carlo sampling over the physical
domain, making the objective compatible with standard stochastic
gradient optimization. Compared with residual-based PINNs, variational
formulations often require lower-order derivatives of the network output
and can more naturally preserve physical structures encoded by the
energy, such as force balance, symmetry, or conservation-related
constraints, without introducing separate penalty losses.
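
The following sketch illustrates the energy-minimization idea of
Eq.~@eq:deep_ritz_nn on an assumed toy problem, $-u'' = f$ on the unit
interval with zero boundary values and $f = pi^2 sin(pi x)$. A
one-parameter trial family replaces the network, the energy integral is
estimated by Monte Carlo sampling, and plain gradient descent is used;
the problem, the trial family, the sample count, and the step size are
all illustrative choices rather than the setup of the cited work. Note
that only the first derivative of the trial function appears.

```python
import math
import random

# Assumed toy problem: -u'' = f on (0, 1), u(0) = u(1) = 0, f(x) = pi^2 sin(pi x).
# Energy: E(u) = integral of (0.5 * u'(x)^2 - f(x) * u(x)) dx, minimized by sin(pi x).
# Trial family u_a(x) = a * sin(pi x), which satisfies the boundary values exactly.

rng = random.Random(0)
xs = [rng.random() for _ in range(20000)]  # Monte Carlo samples of the domain

# The energy estimate only depends on these two sample averages.
mean_cos2 = sum(math.cos(math.pi * x) ** 2 for x in xs) / len(xs)
mean_sin2 = sum(math.sin(math.pi * x) ** 2 for x in xs) / len(xs)


def energy(a):
    """Monte Carlo estimate of E(u_a), using u_a' = a * pi * cos(pi x)."""
    return 0.5 * (a * math.pi) ** 2 * mean_cos2 - a * math.pi ** 2 * mean_sin2


def energy_grad(a):
    """Derivative of the estimated energy with respect to a."""
    return a * math.pi ** 2 * mean_cos2 - math.pi ** 2 * mean_sin2


a = 0.0
for _ in range(200):
    a -= 0.1 * energy_grad(a)  # gradient descent on the estimated energy
```

With these settings `a` settles close to 1, the coefficient of the exact
solution, with a small offset caused by the Monte Carlo estimate of the
integral.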

Recent work has extended this idea to neural operators. For example, the
Variational Physics-Informed Neural Operator
(VINO)~@eshaghi2025variational trains a neural operator by minimizing
the PDE energy, so that training does not require labeled solution data.

=== Weak formulations
<weak-formulations>
Another route is to enforce the PDE in an #emph[integrated] or
#emph[weak] sense rather than pointwise. Instead of requiring the
strong-form residual to vanish at individual collocation points,
weak-form methods require the residual to vanish when tested against a
set of test functions. For a set of test functions ${ v_k }_(k = 1)^K$,
this can be written as
$ cal(R)_theta\(v_k\):= integral_(cal(D)) r_theta\(upright(bold(z))\)thin v_k\(upright(bold(z))\)thin d upright(bold(z)) approx 0\,#h(2em) k = 1\,dots.h\,K\, $<eq:weak_residual>
where $cal(R)_theta\(v_k\)$ denotes the weak residual associated with
the test function $v_k$. In practice, weak formulations often integrate
the PDE by parts, which transfers derivatives from the neural solution
$u_theta$ to the test functions. This reduces the derivative order
required from the neural network and can improve stability for irregular
or non-smooth solutions.

Variational Physics-Informed Neural Networks (VPINNs)~@kharazmi2019vpinn
optimize a loss over such weak residuals:
$ cal(L)_(upright(w e a k)) = 1 / K sum_(k = 1)^K lr(|cal(R)_theta \( v_k \)|)^2 . $<eq:vpinn_loss>
Relative to standard PINNs, the key difference is that the PDE is
enforced in an averaged integral sense rather than pointwise.
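
To make Eqs.~@eq:weak_residual and @eq:vpinn_loss concrete, the sketch
below computes weak residuals for an assumed 1D problem, $-u'' = f$ with
zero boundary values, after integrating by parts so that only the first
derivative of the trial function is needed. The sine test functions, the
midpoint quadrature, the one-parameter trial family, and the function
names are illustrative assumptions, not the configuration used in the
VPINN paper.

```python
import math


def midpoint_integral(fn, n=2000):
    """Midpoint-rule quadrature of fn over (0, 1)."""
    h = 1.0 / n
    return h * sum(fn((i + 0.5) * h) for i in range(n))


def f(x):
    """Assumed source term; the exact solution is then sin(pi x)."""
    return math.pi ** 2 * math.sin(math.pi * x)


def weak_residual(a, k):
    """R(v_k) = int u_a' v_k' dx - int f v_k dx for the test function sin(k pi x).

    The boundary term from integration by parts drops out because v_k
    vanishes at x = 0 and x = 1.
    """
    def integrand(x):
        du = a * math.pi * math.cos(math.pi * x)        # u_a'(x) for u_a = a sin(pi x)
        v = math.sin(k * math.pi * x)
        dv = k * math.pi * math.cos(k * math.pi * x)
        return du * dv - f(x) * v
    return midpoint_integral(integrand)


def weak_loss(a, num_tests=3):
    """Mean squared weak residual over the first num_tests test functions."""
    return sum(weak_residual(a, k) ** 2 for k in range(1, num_tests + 1)) / num_tests


loss_exact = weak_loss(1.0)  # exact coefficient
loss_off = weak_loss(0.5)    # wrong coefficient
```

In this toy case only the first test function detects the error, since
the trial function is orthogonal to the higher sine modes; richer trial
spaces would generally excite several test functions at once.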

hp-VPINNs~@kharazmi2021hpvpinn retain the same weak-form principle, but
apply it locally over a partition of the domain
$cal(D) = union.big_(e = 1)^(N_(upright(s d))) cal(D)_e$. The
corresponding local weak-form loss can be written as
$ cal(L)_(upright(h p)) = frac(1, N_(upright(s d)) K) sum_(e = 1)^(N_(upright(s d))) sum_(k = 1)^K lr(|cal(R)_theta^(\(e\)) \( v_k^(\(e\)) \)|)^2\, $<eq:hpvpinn_loss>
where
$ cal(R)_theta^(\(e\))\(v_k^(\(e\))\):= integral_(cal(D)_e) r_theta\(upright(bold(z))\)thin v_k^(\(e\))\(upright(bold(z))\)thin d upright(bold(z)) . $<eq:local_weak_residual>
Here, $cal(D)_e$ denotes the $e$-th subdomain, $N_(upright(s d))$ is the
number of subdomains, and $v_k^(\(e\))$ is a local test function on
$cal(D)_e$. This local formulation makes refinement more flexible:
$h$-refinement subdivides the domain more finely, while $p$-refinement
increases the polynomial order of the local test space. As a result,
hp-VPINNs can better resolve multi-scale or spatially heterogeneous
solutions.

A related line of work studies the choice of test space and residual
norm. For example, Robust VPINNs~@rojas2024robust address the
sensitivity of classical VPINNs to the test basis by minimizing
residuals in a dual norm, leading to improved stability.

=== Adversarial/Minimax formulations
<adversarialminimax-formulations>
Weak formulations can also be cast as saddle-point problems. A
representative example is the Weak Adversarial Network
(WAN)~@zang2020weak. Instead of choosing a fixed set of test functions,
WAN parameterizes both the solution and the test function with neural
networks: $u_theta$ for the solution and $phi_eta$ for the test
function. The method then solves a minimax problem of the form
$ min_theta max_eta #h(0em) cal(J)\(theta\,eta\)\, $<eq:wan_minimax>
where $cal(J)\(theta\,eta\)$ measures the weak residual induced by the
test network $phi_eta$. Intuitively, the solution network $u_theta$
tries to minimize the residual, while the test network $phi_eta$ acts as
an adversary that searches for regions or directions where the current
solution still violates the PDE. Therefore, rather than enforcing the
residual against a fixed test basis, WAN adaptively learns test
functions that expose the remaining error.

This adversarial weak-form perspective is especially useful when
hand-designed test functions are insufficient or when the PDE is
high-dimensional or non-smooth.
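
The minimax structure of Eq.~@eq:wan_minimax can be illustrated with a
deliberately simplified stand-in. In the sketch below the adversary does
not train a test network; it greedily picks, from a two-function
dictionary, the test function with the largest normalized squared weak
residual, and the solution then takes one descent step against that
choice. The toy problem, the dictionary, the finite-difference gradient,
and the step sizes are assumptions made for brevity and do not reproduce
the WAN training procedure.

```python
import math

# Assumed toy setting: -u'' = f on (0, 1) with zero boundary values, where
# f = pi^2 sin(pi x) + 4 pi^2 sin(2 pi x), so the exact solution is
# sin(pi x) + sin(2 pi x). Trial: u = a[0] sin(pi x) + a[1] sin(2 pi x).


def integrate(fn, n=100):
    """Midpoint-rule quadrature over (0, 1)."""
    h = 1.0 / n
    return h * sum(fn((i + 0.5) * h) for i in range(n))


def objective(a, k):
    """J(a, k): squared weak residual against sin(k pi x), divided by its squared norm."""
    def integrand(x):
        du = sum(a[m] * (m + 1) * math.pi * math.cos((m + 1) * math.pi * x)
                 for m in range(2))
        f = sum(((m + 1) * math.pi) ** 2 * math.sin((m + 1) * math.pi * x)
                for m in range(2))
        v = math.sin(k * math.pi * x)
        dv = k * math.pi * math.cos(k * math.pi * x)
        return du * dv - f * v  # weak form after integration by parts
    norm_sq = integrate(lambda x: math.sin(k * math.pi * x) ** 2)
    return integrate(integrand) ** 2 / norm_sq


a = [0.0, 0.0]
lr, eps = 5e-4, 1e-6  # assumed step size and finite-difference width
for _ in range(300):
    # Max player: choose the test function exposing the largest violation.
    worst = max((1, 2), key=lambda k: objective(a, k))
    # Min player: one finite-difference gradient step on J(., worst).
    base = objective(a, worst)
    grad = []
    for m in range(2):
        bumped = list(a)
        bumped[m] += eps
        grad.append((objective(bumped, worst) - base) / eps)
    a = [am - lr * gm for am, gm in zip(a, grad)]
```

Because the adversary keeps switching to whichever test function still
exposes the larger violation, both coefficients are driven toward 1 even
though each descent step only looks at a single test function.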

=== Summary
<summary>
Together, these developments broaden the "formulation" stage of PINN
research. They demonstrate that one can teach a network to respect a PDE
either by driving a pointwise residual to zero (classical
PINN~@raissi2019pinn), by minimizing an energy integral (Deep
Ritz~@yu2018deepritz, VINO~@eshaghi2025variational), by enforcing
weighted integral constraints (VPINN~@kharazmi2019vpinn,
hp-VPINN~@kharazmi2021hpvpinn, WF-PINN~@wang2025wf, etc.), or by solving
a minimax problem (WAN~@zang2020weak). Each alternative has its own
advantages: energy-based variational forms lower the derivative order
required of the network, weak forms improve stability on irregular
solutions, and adversarial forms adapt the test functions to the
remaining error. The literature continues to evolve these ideas,
offering a rich toolkit for physics-informed learning beyond the
original PINN objective.

== Diagnosis: why naive composite losses fail in PINNs
<diagnosis-why-naive-composite-losses-fail-in-pinn>
Subsequent work showed that the challenge of data/loss-embedded physics
lies not only in formulating the objective, but also in optimizing it
reliably. For naive composite PINN losses, failures can arise from
several intertwined sources: imbalanced gradients across loss terms,
uneven convergence dynamics, ill-conditioned residual optimization, and
representation bias in the neural network itself.

=== Loss imbalance and uneven convergence
<loss-imbalance-and-uneven-convergence>
For the composite PINN loss in Eq.~@eq:pinn_loss_compact, Wang et
al.~@wang2021gradientpathologies showed that the PDE, boundary, initial,
and data terms can induce highly imbalanced gradients, so that some
objectives dominate training while others make little progress. To
diagnose this imbalance, they compared the gradient magnitudes
contributed by different loss terms and proposed adaptively balancing
each non-PDE term
$cal(L)_i in { cal(L)_(upright(B C))\,cal(L)_(upright(I C))\,cal(L)_(upright(d a t a)) }$
against the PDE term:
$ hat(lambda)_i = frac(max #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_(upright(P D E)) #scale(x: 120%, y: 120%)[\|], "mean" #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_i #scale(x: 120%, y: 120%)[\|])\,#h(2em) lambda_i arrow.l\(1 - alpha\)lambda_i + alpha hat(lambda)_i\, $<eq:grad_pathology>
where $alpha in\(0\,1\)$ is a smoothing factor. This observation reveals
that PINN training can fail even when each individual loss term is well
defined, because the composite objective may provide poorly balanced
optimization signals.
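
The weight update above can be sketched in a few lines of Python. The
gradient vectors are hypothetical flattened gradients of each loss term
with respect to the same parameters, held fixed here for illustration
(in training they would be recomputed at every step), and the value of
the smoothing factor `alpha` is an assumption.

```python
def update_weight(lam, grad_pde, grad_i, alpha=0.1):
    """One balancing step: lam_hat = max|grad_pde| / mean|grad_i|, then a moving average."""
    max_pde = max(abs(g) for g in grad_pde)
    mean_i = sum(abs(g) for g in grad_i) / len(grad_i)
    lam_hat = max_pde / mean_i
    return (1.0 - alpha) * lam + alpha * lam_hat


# Hypothetical gradients: the PDE term dominates the boundary term
# by roughly two orders of magnitude.
grad_pde = [4.0, -2.0, 0.5]
grad_bc = [0.02, -0.01, 0.03]

lam_b = 1.0
for _ in range(50):
    lam_b = update_weight(lam_b, grad_pde, grad_bc)
```

With these numbers the boundary weight drifts from 1 toward the ratio
4 / 0.02 = 200, so the boundary gradients are rescaled to a magnitude
comparable to the PDE gradients.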

A complementary perspective comes from the neural tangent kernel (NTK)
analysis. Wang et al.~@wang2022and showed that different components of
the PINN objective can converge at substantially different rates during
training. This suggests that the imbalance is not only a matter of
manually chosen scalar weights or instantaneous gradient magnitudes, but
is also tied to the spectrum of the training dynamics induced by the PDE
operator and the neural parameterization. In other words, gradient
imbalance is a local symptom of a broader convergence-rate mismatch
among the physics and data constraints.

=== Ill-conditioned residual optimization
<ill-conditioned-residual-optimization>
Krishnapriyan et al.~@krishnapriyan2021failuremodes further showed that
failures on harder PDEs often arise not from limited expressivity, but
from optimization difficulty and the brittleness of strong-form residual
minimization. Their analysis can be viewed through objectives of the
form
$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))\, $<eq:failure_modes>
where $cal(L)_u$ is shorthand for the supervision terms on the solution,
including boundary, initial, and observed data terms. Their key
observation is that simply increasing the PDE weight $lambda_r$ does not
necessarily improve training: while a larger $lambda_r$ enforces physics
more strongly, it can also make the optimization problem more
ill-conditioned. Related loss-landscape analyses similarly show that
differential operators in the residual term can produce poorly
conditioned objectives, making PINN training sensitive to optimizer
choice and hyperparameter settings~@rathore2024challenges.

=== Representation and frequency bias
<representation-and-frequency-bias>
Another diagnosis concerns the representation bias of the neural network
itself. Standard fully connected networks tend to learn smooth,
low-frequency components more easily than high-frequency or multi-scale
structures. Wang et al.~@wang2021eigenvector connected this behavior to
the eigenspectrum of the limiting NTK and showed that conventional PINNs
can struggle when the target solution contains sharp spatial or temporal
variations. Thus, even when the PDE residual is correctly specified, the
neural parameterization and its optimization dynamics may bias training
away from the physically relevant solution.

=== Takeaway
<takeaway>
Together, these diagnoses show that naive composite PINN losses can fail
for several intertwined reasons: different loss terms may generate
imbalanced or conflicting gradients, the residual objective may be
ill-conditioned, and the neural parameterization may favor smooth
low-frequency solutions over the multi-scale structures required by the
PDE. These observations motivate the remedy strategies discussed next,
which aim to rebalance, resample, schedule, or better optimize the
physics-informed objective.

=== Curriculum regularization
<curriculum-regularization>
To make the ill-conditioned objective in Eq.~@eq:failure_modes more
manageable, Krishnapriyan et al.~@krishnapriyan2021failuremodes explored
curriculum regularization, which schematically replaces the target PDE
loss by a sequence of progressively harder PDE losses,
$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))^(\(s\))\,#h(2em) s = 1\,dots.h\,S\, $<eq:curriculum_pde>
where $s$ indexes the curriculum stage and $S$ is the total number of
stages. Intuitively, the curriculum does not change the overall
formulation, but makes the PDE part of the objective easier to optimize
in early stages.

== Remedies
<remedies>
The above failure modes have motivated a broad family of remedies,
summarized in Table~@tab:pinn_remedies. To connect these methods with
the failure modes discussed above, we organize them according to which
part of the PINN pipeline they modify: loss balancing and optimization,
residual sampling and curriculum design, neural representation and
architecture, constraint enforcement, and domain decomposition. This
taxonomy also highlights a useful distinction: some methods directly
reshape the composite optimization objective, while others improve the
sampling strategy, the neural trial space, the enforcement of physics
constraints, or the scalability of the solver.

=== Loss balancing and optimization
<loss-balancing-and-optimization>
#strong[Loss balancing.] A first class of remedies addresses the
composite PINN objective itself. Because the PDE residual, boundary
conditions, initial conditions, and data terms can have very different
magnitudes and gradient scales, fixed loss weights may cause some
objectives to dominate training while others make little progress.
Gradient-flow analyses therefore proposed adaptive weighting rules based
on the gradient statistics of different loss terms
@wang2021gradientpathologies. Related NTK-based analyses further showed
that different components of the PINN loss can converge at different
rates, motivating dynamic weights that balance the training dynamics of
multiple physics constraints @wang2022and. More recent loss-balancing
methods such as ReLoBRaLo formulate this issue as a multi-objective
balancing problem and adjust weights according to relative training
progress @bischof2025multi.

Self-Adaptive PINNs~@mcclenny2023self address the same general issue
from a point-wise residual-weighting perspective. Instead of assigning a
fixed penalty to each collocation point, they introduce trainable
adaptive weights:
$ cal(L)_(upright(P D E)) = sum_j w_j thin r_theta\(upright(bold(x))_j\,t_j\)^2\, $<eq:adaptive_weight_compact>
where $w_j$ is the adaptive importance weight for the residual at
collocation point $\(upright(bold(x))_j\,t_j\)$. The network parameters
are updated by gradient descent on this loss, while the weights are
updated by gradient ascent, so they grow on hard points with large
residuals. As a result, the method automatically allocates more
optimization effort to regions where the PDE is most strongly violated.
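
A minimal sketch of the descent and ascent updates behind
Eq.~@eq:adaptive_weight_compact is given below. It assumes a toy
residual with a single solution parameter `a`, uses the weights directly
without any mask function, and picks the step sizes by hand; none of
these choices are taken from the SA-PINN paper.

```python
import math

# Assumed toy residual: for -u'' = pi^2 sin(pi x) and trial u_a = a sin(pi x),
# the strong-form residual is r_j(a) = pi^2 (a - 1) sin(pi x_j).
xs = [0.1, 0.3, 0.5, 0.7, 0.9]  # collocation points
basis = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]

a = 0.0                  # stand-in for the network parameters
w = [1.0] * len(xs)      # trainable per-point weights w_j
lr_a, lr_w = 1e-3, 1e-3  # assumed step sizes

for _ in range(200):
    r = [b * (a - 1.0) for b in basis]
    # dL/da for L = sum_j w_j r_j^2: used for descent on the solution.
    grad_a = sum(2.0 * wj * rj * bj for wj, rj, bj in zip(w, r, basis))
    # dL/dw_j = r_j^2 >= 0: used for ascent on the weights.
    grad_w = [rj ** 2 for rj in r]
    a -= lr_a * grad_a
    w = [wj + lr_w * gj for wj, gj in zip(w, grad_w)]
```

After training, the weight at the mid-domain point, where the residual
was largest, has grown more than the weights near the boundary, while
`a` has converged to the exact value 1.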

#strong[Optimization.] Beyond weighting, optimizer design is also
central to PINN training. Recent loss-landscape studies show that PINN
objectives can be highly ill-conditioned, partly because differential
operators amplify certain directions in parameter
space~@rathore2024challenges. This explains why second-order or
quasi-second-order optimizers such as L-BFGS~@liu1989limited,
NysNewton-CG~@rathore2024challenges, and SOAP-style
preconditioning~@wanggradient@vyas2025soap can substantially improve
training stability.
Schematically, such methods precondition the gradient update as
$ theta_(t + 1) approx theta_t - eta H^(- 1) g_t\, $<eq:preconditioned_update>
where $theta_t$ denotes the model parameters, $g_t$ is the total
gradient, $eta$ is the learning rate, and $H$ denotes a curvature matrix
or its approximation. Intuitively, curvature-aware preconditioning
rescales poorly conditioned directions and can implicitly reduce
conflicts among the gradients induced by different loss terms. These
methods correspond to the first block of Table~@tab:pinn_remedies, which
focuses on improving how the composite PINN objective is weighted and
optimized.
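
The effect of the update in Eq.~@eq:preconditioned_update can be seen on
a small ill-conditioned quadratic, used here as an assumed stand-in for
a PINN loss near a minimum. The sketch uses the exact diagonal Hessian
as preconditioner, which is an idealization; practical optimizers such
as those cited above only approximate curvature.

```python
# Assumed model loss: L(theta) = 0.5 * (1 * theta_1^2 + 1000 * theta_2^2),
# i.e. a Hessian with condition number 1000.
H_DIAG = (1.0, 1000.0)


def grad(theta):
    return [h * t for h, t in zip(H_DIAG, theta)]


def gd_step(theta, eta):
    """Plain gradient descent: theta - eta * g."""
    return [t - eta * g for t, g in zip(theta, grad(theta))]


def preconditioned_step(theta, eta):
    """theta - eta * H^{-1} g; H is diagonal here, so the inverse is elementwise."""
    return [t - eta * g / h for t, g, h in zip(theta, grad(theta), H_DIAG)]


theta_gd = [1.0, 1.0]
theta_pc = [1.0, 1.0]
for _ in range(100):
    # Stability of plain descent requires eta < 2 / 1000, set by the stiff direction.
    theta_gd = gd_step(theta_gd, eta=1.9e-3)
    theta_pc = preconditioned_step(theta_pc, eta=0.5)
```

After 100 steps plain gradient descent has barely moved along the
low-curvature direction, because its step size is capped by the stiff
direction, whereas the preconditioned iterate contracts at the same rate
in both directions.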

=== Residual sampling and causal curricula
<residual-sampling-and-causal-curricula>
#strong[Residual sampling.] A second class of remedies changes the
distribution and order of physics supervision. In standard PINNs,
collocation points are often sampled uniformly from the spatio-temporal
domain. However, uniform sampling can waste many residual points in
regions that are already well learned, while undersampling difficult
regions with large PDE violations. Residual-based adaptive refinement
methods, including RAR, RAD, and RAR-D, therefore update the sampling
distribution according to the current residual @wu2023comprehensive:
$ p\(upright(bold(x))\,t\)prop phi.alt\(lr(|r_theta \( upright(bold(x)) \, t \)|)\)\, $<eq:rad_sampling>
where $p\(upright(bold(x))\,t\)$ denotes the sampling density and
$phi.alt\(dot.op\)$ is a monotone function of the residual magnitude.
This shifts collocation points toward regions where the current PINN
violates the governing equation most strongly. Region-optimized PINNs
further refine this idea by optimizing the spatial allocation of
residual points more explicitly @wu2024ropinn. In this sense, adaptive
sampling improves where physics is enforced, rather than changing the
PDE loss itself.
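
Residual-proportional sampling as in Eq.~@eq:rad_sampling can be
sketched by reweighting a dense pool of uniform candidates. The residual
profile below is hypothetical (a sharp peak near `x = 0.8`, standing in
for a steep front), and the monotone function, a power of the residual
magnitude plus an optional floor, is one simple choice made for
illustration.

```python
import math
import random


def residual_magnitude(x, t):
    """Hypothetical residual profile with a sharp peak near x = 0.8."""
    return 0.01 + math.exp(-((x - 0.8) / 0.05) ** 2)


def resample(candidates, num_points, rng, power=1.0, floor=0.0):
    """Draw collocation points with probability proportional to |r|^power + floor."""
    weights = [residual_magnitude(x, t) ** power + floor for x, t in candidates]
    return rng.choices(candidates, weights=weights, k=num_points)


rng = random.Random(0)
candidates = [(rng.random(), rng.random()) for _ in range(5000)]  # uniform pool
points = resample(candidates, 500, rng)

# Fraction of points falling inside the high-residual band |x - 0.8| < 0.1.
frac_uniform = sum(abs(x - 0.8) < 0.1 for x, _ in candidates) / len(candidates)
frac_adaptive = sum(abs(x - 0.8) < 0.1 for x, _ in points) / len(points)
```

The uniform pool places about a fifth of its points in the
high-residual band, while the resampled set concentrates most of its
points there. A positive `floor` keeps some coverage of the rest of the
domain.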

#strong[Causality-aware sampling.] For time-dependent problems, another
important issue is temporal ordering. If residuals from all time steps
are optimized simultaneously, errors from early times can propagate
forward and make long-time prediction difficult. Causality-aware
training addresses this problem by decomposing the temporal domain into
chunks and weighting later chunks according to the accuracy of earlier
ones @wang2024respecting:
$ cal(L)_(upright(P D E))\(theta\)= 1 / N_t sum_(i = 1)^(N_t) omega_i thin cal(L)_(upright(P D E))^(\(i\))\(theta\)\, $<eq:causal_loss>
where $N_t$ is the number of temporal chunks,
$cal(L)_(upright(P D E))^(\(i\))$ is the residual loss on the $i$-th
time slab, and $omega_i$ is a causal weight. The weights are designed so
that later times receive significant penalty only after earlier-time
residuals have been sufficiently reduced. Curriculum-based methods such
as CoPINN extend this idea by explicitly organizing training from easier
to harder residual constraints @duan2025copinn. Together, the second
block of Table~@tab:pinn_remedies summarizes methods that improve where,
when, and in what order residual supervision is imposed.
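
One way to realize the causal weights in Eq.~@eq:causal_loss is an
exponential of the accumulated earlier residual losses, following the
form used in causal training. The sketch below computes such weights for
hypothetical per-slab losses; the causality parameter `eps` and the loss
values are illustrative assumptions.

```python
import math


def causal_weights(slab_losses, eps=1.0):
    """omega_i = exp(-eps * sum of residual losses on all earlier slabs)."""
    weights, cumulative = [], 0.0
    for loss in slab_losses:
        weights.append(math.exp(-eps * cumulative))  # first slab always gets weight 1
        cumulative += loss
    return weights


def causal_pde_loss(slab_losses, eps=1.0):
    """Weighted average of the per-slab residual losses."""
    w = causal_weights(slab_losses, eps)
    return sum(wi * li for wi, li in zip(w, slab_losses)) / len(slab_losses)


# Hypothetical per-slab residual losses, ordered in time.
early_unresolved = causal_weights([5.0, 1.0, 1.0])
early_resolved = causal_weights([0.01, 1.0, 1.0])
```

When the first slab is still poorly resolved, the later slabs receive
almost no weight. Once its residual is small, the second slab is
weighted nearly fully and training effort moves forward in time.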

=== Representation and architecture
<representation-and-architecture>
A third class of remedies addresses PINN failures from the perspective
of neural representation. Standard coordinate-based MLPs often suffer
from spectral bias, which makes them learn low-frequency components more
easily than high-frequency or multi-scale structures. This is
problematic for PDEs with sharp gradients, oscillatory solutions,
boundary layers, or multi-scale dynamics. Fourier feature embeddings
directly target this limitation by reshaping the coordinate
representation and mitigating eigenvector bias in multi-scale PDEs
@wang2021eigenvector. Similarly, sinusoidal activations provide a neural
representation better suited for high-frequency implicit functions
@sitzmann2020implicit, while locally adaptive activation functions
introduce learnable activation slopes to accelerate convergence
@jagtap2020locally. These methods do not directly modify the physics
loss, but they make the neural trial space better matched to the target
solution.
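
As a small illustration of a Fourier feature embedding, the sketch below
maps a scalar coordinate to sine and cosine features at randomly drawn
frequencies before it would be passed to an MLP. The Gaussian frequency
distribution, the scale `SIGMA`, the number of frequencies, and the
factor of two pi follow one common convention and are assumptions here.

```python
import math
import random


def fourier_features(x, freqs):
    """Map a scalar x to [cos(2 pi b x), sin(2 pi b x)] for each frequency b."""
    feats = []
    for b in freqs:
        feats.append(math.cos(2.0 * math.pi * b * x))
        feats.append(math.sin(2.0 * math.pi * b * x))
    return feats


rng = random.Random(0)
SIGMA = 10.0  # assumed frequency scale; larger values expose higher frequencies
freqs = [rng.gauss(0.0, SIGMA) for _ in range(8)]

z = fourier_features(0.25, freqs)  # 16-dimensional input for the downstream MLP
```

The scale of the sampled frequencies controls which part of the
spectrum the downstream network can fit easily, so it is typically
treated as a problem-dependent hyperparameter.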

More recent architectural remedies redesign the PINN backbone itself.
SPINN uses separable network structures to improve efficiency,
particularly through more efficient forward-mode automatic
differentiation @cho2023separable. PINNsformer instead introduces a
Transformer-based architecture to model sequential dependencies in
physics-informed learning @zhao2024pinnsformer. These methods correspond
to the representation and architecture block of
Table~@tab:pinn_remedies: they are most useful when the difficulty comes
not only from loss imbalance or sampling, but also from a mismatch
between a simple MLP and the structure of the PDE solution.

=== Constraint enforcement
<constraint-enforcement>
A fourth class of remedies modifies how boundary, initial, and physical
constraints are imposed. In standard PINNs, boundary and initial
conditions are usually enforced as soft penalty terms in the loss. This
introduces additional loss-balancing difficulty: if the penalty is too
small, the constraints may be violated; if it is too large, the PDE
residual may be under-optimized. Classical hard-constrained neural trial
functions address this issue by constructing solutions that satisfy
prescribed constraints by design @lagaris1998artificial. A typical form
is
$ u_theta\(upright(bold(x))\,t\)= g\(upright(bold(x))\,t\)+ d\(upright(bold(x))\,t\)N_theta\(upright(bold(x))\,t\)\, $<eq:hard_constraint_ansatz>
where $g\(upright(bold(x))\,t\)$ satisfies the prescribed constraint,
$d\(upright(bold(x))\,t\)$ vanishes on the constrained boundary, and
$N_theta$ is the trainable neural network. Since the constraint is built
into the solution form, the optimizer no longer needs to enforce it only
through a soft penalty weight. Modern PINN libraries and formulations
further implement such hard constraints using approximate distance
functions and geometry-aware output transformations @lu2021deepxde.
Recent work also studies soft and hard boundary constraints for specific
PDE families such as advection--diffusion equations @li2024physical.
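
The ansatz in Eq.~@eq:hard_constraint_ansatz can be checked directly on
a 1D example. The sketch below assumes Dirichlet data `u(0) = 1` and
`u(1) = 3`, a linear lifting function, a quadratic factor that vanishes
at both boundary points, and a tiny placeholder model in place of the
network; all of these are illustrative choices.

```python
import math


def g(x):
    """Lifting function matching the assumed boundary data u(0) = 1, u(1) = 3."""
    return 1.0 + 2.0 * x


def d(x):
    """Distance-like factor that vanishes at the boundary points x = 0 and x = 1."""
    return x * (1.0 - x)


def network(x, params):
    """Placeholder for N_theta: a one-hidden-unit tanh model."""
    w, b, v = params
    return v * math.tanh(w * x + b)


def u(x, params):
    """Hard-constrained trial function u = g + d * N_theta."""
    return g(x) + d(x) * network(x, params)


# The boundary values hold for arbitrary network parameters.
for params in [(0.3, -1.2, 5.0), (-4.0, 0.7, -2.5)]:
    assert u(0.0, params) == 1.0 and u(1.0, params) == 3.0
```

Because the boundary values are independent of the parameters, the
boundary loss and its weight can be dropped from the objective for this
constraint, and only the interior behavior is left to training.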

Variational and weak-form PINNs provide another way to improve
constraint and residual enforcement. Instead of directly minimizing the
point-wise strong-form PDE residual, these methods enforce the governing
equation against test functions in an integral form. hp-VPINNs combine
this variational formulation with hp-refinement and domain
decomposition, improving the connection between PINNs and classical
finite-element or Galerkin methods @kharazmi2021hpvpinn. Thus, the
constraint-enforcement block of Table~@tab:pinn_remedies captures two
related strategies: satisfying constraints by construction and replacing
strong-form residuals with weak-form or variational objectives.

=== Domain decomposition and scalability
<domain-decomposition-and-scalability>
Finally, a fifth class of remedies improves PINNs by localizing the
learning problem. Instead of fitting a single global network over the
entire spatio-temporal domain, domain-decomposition methods divide the
domain into subregions and train local networks coupled through
interface, conservation, or partition-of-unity constraints. Conservative
PINNs impose interface flux continuity for conservation laws
@jagtap2020conservative, while XPINNs generalize this idea to flexible
space-time domain decomposition for nonlinear PDEs @jagtap2020extended.
FBPINNs further introduce overlapping subdomains and partition-of-unity
weighting to make the decomposition more scalable and localized
@moseley2023finite.
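
The partition-of-unity weighting can be sketched as follows. Three
overlapping windows on the unit interval are normalized so that they sum
to one everywhere, and the global solution is the window-weighted sum of
local models. Gaussian windows are used only for brevity (compactly
supported windows are the more usual choice), and the centers, width,
and placeholder local models are assumptions.

```python
import math

CENTERS = [0.0, 0.5, 1.0]  # assumed centers of three overlapping subdomains
WIDTH = 0.35               # assumed window width controlling the overlap


def windows(x):
    """Smooth bumps normalized to sum to one at every x."""
    raw = [math.exp(-((x - c) / WIDTH) ** 2) for c in CENTERS]
    total = sum(raw)
    return [r / total for r in raw]


def local_models(x):
    """Placeholders for the local subdomain networks."""
    return [math.sin(3.0 * x), x ** 2, math.cos(x)]


def u(x):
    """Global solution: window-weighted sum of the local models."""
    return sum(w * m for w, m in zip(windows(x), local_models(x)))
```

Since the windows form a convex combination at every point, each local
model only has to be accurate where its window is non-negligible, which
is what keeps the individual learning problems localized.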

Recent extensions improve the scalability and adaptivity of this
decomposition view. Multilevel FBPINNs introduce hierarchical
decompositions to improve global communication across subdomains
@dolean2024multilevel, while AB-PINNs use residual-driven adaptive bases
to dynamically allocate decomposition capacity @botvinick2025ab. These
methods correspond to the final block of Table~@tab:pinn_remedies. They
are especially useful for heterogeneous, multi-scale, or long-time
problems where a single global PINN is difficult to optimize.

=== Summary
<summary-1>
Overall, the remedies in Table~@tab:pinn_remedies show that PINN
performance is determined not only by whether the correct physical
equations are included in the objective, but also by whether the
resulting learning problem is numerically trainable. Loss balancing and
second-order optimization improve how competing objectives are
minimized; adaptive sampling and causal curricula improve where and when
residuals are enforced; representation and architectural methods improve
what functions the network can express; hard constraints and weak forms
improve how physics is encoded; and domain decomposition improves
scalability to complex physical systems.

#figure(
[
  #show table.cell: set text(size: 6pt)
  #set table.hline(stroke: (dash: "solid", thickness: 0.5pt))

  #table(
  columns: (1fr, auto, auto, auto),
  stroke: none,
  align: left + horizon,
  inset: 2pt,

  table.header[*Type*][*Method*][*Venue / Year*][*Keyword-style Contribution*],

  table.hline(),
  table.cell(rowspan: 6)[*Loss balancing and optimization*],

  [Gradient-flow weighting~@wang2021gradientpathologies],
  [SISC 2021],
  [Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms.],

  [NTK-based weighting~@wang2022and],
  [JCP 2022],
  [Balances different physics constraints through neural tangent kernel training dynamics.],

  [SA-PINNs~@mcclenny2023self],
  [JCP 2023],
  [Learns adaptive residual weights to emphasize difficult collocation points.],

  [Loss-landscape / NysNewton-CG~@rathore2024challenges],
  [ICML 2024],
  [Studies PINN ill-conditioning and improves training with second-order optimization.],

  [ReLoBRaLo~@bischof2025multi],
  [CMAME 2025],
  [Relative loss balancing with random lookback for multi-objective PINN training.],

  [SOAP / gradient alignment~@wanggradient],
  [NeurIPS 2025],
  [Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives.],

  table.hline(),
  table.cell(rowspan: 4)[*Residual sampling and curriculum*],

  [RAR / RAD / RAR-D~@wu2023comprehensive],
  [CMAME 2023],
  [Residual-based adaptive refinement and distribution-based collocation sampling.],

  [RoPINN~@wu2024ropinn],
  [NeurIPS 2024],
  [Region-optimized residual sampling for more efficient collocation point selection.],

  [Causal PINN training~@wang2024respecting],
  [CMAME 2024],
  [Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs.],

  [CoPINN~@duan2025copinn],
  [ICML 2025],
  [Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals.],

  table.hline(),
  table.cell(rowspan: 5)[*Representation and architecture*],

  [Fourier features / eigenvector bias~@wang2021eigenvector],
  [CMAME 2021],
  [Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias.],

  [Adaptive activation functions~@jagtap2020locally],
  [Proc. R. Soc. A 2020],
  [Learnable activation slopes and slope-recovery terms for faster convergence.],

  [SIREN~@sitzmann2020implicit],
  [NeurIPS 2020],
  [Sinusoidal activations for representing high-frequency implicit functions.],

  [SPINN~@cho2023separable],
  [NeurIPS 2023],
  [Separable network structure for efficient forward-mode automatic differentiation.],

  [PINNsformer~@zhao2024pinnsformer],
  [ICLR 2024],
  [Transformer-based architecture for modeling sequential dependencies in PINNs.],

  table.hline(),
  table.cell(rowspan: 3)[*Constraint enforcement*],

  [Approximate distance functions~@lu2021deepxde],
  [SIAM Review 2021],
  [Implements hard constraints using distance functions and geometry-aware output transformations.],

  [hp-VPINN~@kharazmi2021hpvpinn],
  [CMAME 2021],
  [Variational weak-form PINNs with hp-refinement and domain decomposition.],

  [Hard initial/boundary constraints~@li2024physical],
  [CMA 2024],
  [Enforces prescribed initial and boundary conditions through constrained solution forms.],

  table.hline(),
  table.cell(rowspan: 5)[*Domain decomposition and scalability*],

  [cPINN~@jagtap2020conservative],
  [CMAME 2020],
  [Conservative domain decomposition with interface flux continuity for conservation laws.],

  [XPINN~@jagtap2020extended],
  [CCP 2020],
  [General space-time domain decomposition for heterogeneous PDE problems.],

  [FBPINN~@moseley2023finite],
  [ACOM 2023],
  [Overlapping subdomains with partition-of-unity weighting for localized training.],

  [Multilevel FBPINN~@dolean2024multilevel],
  [CMAME 2024],
  [Hierarchical domain decomposition for improved global communication and scalability.],

  [AB-PINN~@botvinick2025ab],
  [arXiv 2025],
  [Adaptive residual-driven decomposition for dynamically allocating subdomains.],
)
]
) <tab:pinn_remedies>

#bibliography("part-1.bib")