Spring 2026
646
part-1.typ
Normal file
@@ -0,0 +1,646 @@
|
||||
#set math.equation(numbering: "1")

= Writing Plan
<writing-plan>
\1. Introduction

\2. Theoretical Foundations of Physics in Visual Computing

\3. Physics-Informed and Physics-Embedded Neural Methods. You can pick
one of the following topics:

- Data and Loss-embedded physics: PDE residual losses, initial-value and
  boundary-value constraints, other soft constraints, etc. --Han

- Architecture embedded physics: hard-coded invariances, physically
  parameterized layers, analytic kernels inside networks, etc. --David

- Operator embedded physics: differentiable renderers, wave propagation
  and light transport operators, neural and Fourier neural operators,
  etc. --Andrea

- System embedded physics: hardware in the loop, ONNs, etc. --Zhen

- Applications of PINNs --Ana

\4. Failure Modes and Misconceptions (where these methods work and where
they do not)

\5. Open Problems and Future Directions

\6. Discussion and Conclusion

= Data and Loss-embedded Physics
|
||||
<sec:data_loss_embedded_physics>
|
||||
== Background
|
||||
<background>
|
||||
Physics-Informed Neural Networks (PINNs) combine traditional
|
||||
physics-based simulation with deep learning. Classical methods such as
|
||||
the Finite Element Method (FEM)~@courant1994variational and Finite
|
||||
Volume Method (FVM)~@leveque2002finite@patankar2018numerical solve
|
||||
physical equations by first dividing the physical domain into many small
|
||||
computational cells, like covering the space with a fine grid. These
|
||||
methods are accurate and reliable, but they can be expensive for complex
|
||||
geometries, moving boundaries, high-dimensional problems, or repeated
|
||||
simulations in design. Pure deep learning can be faster, but if it only
|
||||
learns from data, it may violate basic physical laws such as
|
||||
conservation of mass, momentum, or energy, making it unreliable when
|
||||
data are limited or test cases differ from training examples.
|
||||
|
||||
PINNs address this by incorporating physical laws directly into neural
|
||||
network training. Instead of only fitting observed data, the model is
|
||||
also penalized when its predictions do not satisfy the governing
|
||||
differential equations. As a result, PINNs can learn solutions that are
|
||||
both data-efficient and physically meaningful. They also avoid the need
|
||||
for a predefined computational grid: rather than solving only on a fixed
|
||||
grid, PINNs learn a continuous function over space and time and use
|
||||
automatic differentiation to check the physical equations at sampled
|
||||
points. This makes them useful for irregular shapes, changing domains,
|
||||
limited measurements, and inverse problems where hidden physical
|
||||
parameters need to be estimated.
|
||||
|
||||
== Formulation: embedding physics into the objective
|
||||
<formulation-embedding-physics-into-the-objective>
|
||||
Data and loss-embedded physics is a foundational paradigm for
|
||||
incorporating physical knowledge into neural computation by encoding
|
||||
governing equations and physical constraints directly into the training
|
||||
objective. In this setting, a neural network
|
||||
$u_theta\(upright(bold(x))\,t\)$ approximates the unknown physical field
|
||||
$u\(upright(bold(x))\,t\)$, where $theta$ denotes the learnable
|
||||
parameters, $upright(bold(x)) in Omega$ denotes the spatial coordinate
|
||||
in the domain $Omega$, and $t in\[0\,T\]$ denotes time.
|
||||
|
||||
Consider a general time-dependent nonlinear PDE of the form
|
||||
$ partial_t u\(upright(bold(x))\,t\)+ cal(N)\[u\]\(upright(bold(x))\,t\)= 0\,quad upright(bold(x)) in Omega\,quad t in\[0\,T\]\, $<eq:generic_pde>
|
||||
where $cal(N)\[dot.op\]$ denotes a possibly nonlinear spatial
|
||||
differential operator. PINNs~@raissi2019pinn define a physics residual
|
||||
by substituting the neural approximation $u_theta$ into the governing
|
||||
equation:
|
||||
$ r_theta\(upright(bold(x))\,t\):= partial_t u_theta\(upright(bold(x))\,t\)+ cal(N)\[u_theta\]\(upright(bold(x))\,t\). $<eq:pde_residual>
|
||||
The governing equation is satisfied at a point $\(upright(bold(x))\,t\)$
|
||||
when $r_theta\(upright(bold(x))\,t\)= 0$. Therefore, the PDE residual is
|
||||
penalized over a set of collocation points
|
||||
${\(upright(bold(x))_j\,t_j\)}_(j = 1)^(N_r)$:
|
||||
$ cal(L)_(upright(P D E)) = 1 / N_r sum_(j = 1)^(N_r) ∥r_theta \( upright(bold(x))_j \, t_j \)∥_2^2 . $<eq:pde_loss>
|
||||
|
||||
The full training objective then combines the equation loss with
|
||||
supervision on the solution itself:
|
||||
$ cal(L) = underbrace(lambda_r cal(L)_(upright(P D E)), upright("equation / physics loss")) + #h(0em) underbrace((lambda_b cal(L)_(upright(B C)) + lambda_i cal(L)_(upright(I C)) + lambda_d cal(L)_(upright(d a t a))), upright("data / constraint loss")) . $<eq:pinn_loss_compact>
|
||||
Here, $cal(L)_(upright(B C))$, $cal(L)_(upright(I C))$, and
|
||||
$cal(L)_(upright(d a t a))$ measure violations of boundary conditions,
|
||||
initial conditions, and observed data, respectively, while
|
||||
$lambda_r\,lambda_b\,lambda_i\,lambda_d$ are scalar balancing weights.
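
To make the structure of Eq.~@eq:pinn_loss_compact concrete, the following minimal sketch evaluates the composite loss for an illustrative 1D heat equation with homogeneous Dirichlet boundaries. Everything here is an assumption chosen for illustration: the PDE, the diffusivity `NU`, the function names, and the two closed-form candidate fields that stand in for a trained network. Central finite differences replace automatic differentiation only to keep the example dependency-free, and the data term is omitted.

```python
import math
import random

NU = 0.1  # assumed diffusivity of the illustrative heat equation u_t = NU * u_xx

def residual(u, x, t, h=1e-3):
    """r = du/dt + N[u] with N[u] = -NU * d2u/dx2, using central differences."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_t - NU * u_xx

def mse(values):
    return sum(v * v for v in values) / len(values)

def pinn_loss(u, colloc, boundary, initial, u0, lam_r=1.0, lam_b=1.0, lam_i=1.0):
    """Weighted sum of the PDE, boundary, and initial-condition losses."""
    l_pde = mse([residual(u, x, t) for x, t in colloc])
    l_bc = mse([u(x, t) for x, t in boundary])        # target boundary value is 0
    l_ic = mse([u(x, 0.0) - u0(x) for x in initial])
    return lam_r * l_pde + lam_b * l_bc + lam_i * l_ic

def exact(x, t):
    return math.exp(-NU * math.pi**2 * t) * math.sin(math.pi * x)

def wrong(x, t):
    return math.sin(math.pi * x)  # matches BC and IC but ignores the decay in time

def u0(x):
    return math.sin(math.pi * x)

rng = random.Random(0)
colloc = [(rng.random(), rng.random()) for _ in range(200)]
boundary = [(xb, rng.random()) for xb in (0.0, 1.0) for _ in range(20)]
initial = [rng.random() for _ in range(50)]

loss_exact = pinn_loss(exact, colloc, boundary, initial, u0)
loss_wrong = pinn_loss(wrong, colloc, boundary, initial, u0)
```

The candidate `wrong` satisfies the boundary and initial conditions, so only the PDE term distinguishes it from the exact solution. This is the role the physics loss plays when supervision on the solution is sparse.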
|
||||
|
||||
== Alternative formulations
|
||||
<alternative-formulations>
|
||||
The original PINN formulation enforces the #emph[strong-form] PDE
|
||||
residual $r_theta\(upright(bold(x))\,t\)$ toward zero #emph[pointwise].
|
||||
Many subsequent variants reformulate this objective to improve
|
||||
stability, reduce derivative requirements, or better match different
|
||||
classes of physical systems. For notation, we write
|
||||
$upright(bold(z)) =\(upright(bold(x))\,t\)$ and let
|
||||
$cal(D) = Omega times\[0\,T\]$ denote the space-time domain. These
|
||||
alternatives can be roughly grouped into the following categories.
|
||||
|
||||
=== Variational/energy formulation
|
||||
<variationalenergy-formulation>
|
||||
Some PDEs admit an energy or variational principle, where the true
|
||||
solution is characterized as the minimizer of an integral functional,
|
||||
such as the Dirichlet energy. For example, the Deep Ritz
|
||||
method~@yu2018deepritz considers PDEs with variational formulations, in
|
||||
which the solution satisfies
|
||||
$ u^(*) = arg min_(u in cal(V)) cal(E)\(u\)\, $<eq:deep_ritz_variational>
|
||||
where $cal(V)$ denotes the admissible function space and
|
||||
$cal(E)\(dot.op\)$ denotes the corresponding energy functional. Instead
|
||||
of optimizing directly over the infinite-dimensional space $cal(V)$,
|
||||
Deep Ritz parameterizes the solution with a neural network $u_theta$ and
|
||||
solves $ theta^(*) = arg min_theta cal(E)\(u_theta\). $<eq:deep_ritz_nn>
|
||||
|
||||
Similar to PINNs, energy-based methods still use a neural network to
|
||||
approximate the solution field itself. However, the training signal
|
||||
comes from minimizing a global energy functional $cal(E)$ rather than
|
||||
penalizing pointwise PDE residuals. In practice, the integral in
|
||||
$cal(E)$ can be estimated by Monte Carlo sampling over the physical
|
||||
domain, making the objective compatible with standard stochastic
|
||||
gradient optimization. Compared with residual-based PINNs, variational
|
||||
formulations often require lower-order derivatives of the network output
|
||||
and can more naturally preserve physical structures encoded by the
|
||||
energy, such as force balance, symmetry, or conservation-related
|
||||
constraints, without introducing separate penalty losses.
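
As a rough illustration of the Monte Carlo estimate behind Eq.~@eq:deep_ritz_nn, the sketch below evaluates the energy of the 1D Poisson problem $-u'' = f$ on the unit interval for a one-parameter trial function. The problem, the trial function `theta * sin(pi * x)` (which satisfies the boundary conditions by construction), and all names are assumptions made for this example; a practical Deep Ritz solver would use a neural network and usually a boundary penalty.

```python
import math
import random

def energy_estimate(theta, samples):
    """Monte Carlo estimate of E(u) = int_0^1 (0.5 * u'(x)^2 - f(x) * u(x)) dx
    for the trial function u(x) = theta * sin(pi * x)."""
    total = 0.0
    for x in samples:
        u = theta * math.sin(math.pi * x)
        du = theta * math.pi * math.cos(math.pi * x)  # closed-form derivative
        f = math.pi**2 * math.sin(math.pi * x)        # source term of -u'' = f
        total += 0.5 * du * du - f * u
    return total / len(samples)  # the domain has unit volume

rng = random.Random(0)
samples = [rng.random() for _ in range(20000)]
energies = {theta: energy_estimate(theta, samples) for theta in (0.5, 1.0, 1.5)}
```

Only first derivatives of the trial function appear, and the estimated energy is lowest at `theta = 1.0`, which corresponds to the exact solution of this toy problem.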
|
||||
|
||||
Recent work has extended this idea to neural operators. For example,
|
||||
Variational PINO (VINO)~@eshaghi2025variational trains a neural operator
|
||||
by minimizing the PDE energy, achieving strong performance without
|
||||
labeled solution data.
|
||||
|
||||
=== Weak formulations
|
||||
<weak-formulations>
|
||||
Another route is to enforce the PDE in an #emph[integrated] or
|
||||
#emph[weak] sense rather than pointwise. Instead of requiring the
|
||||
strong-form residual to vanish at individual collocation points,
|
||||
weak-form methods require the residual to vanish when tested against a
|
||||
set of test functions. For a set of test functions ${ v_k }_(k = 1)^K$,
|
||||
this can be written as
|
||||
$ cal(R)_theta\(v_k\):= integral_(cal(D)) r_theta\(upright(bold(z))\)thin v_k\(upright(bold(z))\)thin d upright(bold(z)) approx 0\,#h(2em) k = 1\,dots.h\,K\, $<eq:weak_residual>
|
||||
where $cal(R)_theta\(v_k\)$ denotes the weak residual associated with
|
||||
the test function $v_k$. In practice, weak formulations often integrate
|
||||
the PDE by parts, which transfers derivatives from the neural solution
|
||||
$u_theta$ to the test functions. This reduces the derivative order
|
||||
required from the neural network and can improve stability for irregular
|
||||
or non-smooth solutions.
|
||||
|
||||
Variational Physics-Informed Neural Networks (VPINNs)~@kharazmi2019vpinn
|
||||
optimize a loss over such weak residuals:
|
||||
$ cal(L)_(upright(w e a k)) = 1 / K sum_(k = 1)^K lr(|cal(R)_theta \( v_k \)|)^2 . $<eq:vpinn_loss>
|
||||
Relative to standard PINNs, the key difference is that the PDE is
|
||||
enforced in an averaged integral sense rather than pointwise.
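
The weak loss in Eq.~@eq:vpinn_loss can be sketched on the same kind of toy problem. The example below assumes the 1D Poisson equation $-u'' = f$ on the unit interval, sine test functions that vanish on the boundary, and a midpoint quadrature rule; these choices, like the function name `weak_loss`, are illustrative and not taken from the cited papers.

```python
import math

def weak_loss(theta, num_tests=3, n_quad=2000):
    """Weak-form loss for -u'' = f on (0, 1) with u(x) = theta * sin(pi * x).

    After integration by parts with test functions v_k(x) = sin(k * pi * x),
    the boundary terms vanish and R_k = int u' v_k' dx - int f v_k dx.
    """
    h = 1.0 / n_quad
    nodes = [(j + 0.5) * h for j in range(n_quad)]  # midpoint rule
    loss = 0.0
    for k in range(1, num_tests + 1):
        r_k = 0.0
        for x in nodes:
            du = theta * math.pi * math.cos(math.pi * x)
            f = math.pi**2 * math.sin(math.pi * x)
            v = math.sin(k * math.pi * x)
            dv = k * math.pi * math.cos(k * math.pi * x)
            r_k += (du * dv - f * v) * h
        loss += r_k**2
    return loss / num_tests

loss_at_solution = weak_loss(1.0)
loss_off_solution = weak_loss(0.5)
```

Integration by parts moves one derivative onto the test function, so the trial function is differentiated only once even though the PDE is second order.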
|
||||
|
||||
hp-VPINNs~@kharazmi2021hpvpinn retain the same weak-form principle, but
|
||||
apply it locally over a partition of the domain
|
||||
$cal(D) = union.big_(e = 1)^(N_(upright(s d))) cal(D)_e$. The
|
||||
corresponding local weak-form loss can be written as
|
||||
$ cal(L)_(upright(h p)) = frac(1, N_(upright(s d)) K) sum_(e = 1)^(N_(upright(s d))) sum_(k = 1)^K lr(|cal(R)_theta^(\(e\)) \( v_k^(\(e\)) \)|)^2\, $<eq:hpvpinn_loss>
|
||||
where
|
||||
$ cal(R)_theta^(\(e\))\(v_k^(\(e\))\):= integral_(cal(D)_e) r_theta\(upright(bold(z))\)thin v_k^(\(e\))\(upright(bold(z))\)thin d upright(bold(z)) . $<eq:local_weak_residual>
|
||||
Here, $cal(D)_e$ denotes the $e$-th subdomain, $N_(upright(s d))$ is the
|
||||
number of subdomains, and $v_k^(\(e\))$ is a local test function on
|
||||
$cal(D)_e$. This local formulation makes refinement more flexible:
|
||||
$h$-refinement subdivides the domain more finely, while $p$-refinement
|
||||
increases the polynomial order of the local test space. As a result,
|
||||
hp-VPINNs can better resolve multi-scale or spatially heterogeneous
|
||||
solutions.
|
||||
|
||||
A related line of work studies the choice of test space and residual
|
||||
norm. For example, Robust VPINNs~@rojas2024robust address the
|
||||
sensitivity of classical VPINNs to the test basis by minimizing
|
||||
residuals in a dual norm, leading to improved stability.
|
||||
|
||||
=== Adversarial/Minimax formulations
|
||||
<adversarialminimax-formulations>
|
||||
Weak formulations can also be cast as saddle-point problems. A
|
||||
representative example is the Weak Adversarial Network
|
||||
(WAN)~@zang2020weak. Instead of choosing a fixed set of test functions,
|
||||
WAN parameterizes both the solution and the test function with neural
|
||||
networks: $u_theta$ for the solution and $phi_eta$ for the test
|
||||
function. The method then solves a minimax problem of the form
|
||||
$ min_theta max_eta #h(0em) cal(J)\(theta\,eta\)\, $<eq:wan_minimax>
|
||||
where $cal(J)\(theta\,eta\)$ measures the weak residual induced by the
|
||||
test network $phi_eta$. Intuitively, the solution network $u_theta$
|
||||
tries to minimize the residual, while the test network $phi_eta$ acts as
|
||||
an adversary that searches for regions or directions where the current
|
||||
solution still violates the PDE. Therefore, rather than enforcing the
|
||||
residual against a fixed test basis, WAN adaptively learns test
|
||||
functions that expose the remaining error.
|
||||
|
||||
This adversarial weak-form perspective is especially useful when
|
||||
hand-designed test functions are insufficient or when the PDE is
|
||||
high-dimensional or non-smooth.
|
||||
|
||||
=== Summary
|
||||
<summary>
|
||||
Together, these developments broaden the "formulation" stage of PINN
|
||||
research. They demonstrate that one can teach a network to respect a PDE
|
||||
either by driving a pointwise residual to zero (classical
|
||||
PINN~@raissi2019pinn), by minimizing an energy integral (Deep
|
||||
Ritz~@yu2018deepritz, VINO~@eshaghi2025variational), by enforcing
|
||||
weighted integral constraints (VPINN~@kharazmi2019vpinn,
|
||||
hp-VPINN~@kharazmi2021hpvpinn, WF-PINN~@wang2025wf, etc.), or even by
|
||||
solving a minimax problem (WAN~@zang2020weak). Each alternative has its
own advantages: variational forms lower the required derivative order,
weak forms improve stability on irregular or non-smooth solutions, and
adversarial forms replace a hand-picked test basis with learned test
functions. The literature continues to build on these ideas, offering a
rich toolkit for physics-informed learning beyond the original PINN
objective.
|
||||
|
||||
== Diagnosis: why naive composite losses fail in PINNs
|
||||
<diagnosis-why-naive-composite-losses-fail-in-pinn>
|
||||
Subsequent work showed that the challenge of data/loss-embedded physics
|
||||
lies not only in formulating the objective, but also in optimizing it
|
||||
reliably. For naive composite PINN losses, failures can arise from
|
||||
several intertwined sources: imbalanced gradients across loss terms,
|
||||
uneven convergence dynamics, ill-conditioned residual optimization, and
|
||||
representation bias in the neural network itself.
|
||||
|
||||
=== Loss imbalance and uneven convergence
|
||||
<loss-imbalance-and-uneven-convergence>
|
||||
For the composite PINN loss in Eq.~@eq:pinn_loss_compact, Wang et
|
||||
al.~@wang2021gradientpathologies showed that the PDE, boundary, initial,
|
||||
and data terms can induce highly imbalanced gradients, so that some
|
||||
objectives dominate training while others make little progress. To
|
||||
diagnose this imbalance, they compared the gradient magnitudes
|
||||
contributed by different loss terms and proposed adaptively balancing
|
||||
each non-PDE term
|
||||
$cal(L)_i in { cal(L)_(upright(B C))\,cal(L)_(upright(I C))\,cal(L)_(upright(d a t a)) }$
|
||||
against the PDE term:
|
||||
$ hat(lambda)_i = frac(max_theta lr(|nabla_theta cal(L)_(upright(P D E))|), op("mean")_theta lr(|nabla_theta cal(L)_i|))\,#h(2em) lambda_i arrow.l\(1 - alpha\)lambda_i + alpha hat(lambda)_i\, $<eq:grad_pathology>
where the maximum and the mean are taken over the entries of the
respective gradient vectors, and $alpha in\(0\,1\)$ is a smoothing
factor. This observation reveals that PINN training can fail even when
each individual loss term is well defined, because the composite
objective may provide poorly balanced optimization signals.
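
The balancing rule above amounts to a few lines of arithmetic on flattened gradient vectors. The sketch below is a schematic of that update only; the gradient values are hypothetical, and in practice they would come from backpropagating each loss term separately.

```python
def update_weight(lam, grad_pde, grad_i, alpha=0.1):
    """One step of lam <- (1 - alpha) * lam + alpha * lam_hat, where
    lam_hat = max|grad_pde| / mean|grad_i| over the gradient entries.
    Assumes grad_i is not identically zero."""
    max_pde = max(abs(g) for g in grad_pde)
    mean_i = sum(abs(g) for g in grad_i) / len(grad_i)
    lam_hat = max_pde / mean_i
    return (1.0 - alpha) * lam + alpha * lam_hat

# Hypothetical flattened gradients: the PDE term dominates the boundary term.
grad_pde = [4.0, -2.0, 1.0, -0.5]
grad_bc = [0.02, -0.01, 0.04, 0.01]
lam_bc = update_weight(1.0, grad_pde, grad_bc)
```

Here the boundary gradients are about two orders of magnitude smaller than the PDE gradients, so a single update raises the boundary weight well above its initial value of one. The smoothing factor keeps the weight from jumping to the instantaneous ratio.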
|
||||
|
||||
A complementary perspective comes from the neural tangent kernel (NTK)
|
||||
analysis. Wang et al.~@wang2022and showed that different components of
|
||||
the PINN objective can converge at substantially different rates during
|
||||
training. This suggests that the imbalance is not only a matter of
|
||||
manually chosen scalar weights or instantaneous gradient magnitudes, but
|
||||
is also tied to the spectrum of the training dynamics induced by the PDE
|
||||
operator and the neural parameterization. In other words, gradient
|
||||
imbalance is a local symptom of a broader convergence-rate mismatch
|
||||
among the physics and data constraints.
|
||||
|
||||
=== Ill-conditioned residual optimization
|
||||
<ill-conditioned-residual-optimization>
|
||||
Krishnapriyan et al.~@krishnapriyan2021failuremodes further showed that
|
||||
failures on harder PDEs often arise not from limited expressivity, but
|
||||
from optimization difficulty and the brittleness of strong-form residual
|
||||
minimization. Their analysis can be viewed through objectives of the
|
||||
form
|
||||
$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))\, $<eq:failure_modes>
|
||||
where $cal(L)_u$ is shorthand for the supervision terms on the solution,
|
||||
including boundary, initial, and observed data terms. Their key
|
||||
observation is that simply increasing the PDE weight $lambda_r$ does not
|
||||
necessarily improve training: while a larger $lambda_r$ enforces physics
|
||||
more strongly, it can also make the optimization problem more
|
||||
ill-conditioned. Related loss-landscape analyses similarly show that
|
||||
differential operators in the residual term can produce poorly
|
||||
conditioned objectives, making PINN training sensitive to optimizer
|
||||
choice and hyperparameter settings~@rathore2024challenges.
|
||||
|
||||
=== Representation and frequency bias
|
||||
<representation-and-frequency-bias>
|
||||
Another diagnosis concerns the representation bias of the neural network
|
||||
itself. Standard fully connected networks tend to learn smooth,
|
||||
low-frequency components more easily than high-frequency or multi-scale
|
||||
structures. Wang et al.~@wang2021eigenvector connected this behavior to
|
||||
the eigenspectrum of the limiting NTK and showed that conventional PINNs
|
||||
can struggle when the target solution contains sharp spatial or temporal
|
||||
variations. Thus, even when the PDE residual is correctly specified, the
|
||||
neural parameterization and its optimization dynamics may bias training
|
||||
away from the physically relevant solution.
|
||||
|
||||
=== Takeaway
|
||||
<takeaway>
|
||||
Together, these diagnoses show that naive composite PINN losses can fail
|
||||
for several intertwined reasons: different loss terms may generate
|
||||
imbalanced or conflicting gradients, the residual objective may be
|
||||
ill-conditioned, and the neural parameterization may favor smooth
|
||||
low-frequency solutions over the multi-scale structures required by the
|
||||
PDE. These observations motivate the remedy strategies discussed next,
|
||||
which aim to rebalance, resample, schedule, or better optimize the
|
||||
physics-informed objective.
|
||||
|
||||
=== Curriculum regularization
<curriculum-regularization>
One remedy was proposed alongside the diagnosis itself. To make the
ill-conditioned objective $cal(L)_u + lambda_r cal(L)_(upright(P D E))$
more manageable, Krishnapriyan et al.~@krishnapriyan2021failuremodes
explored curriculum regularization, which schematically replaces the
target PDE loss by a sequence of progressively harder PDE losses,
$ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))^(\(s\))\,#h(2em) s = 1\,dots.h\,S\, $<eq:curriculum_pde>
where $s$ indexes the curriculum stage and $S$ is the total number of
stages. Intuitively, the curriculum does not change the overall
formulation, but makes the PDE part of the objective easier to optimize
in early stages.
|
||||
|
||||
== Remedies
|
||||
<remedies>
|
||||
The above failure modes have motivated a broad family of remedies,
|
||||
summarized in Table~@tab:pinn_remedies. To connect these methods with
|
||||
the failure modes discussed above, we organize them according to which
|
||||
part of the PINN pipeline they modify: loss balancing and optimization,
|
||||
residual sampling and curriculum design, neural representation and
|
||||
architecture, constraint enforcement, and domain decomposition. This
|
||||
taxonomy also highlights a useful distinction: some methods directly
|
||||
reshape the composite optimization objective, while others improve the
|
||||
sampling strategy, the neural trial space, the enforcement of physics
|
||||
constraints, or the scalability of the solver.
|
||||
|
||||
=== Loss balancing and optimization
|
||||
<loss-balancing-and-optimization>
|
||||
#strong[Loss balancing.] A first class of remedies addresses the
|
||||
composite PINN objective itself. Because the PDE residual, boundary
|
||||
conditions, initial conditions, and data terms can have very different
|
||||
magnitudes and gradient scales, fixed loss weights may cause some
|
||||
objectives to dominate training while others make little progress.
|
||||
Gradient-flow analyses therefore proposed adaptive weighting rules based
|
||||
on the gradient statistics of different loss terms
|
||||
@wang2021gradientpathologies. Related NTK-based analyses further showed
|
||||
that different components of the PINN loss can converge at different
|
||||
rates, motivating dynamic weights that balance the training dynamics of
|
||||
multiple physics constraints @wang2022and. More recent loss-balancing
|
||||
methods such as ReLoBRaLo formulate this issue as a multi-objective
|
||||
balancing problem and adjust weights according to relative training
|
||||
progress @bischof2025multi.
|
||||
|
||||
Self-Adaptive PINNs~@mcclenny2023self address the same general issue
|
||||
from a point-wise residual-weighting perspective. Instead of assigning a
|
||||
fixed penalty to each collocation point, they introduce trainable
|
||||
adaptive weights:
|
||||
$ cal(L)_(upright(P D E)) = sum_j w_j thin r_theta\(upright(bold(x))_j\,t_j\)^2\, $<eq:adaptive_weight_compact>
|
||||
where $w_j$ is the adaptive importance weight for the residual at
|
||||
collocation point $\(upright(bold(x))_j\,t_j\)$. The network parameters
|
||||
are optimized to minimize the loss, while the weights are encouraged to
|
||||
increase on hard points with large residuals. As a result, the method
|
||||
automatically allocates more optimization effort to regions where the
|
||||
PDE is most strongly violated.
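
The interplay between residuals and point-wise weights in Eq.~@eq:adaptive_weight_compact can be sketched as a gradient-ascent step on the weights. This is a schematic under simplifying assumptions: the weights enter the loss linearly, the residuals are held fixed instead of being updated by a network, and the step size `lr_w` is arbitrary. The cited method may use a different weight parameterization and update schedule.

```python
def weighted_pde_loss(weights, residuals):
    """sum_j w_j * r_j^2, the weighted residual loss."""
    return sum(w * r * r for w, r in zip(weights, residuals))

def ascend_weights(weights, residuals, lr_w=0.5):
    """Gradient-ascent step on the weighted loss with respect to the weights.
    Since d/dw_j = r_j^2, points with large residuals gain weight fastest."""
    return [w + lr_w * r * r for w, r in zip(weights, residuals)]

residuals = [0.01, 0.1, 1.0]  # hypothetical residuals at three collocation points
weights = [1.0, 1.0, 1.0]
for _ in range(3):            # residuals held fixed purely for illustration
    weights = ascend_weights(weights, residuals)
```

In a full training loop the network parameters would take a descent step on the same loss between weight updates, so the two sets of variables play opposing roles.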
|
||||
|
||||
#strong[Optimization.] Beyond weighting, optimizer design is also
|
||||
central to PINN training. Recent loss-landscape studies show that PINN
|
||||
objectives can be highly ill-conditioned, partly because differential
|
||||
operators amplify certain directions in parameter
|
||||
space~@rathore2024challenges. This explains why second-order or
|
||||
quasi-second-order optimizers such as L-BFGS~@liu1989limited,
NysNewton-CG~@rathore2024challenges, and SOAP-style preconditioning
|
||||
@wanggradient@vyas2025soap can substantially improve training stability.
|
||||
Schematically, such methods precondition the gradient update as
|
||||
$ theta_(t + 1) approx theta_t - eta H^(- 1) g_t\, $<eq:preconditioned_update>
|
||||
where $theta_t$ denotes the model parameters, $g_t$ is the total
|
||||
gradient, $eta$ is the learning rate, and $H$ denotes a curvature matrix
|
||||
or its approximation. Intuitively, curvature-aware preconditioning
|
||||
rescales poorly conditioned directions and can implicitly reduce
|
||||
conflicts among the gradients induced by different loss terms. These
|
||||
methods correspond to the first block of Table~@tab:pinn_remedies, which
|
||||
focuses on improving how the composite PINN objective is weighted and
|
||||
optimized.
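
The effect of the preconditioned update in Eq.~@eq:preconditioned_update is easiest to see on a toy quadratic with a diagonal Hessian. The sketch below is not a PINN: the objective, the curvature values, and the function names are assumptions chosen so that the exact inverse Hessian is available in closed form.

```python
def grad(theta, curvatures):
    """Gradient of f(theta) = 0.5 * sum_k c_k * theta_k^2."""
    return [c * th for c, th in zip(curvatures, theta)]

def gd_step(theta, curvatures, lr):
    """Plain gradient descent step."""
    return [th - lr * g for th, g in zip(theta, grad(theta, curvatures))]

def preconditioned_step(theta, curvatures, lr):
    """Step along H^{-1} g; for a diagonal H this divides each entry by c_k."""
    g = grad(theta, curvatures)
    return [th - lr * gk / c for th, gk, c in zip(theta, g, curvatures)]

curvatures = [1.0, 1000.0]  # condition number 1000
theta_gd = [1.0, 1.0]
for _ in range(100):
    # The step size is capped by the stiff direction, so the flat one barely moves.
    theta_gd = gd_step(theta_gd, curvatures, lr=1.0 / 1000.0)

theta_pc = preconditioned_step([1.0, 1.0], curvatures, lr=1.0)
```

After one hundred plain gradient steps the low-curvature coordinate is still far from the minimizer, whereas a single preconditioned step reaches it. Practical optimizers only approximate the curvature matrix, so the gain is smaller but of the same kind.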
|
||||
|
||||
=== Residual sampling and causal curricula
|
||||
<residual-sampling-and-causal-curricula>
|
||||
#strong[Residual sampling.] A second class of remedies changes the
|
||||
distribution and order of physics supervision. In standard PINNs,
|
||||
collocation points are often sampled uniformly from the spatio-temporal
|
||||
domain. However, uniform sampling can waste many residual points in
|
||||
regions that are already well learned, while undersampling difficult
|
||||
regions with large PDE violations. Residual-based adaptive refinement
|
||||
methods, including RAR, RAD, and RAR-D, therefore update the sampling
|
||||
distribution according to the current residual @wu2023comprehensive:
|
||||
$ p\(upright(bold(x))\,t\)prop phi.alt (lr(|r_theta \( upright(bold(x)) \, t \)|))\, $<eq:rad_sampling>
|
||||
where $p\(upright(bold(x))\,t\)$ denotes the sampling density and
|
||||
$phi.alt\(dot.op\)$ is a monotone function of the residual magnitude.
|
||||
This shifts collocation points toward regions where the current PINN
|
||||
violates the governing equation most strongly. Region-optimized PINNs
|
||||
further refine this idea by optimizing the spatial allocation of
|
||||
residual points more explicitly @wu2024ropinn. In this sense, adaptive
|
||||
sampling improves where physics is enforced, rather than changing the
|
||||
PDE loss itself.
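
A minimal sketch of residual-driven resampling in the spirit of Eq.~@eq:rad_sampling is given below. The monotone function is assumed to be a power of the residual magnitude plus a constant; the exponent `k`, the offset `c`, the candidate grid, and the synthetic residual profile are all illustrative choices.

```python
import random

def resample(candidates, residuals, n_new, k=1.0, c=0.0, seed=0):
    """Draw n_new collocation points with probability proportional to
    |r|**k + c, an assumed monotone function of the residual magnitude."""
    scores = [abs(r) ** k + c for r in residuals]
    rng = random.Random(seed)
    return rng.choices(candidates, weights=scores, k=n_new)

# Hypothetical residual profile with a sharp peak near x = 0.5.
candidates = [i / 100 for i in range(101)]
residuals = [1.0 if abs(x - 0.5) < 0.05 else 0.01 for x in candidates]

new_points = resample(candidates, residuals, n_new=1000)
near_peak = sum(abs(x - 0.5) < 0.05 for x in new_points)
```

A positive offset `c` keeps some probability mass on low-residual regions, which guards against forgetting parts of the domain that are currently well fitted.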
|
||||
|
||||
#strong[Causality-aware sampling.] For time-dependent problems, another
|
||||
important issue is temporal ordering. If residuals from all time steps
|
||||
are optimized simultaneously, errors from early times can propagate
|
||||
forward and make long-time prediction difficult. Causality-aware
|
||||
training addresses this problem by decomposing the temporal domain into
|
||||
chunks and weighting later chunks according to the accuracy of earlier
|
||||
ones @wang2024respecting:
|
||||
$ cal(L)_(upright(P D E))\(theta\)= 1 / N_t sum_(i = 1)^(N_t) omega_i thin cal(L)_(upright(P D E))^(\(i\))\(theta\)\, $<eq:causal_loss>
|
||||
where $N_t$ is the number of temporal chunks,
|
||||
$cal(L)_(upright(P D E))^(\(i\))$ is the residual loss on the $i$-th
temporal chunk, and $omega_i$ is a causal weight. The weights are designed so
|
||||
that later times receive significant penalty only after earlier-time
|
||||
residuals have been sufficiently reduced. Curriculum-based methods such
|
||||
as CoPINN extend this idea by explicitly organizing training from easier
|
||||
to harder residual constraints @duan2025copinn. Together, the second
|
||||
block of Table~@tab:pinn_remedies summarizes methods that improve where,
|
||||
when, and in what order residual supervision is imposed.
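
One common way to realize the causal weights in Eq.~@eq:causal_loss is an exponential of the accumulated residual loss on earlier chunks. The sketch below assumes this exponential form with a hypothetical sharpness parameter `eps`, and uses made-up chunk losses. In a training loop the weights would typically be treated as constants when differentiating the loss.

```python
import math

def causal_weights(chunk_losses, eps=1.0):
    """omega_i = exp(-eps * sum of residual losses on all earlier chunks)."""
    weights, cumulative = [], 0.0
    for loss in chunk_losses:
        weights.append(math.exp(-eps * cumulative))
        cumulative += loss
    return weights

def causal_loss(chunk_losses, eps=1.0):
    """Weighted average of the chunk losses, as in the causal PDE loss."""
    w = causal_weights(chunk_losses, eps)
    return sum(wi * li for wi, li in zip(w, chunk_losses)) / len(chunk_losses)

early_unresolved = causal_weights([5.0, 1.0, 1.0])
early_resolved = causal_weights([0.01, 1.0, 1.0])
```

When the first chunk still has a large residual, the later chunks are almost switched off; once it is resolved, the second chunk receives nearly full weight.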
|
||||
|
||||
=== Representation and architecture
|
||||
<representation-and-architecture>
|
||||
A third class of remedies addresses PINN failures from the perspective
|
||||
of neural representation. Standard coordinate-based MLPs often suffer
|
||||
from spectral bias, which makes them learn low-frequency components more
|
||||
easily than high-frequency or multi-scale structures. This is
|
||||
problematic for PDEs with sharp gradients, oscillatory solutions,
|
||||
boundary layers, or multi-scale dynamics. Fourier feature embeddings
|
||||
directly target this limitation by reshaping the coordinate
|
||||
representation and mitigating eigenvector bias in multi-scale PDEs
|
||||
@wang2021eigenvector. Similarly, sinusoidal activations provide a neural
|
||||
representation better suited for high-frequency implicit functions
|
||||
@sitzmann2020implicit, while locally adaptive activation functions
|
||||
introduce learnable activation slopes to accelerate convergence
|
||||
@jagtap2020locally. These methods do not directly modify the physics
|
||||
loss, but they make the neural trial space better matched to the target
|
||||
solution.
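
For reference, a Fourier feature embedding of a scalar coordinate can be sketched as follows. The random Gaussian frequencies, the scale `sigma`, and the number of features are assumed hyperparameters for illustration; the embedded vector would then be fed to the MLP in place of the raw coordinate.

```python
import math
import random

def fourier_features(x, freqs):
    """Map a scalar x to [cos(2 pi b x) for b in freqs] + [sin(2 pi b x) for b in freqs]."""
    feats = [math.cos(2 * math.pi * b * x) for b in freqs]
    feats += [math.sin(2 * math.pi * b * x) for b in freqs]
    return feats

rng = random.Random(0)
sigma = 5.0  # assumed frequency scale; larger values expose higher frequencies
freqs = [rng.gauss(0.0, sigma) for _ in range(8)]
z = fourier_features(0.3, freqs)
```

The scale `sigma` controls which frequencies the downstream network can fit easily, so it acts as a prior on the smoothness of the solution.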
|
||||
|
||||
More recent architectural remedies redesign the PINN backbone itself.
|
||||
SPINN uses separable network structures to improve efficiency,
|
||||
particularly through more efficient forward-mode automatic
|
||||
differentiation @cho2023separable. PINNsformer instead introduces a
|
||||
Transformer-based architecture to model sequential dependencies in
|
||||
physics-informed learning @zhao2024pinnsformer. These methods correspond
|
||||
to the representation and architecture block of
|
||||
Table~@tab:pinn_remedies: they are most useful when the difficulty comes
|
||||
not only from loss imbalance or sampling, but also from a mismatch
|
||||
between a simple MLP and the structure of the PDE solution.
|
||||
|
||||
=== Constraint enforcement
|
||||
<constraint-enforcement>
|
||||
A fourth class of remedies modifies how boundary, initial, and physical
|
||||
constraints are imposed. In standard PINNs, boundary and initial
|
||||
conditions are usually enforced as soft penalty terms in the loss. This
|
||||
introduces additional loss-balancing difficulty: if the penalty is too
|
||||
small, the constraints may be violated; if it is too large, the PDE
|
||||
residual may be under-optimized. Classical hard-constrained neural trial
|
||||
functions address this issue by constructing solutions that satisfy
|
||||
prescribed constraints by design @lagaris1998artificial. A typical form
|
||||
is
|
||||
$ u_theta\(upright(bold(x))\,t\)= g\(upright(bold(x))\,t\)+ d\(upright(bold(x))\,t\)N_theta\(upright(bold(x))\,t\)\, $<eq:hard_constraint_ansatz>
|
||||
where $g\(upright(bold(x))\,t\)$ satisfies the prescribed constraint,
|
||||
$d\(upright(bold(x))\,t\)$ vanishes on the constrained boundary, and
|
||||
$N_theta$ is the trainable neural network. Since the constraint is built
|
||||
into the solution form, the optimizer no longer needs to enforce it only
|
||||
through a soft penalty weight. Modern PINN libraries and formulations
|
||||
further implement such hard constraints using approximate distance
|
||||
functions and geometry-aware output transformations @lu2021deepxde.
|
||||
Recent work also studies soft and hard boundary constraints for specific
|
||||
PDE families such as advection--diffusion equations @li2024physical.
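
The ansatz in Eq.~@eq:hard_constraint_ansatz can be checked directly on a small example. The sketch below assumes a unit space-time square with zero Dirichlet boundaries and a sine initial profile; the particular `g`, `d`, and the fixed `network` function standing in for the trainable network are all hypothetical.

```python
import math

def g(x, t):
    """Satisfies the prescribed data: the initial profile, zero at x = 0 and x = 1."""
    return math.sin(math.pi * x)

def d(x, t):
    """Vanishes on the constrained set: t = 0, x = 0, and x = 1."""
    return t * x * (1.0 - x)

def network(x, t):
    """Stand-in for the trainable network; any function works here."""
    return math.tanh(3.0 * x - 2.0 * t + 0.5)

def u(x, t):
    """Hard-constrained trial function u = g + d * N."""
    return g(x, t) + d(x, t) * network(x, t)
```

Whatever values the network takes, `u` reproduces the initial profile at `t = 0` and vanishes at both spatial boundaries, so the boundary and initial loss terms and their weights drop out of the objective.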
|
||||
|
||||
Variational and weak-form PINNs provide another way to improve
|
||||
constraint and residual enforcement. Instead of directly minimizing the
|
||||
point-wise strong-form PDE residual, these methods enforce the governing
|
||||
equation against test functions in an integral form. hp-VPINNs combine
|
||||
this variational formulation with hp-refinement and domain
|
||||
decomposition, improving the connection between PINNs and classical
|
||||
finite-element or Galerkin methods @kharazmi2021hpvpinn. Thus, the
|
||||
constraint-enforcement block of Table~@tab:pinn_remedies captures two
|
||||
related strategies: satisfying constraints by construction and replacing
|
||||
strong-form residuals with weak-form or variational objectives.
|
||||
|
||||
=== Domain decomposition and scalability
<domain-decomposition-and-scalability>
Finally, a fifth class of remedies improves PINNs by localizing the
learning problem. Instead of fitting a single global network over the
entire spatio-temporal domain, domain-decomposition methods divide the
domain into subregions and train local networks coupled through
interface, conservation, or partition-of-unity constraints. Conservative
PINNs (cPINNs) impose interface flux continuity for conservation laws
@jagtap2020conservative, while extended PINNs (XPINNs) generalize this
idea to flexible space-time domain decomposition for nonlinear PDEs
@jagtap2020extended. Finite-basis PINNs (FBPINNs) further introduce
overlapping subdomains and partition-of-unity weighting to make the
decomposition more scalable and localized @moseley2023finite.

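Partition-of-unity weighting over overlapping subdomains can be sketched
in one dimension as follows. The squared-cosine window, the uniform
subdomain layout, and the names `make_windows` and `blend` are
illustrative assumptions and are not claimed to match the window
functions of the cited FBPINN work.

```python
import math

def make_windows(n_sub, overlap=1.5):
    """Normalized windows for n_sub overlapping subdomains on [0, 1].

    Each raw window is a squared-cosine bump around its subdomain centre.
    Dividing by the sum of all raw windows makes the weights add up to 1
    at every point, which is the partition-of-unity property.
    """
    if n_sub < 2:
        raise ValueError("need at least two subdomains")
    spacing = 1.0 / (n_sub - 1)
    centers = [j * spacing for j in range(n_sub)]
    half_width = overlap * spacing

    def raw(j, x):
        r = abs(x - centers[j]) / half_width
        return math.cos(0.5 * math.pi * r) ** 2 if r < 1.0 else 0.0

    def window(j, x):
        total = sum(raw(i, x) for i in range(n_sub))
        return raw(j, x) / total

    return window

def blend(local_models, window, x):
    """Global prediction: window-weighted sum of the local models."""
    return sum(window(j, x) * model(x) for j, model in enumerate(local_models))

window = make_windows(4)
# Placeholder local models; in practice each would be a small network.
models = [math.sin, math.sin, math.sin, math.sin]
for x in (0.0, 0.3, 1.0):
    weights = [window(j, x) for j in range(4)]
    print(x, sum(weights), blend(models, window, x))
```

Each window is zero outside its own subdomain, so a local model only
influences the global prediction near its centre.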
Recent extensions improve the scalability and adaptivity of this
decomposition approach. Multilevel FBPINNs introduce hierarchical
decompositions to improve global communication across subdomains
@dolean2024multilevel, while AB-PINNs use residual-driven adaptive bases
to dynamically allocate decomposition capacity @botvinick2025ab. These
methods appear in the final block of Table~@tab:pinn_remedies. They
are especially useful for heterogeneous, multi-scale, or long-time
problems where a single global PINN is difficult to optimize.

=== Summary
<summary-1>
Overall, the remedies in Table~@tab:pinn_remedies show that PINN
performance is determined not only by whether the correct physical
equations are included in the objective, but also by whether the
resulting learning problem is numerically trainable. Loss balancing and
second-order optimization improve how competing objectives are
minimized; adaptive sampling and causal curricula improve where and when
residuals are enforced; representation and architectural methods improve
what functions the network can express; hard constraints and weak forms
improve how physics is encoded; and domain decomposition improves
scalability to complex physical systems.

#figure(
  [
    #show table.cell: set text(size: 6pt)
    #set table.hline(stroke: (dash: "solid", thickness: 0.5pt))

    #table(
      columns: (1fr, auto, auto, auto),
      stroke: none,
      align: left + horizon,
      inset: 2pt,

      table.header[*Type*][*Method*][*Venue / Year*][*Key Contribution*],

      table.hline(),
      table.cell(rowspan: 6)[*Loss balancing and optimization*],

      [Gradient-flow weighting],
      [SISC 2021],
      [Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms.],

      [NTK-based weighting],
      [JCP 2022],
      [Balances different physics constraints through neural tangent kernel training dynamics.],

      [SA-PINNs],
      [JCP 2023],
      [Learns adaptive residual weights to emphasize difficult collocation points.],

      [Loss-landscape / NysNewton-CG],
      [ICML 2024],
      [Studies PINN ill-conditioning and improves training with second-order optimization.],

      [ReLoBRaLo],
      [CMAME 2025],
      [Relative loss balancing with random lookback for multi-objective PINN training.],

      [SOAP / gradient alignment],
      [NeurIPS 2025],
      [Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives.],

      table.hline(),
      table.cell(rowspan: 4)[*Residual sampling and curriculum*],

      [RAR / RAD / RAR-D],
      [CMAME 2023],
      [Residual-based adaptive refinement and distribution-based collocation sampling.],

      [RoPINN],
      [NeurIPS 2024],
      [Region-optimized residual sampling for more efficient collocation point selection.],

      [Causal PINN training],
      [CMAME 2024],
      [Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs.],

      [CoPINN],
      [ICML 2025],
      [Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals.],

      table.hline(),
      table.cell(rowspan: 5)[*Representation and architecture*],

      [Fourier features / eigenvector bias],
      [CMAME 2021],
      [Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias.],

      [Adaptive activation functions],
      [Proc. R. Soc. A 2020],
      [Learnable activation slopes and slope-recovery terms for faster convergence.],

      [SIREN],
      [NeurIPS 2020],
      [Sinusoidal activations for representing high-frequency implicit functions.],

      [SPINN],
      [NeurIPS 2023],
      [Separable network structure for efficient forward-mode automatic differentiation.],

      [PINNsformer],
      [ICLR 2024],
      [Transformer-based architecture for modeling sequential dependencies in PINNs.],

      table.hline(),
      table.cell(rowspan: 4)[*Constraint enforcement*],

      [Approximate distance functions],
      [SIAM Review 2021],
      [Implements hard constraints using distance functions and geometry-aware output transformations.],

      [hp-VPINN],
      [CMAME 2021],
      [Variational weak-form PINNs with hp-refinement and domain decomposition.],

      [Hard initial/boundary constraints],
      [CMA 2024],
      [Enforces prescribed initial and boundary conditions through constrained solution forms.],

      [cPINN],
      [CMAME 2020],
      [Conservative domain decomposition with interface flux continuity for conservation laws.],

      table.hline(),
      table.cell(rowspan: 4)[*Domain decomposition and scalability*],

      [XPINN],
      [CCP 2020],
      [General space-time domain decomposition for heterogeneous PDE problems.],

      [FBPINN],
      [ACOM 2023],
      [Overlapping subdomains with partition-of-unity weighting for localized training.],

      [Multilevel FBPINN],
      [CMAME 2024],
      [Hierarchical domain decomposition for improved global communication and scalability.],

      [AB-PINN],
      [arXiv 2025],
      [Adaptive residual-driven decomposition for dynamically allocating subdomains.],
    )
  ]
) <tab:pinn_remedies>

#bibliography("part-1.bib")