% COMP-790-175/part-1.tex, David Allemang, commit 93bfee7eef (Spring 2026), 2026-05-25 11:34:56 -04:00
% ---------------------------------------------------------------------------
% Author guideline and sample document for EG publication using LaTeX2e input
% D.Fellner, v1.13, Jul 31, 2008
\documentclass{egpubl-eurovis-star}
\usepackage{eurovis2014-star}
% --- for EuroVis
%\WsSubmission % uncomment for submission to EuroVis
\WsPaper % uncomment for final version of EuroVis contribution
\electronicVersion % can be used both for the printed and electronic version
% !! *please* don't change anything above
% !! unless you REALLY know what you are doing
% ------------------------------------------------------------------------
% for including postscript figures
% mind: package option 'draft' will replace PS figure by a filename within a frame
\ifpdf \usepackage[pdftex]{graphicx} \pdfcompresslevel=9
\else \usepackage[dvips]{graphicx} \fi
\PrintedOrElectronic
% prepare for electronic version of your document
\usepackage{t1enc,dfadobe}
\usepackage{egweblnk}
\usepackage{cite}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{xspace}
\newcommand{\etal}{et al.\xspace}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{multirow}
\usepackage{booktabs}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% For backwards compatibility to old LaTeX type font selection.
% Uncomment if your document adheres to LaTeX2e recommendations.
% \let\rm=\rmfamily \let\sf=\sffamily \let\tt=\ttfamily
% \let\it=\itshape \let\sl=\slshape \let\sc=\scshape
% \let\bf=\bfseries
% end of prologue
%\input{EGauthorGuidelines-body.inc} % commented by KK for ShareLaTeX use
% ---------------------------------------------------------------------
% EG author guidelines plus sample file for EG publication using LaTeX2e input
% D.Fellner, v1.17, Sep 23, 2010
\title[EG \LaTeX\ Author Guidelines]%
{Physics-Informed and Physics-Embedded Neural Methods for Visual Computing}
% For anonymous conference submission, please enter your SUBMISSION ID.
\author[submission ID]{Please provide your submission ID here}
%% For the final version of your accepted paper, please enter the authors names and affiliations.
%\author[D. Fellner \& S. Behnke]
% {D.\,W. Fellner\thanks{Chairman Eurographics Publications Board}$^{1,2}$
% and S. Behnke$^{2}$
% \\
% $^1$TU Darmstadt \& Fraunhofer IGD, Germany\\
% $^2$Institut f{\"u}r ComputerGraphik \& Wissensvisualisierung, TU Graz, Austria
% }
% ------------------------------------------------------------------------
% if the Editors-in-Chief have given you the data, you may uncomment
% the following five lines and insert it here
%
% \volume{27} % the volume in which the issue will be published;
% \issue{1} % the issue number of the publication
% \pStartPage{1} % set starting page
%-------------------------------------------------------------------------
\begin{document}
% \teaser{
% \includegraphics[width=\linewidth]{eg_new}
% \centering
% \caption{New EG Logo}
% \label{fig:teaser}
% }
\maketitle
\begin{abstract}
The ABSTRACT is to be in fully-justified italicized text,
between two horizontal lines,
in one-column format,
below the author and affiliation information.
Use the word ``Abstract'' as the title, in 9-point Times, boldface type,
left-aligned to the text, initially capitalized.
The abstract is to be in 9-point, single-spaced type.
The abstract may be up to 3 inches (7.62 cm) long. \\
Leave one blank line after the abstract,
then add the subject categories according to the ACM Classification Index
(see http://www.acm.org/class/1998/).
\begin{classification} % according to http://www.acm.org/class/1998/
\CCScat{Computer Graphics}{I.3.3}{Picture/Image Generation}{Line and curve generation}
\end{classification}
\end{abstract}
%-------------------------------------------------------------------------
\section{Writing Plan}
\begin{enumerate}
\item Introduction
\item Theoretical Foundations of Physics in Visual Computing
\item Physics-Informed and Physics-Embedded Neural Methods

\textcolor{red}{You can pick one of the following topics}
\begin{itemize}
\item Data and loss-embedded physics: PDE residual losses, initial-value and boundary-value constraints, other soft constraints, etc.\textcolor{red}{ -- Han}
\item Architecture-embedded physics: hard-coded invariances, physically parameterized layers, analytic kernels inside networks, etc.\textcolor{red}{ -- David}
\item Operator-embedded physics: differentiable renderers, wave propagation and light transport operators, neural and Fourier neural operators, etc.\textcolor{red}{ -- Andrea}
\item System-embedded physics: hardware in the loop, ONNs, etc.\textcolor{red}{ -- Zhen}
\item Applications of PINNs\textcolor{red}{ -- Ana}
\end{itemize}
\item Failure Modes and Misconceptions (where these methods work and where they do not)
\item Open Problems and Future Directions
\item Discussion and Conclusion
\end{enumerate}
\clearpage
\newpage
\newpage
%-------------------------------------------------------------------------
\section{Data and Loss-embedded Physics}
\label{sec:data_loss_embedded_physics}
\subsection{Background}
Physics-Informed Neural Networks (PINNs) combine traditional physics-based simulation with deep learning. Classical methods such as the Finite Element Method (FEM)~\cite{courant1994variational} and Finite Volume Method (FVM)~\cite{leveque2002finite,patankar2018numerical} solve physical equations by first dividing the physical domain into many small computational cells, like covering the space with a fine grid. These methods are accurate and reliable, but they can be expensive for complex geometries, moving boundaries, high-dimensional problems, or repeated simulations in design. Pure deep learning can be faster, but if it only learns from data, it may violate basic physical laws such as conservation of mass, momentum, or energy, making it unreliable when data are limited or test cases differ from training examples.
PINNs address this by incorporating physical laws directly into neural network training. Instead of only fitting observed data, the model is also penalized when its predictions do not satisfy the governing differential equations. As a result, PINNs can learn solutions that are both data-efficient and physically meaningful. They also avoid the need for a predefined computational grid: rather than solving only on a fixed grid, PINNs learn a continuous function over space and time and use automatic differentiation to check the physical equations at sampled points. This makes them useful for irregular shapes, changing domains, limited measurements, and inverse problems where hidden physical parameters need to be estimated.
\subsection{Formulation: embedding physics into the objective}
Data and loss-embedded physics is a foundational paradigm for incorporating physical knowledge into neural computation by encoding governing equations and physical constraints directly into the training objective. In this setting, a neural network $u_\theta(\mathbf{x},t)$ approximates the unknown physical field $u(\mathbf{x},t)$, where $\theta$ denotes the learnable parameters, $\mathbf{x}\in\Omega$ denotes the spatial coordinate in the domain $\Omega$, and $t\in[0,T]$ denotes time.
Consider a general time-dependent nonlinear PDE of the form
\begin{equation}
\partial_t u(\mathbf{x},t) + \mathcal{N}[u](\mathbf{x},t) = 0,
\quad \mathbf{x} \in \Omega,\quad t \in [0,T],
\label{eq:generic_pde}
\end{equation}
where $\mathcal{N}[\cdot]$ denotes a possibly nonlinear spatial differential operator. PINNs~\cite{raissi2019pinn} define a physics residual by substituting the neural approximation $u_\theta$ into the governing equation:
\begin{equation}
r_\theta(\mathbf{x},t)
:=
\partial_t u_\theta(\mathbf{x},t)
+
\mathcal{N}[u_\theta](\mathbf{x},t).
\label{eq:pde_residual}
\end{equation}
The governing equation is satisfied at a point $(\mathbf{x},t)$ when $r_\theta(\mathbf{x},t)=0$. Therefore, the PDE residual is penalized over a set of collocation points $\{(\mathbf{x}_j,t_j)\}_{j=1}^{N_r}$:
\begin{equation}
\mathcal{L}_{\mathrm{PDE}}
=
\frac{1}{N_r}
\sum_{j=1}^{N_r}
\left\|r_\theta(\mathbf{x}_j,t_j)\right\|_2^2.
\label{eq:pde_loss}
\end{equation}
The full training objective then combines the equation loss with supervision on the solution itself:
\begin{equation}
\mathcal{L}
=
\underbrace{\lambda_{r}\mathcal{L}_{\mathrm{PDE}}}_{\text{equation / physics loss}}
+\;
\underbrace{\left(
\lambda_{b}\mathcal{L}_{\mathrm{BC}}
+
\lambda_{i}\mathcal{L}_{\mathrm{IC}}
+
\lambda_{d}\mathcal{L}_{\mathrm{data}}
\right)}_{\text{data / constraint loss}}.
\label{eq:pinn_loss_compact}
\end{equation}
Here, $\mathcal{L}_{\mathrm{BC}}$, $\mathcal{L}_{\mathrm{IC}}$, and $\mathcal{L}_{\mathrm{data}}$ measure violations of boundary conditions, initial conditions, and observed data, respectively, while $\lambda_r,\lambda_b,\lambda_i,\lambda_d$ are scalar balancing weights.
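As a concrete instance, consider the one-dimensional viscous Burgers equation, a common PINN benchmark. The following sketch is illustrative only: the viscosity $\nu$, the initial profile $u_0$, the boundary data $g$, and the point counts $N_i$ and $N_b$ are assumed symbols that do not appear elsewhere in this section. Taking $\mathcal{N}[u]=u\,\partial_x u-\nu\,\partial_{xx}u$ in Eq.~\ref{eq:generic_pde}, the residual in Eq.~\ref{eq:pde_residual} becomes
\begin{equation}
r_\theta(x,t)
=
\partial_t u_\theta
+
u_\theta\,\partial_x u_\theta
-
\nu\,\partial_{xx}u_\theta,
\end{equation}
and one typical choice for the initial and boundary terms is a mean squared mismatch,
\begin{equation}
\begin{aligned}
\mathcal{L}_{\mathrm{IC}}
&=
\frac{1}{N_i}\sum_{k=1}^{N_i}
\left|u_\theta(x_k,0)-u_0(x_k)\right|^2,
\\
\mathcal{L}_{\mathrm{BC}}
&=
\frac{1}{N_b}\sum_{k=1}^{N_b}
\left|u_\theta(x_k,t_k)-g(x_k,t_k)\right|^2,
\end{aligned}
\end{equation}
where the boundary points $(x_k,t_k)$ lie on $\partial\Omega\times[0,T]$. All derivatives of $u_\theta$ in the residual are obtained by automatic differentiation with respect to the network inputs.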
\subsection{Alternative formulations}
The original PINN formulation enforces the \emph{strong-form} PDE residual $r_\theta(\mathbf{x},t)$ toward zero \emph{pointwise}. Many subsequent variants reformulate this objective to improve stability, reduce derivative requirements, or better match different classes of physical systems. For notation, we write $\mathbf{z}=(\mathbf{x},t)$ and let $\mathcal{D}=\Omega\times[0,T]$ denote the space-time domain. These alternatives can be roughly grouped into the following categories.
\subsubsection{Variational/energy formulation}
Some PDEs admit an energy or variational principle, where the true solution is
characterized as the minimizer of an integral functional, such as the Dirichlet
energy. For example, the Deep Ritz method~\cite{yu2018deepritz} considers PDEs
with variational formulations, in which the solution satisfies
\begin{equation}
u^{*}
=
\arg\min_{u\in\mathcal{V}}
\mathcal{E}(u),
\label{eq:deep_ritz_variational}
\end{equation}
where $\mathcal{V}$ denotes the admissible function space and $\mathcal{E}(\cdot)$
denotes the corresponding energy functional. Instead of optimizing directly over
the infinite-dimensional space $\mathcal{V}$, Deep Ritz parameterizes the solution
with a neural network $u_\theta$ and solves
\begin{equation}
\theta^{*}
=
\arg\min_{\theta}
\mathcal{E}(u_\theta).
\label{eq:deep_ritz_nn}
\end{equation}
Similar to PINNs, energy-based methods still use a neural network to approximate
the solution field itself. However, the training signal comes from minimizing a
global energy functional $\mathcal{E}$ rather than penalizing pointwise PDE
residuals. In practice, the integral in $\mathcal{E}$ can be estimated by Monte
Carlo sampling over the physical domain, making the objective compatible with
standard stochastic gradient optimization. Compared with residual-based PINNs,
variational formulations often require lower-order derivatives of the network
output and can more naturally preserve physical structures encoded by the energy,
such as force balance, symmetry, or conservation-related constraints, without
introducing separate penalty losses.
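For illustration, take the Poisson problem $-\Delta u=f$ in $\Omega$ with $u=0$ on $\partial\Omega$; the source term $f$, the sample count $N$, and the sample points $\mathbf{x}_j$ are assumed notation for this sketch. Its solution minimizes the energy
\begin{equation}
\mathcal{E}(u)
=
\int_{\Omega}
\left(
\tfrac{1}{2}\left|\nabla u(\mathbf{x})\right|^2
-
f(\mathbf{x})\,u(\mathbf{x})
\right)d\mathbf{x},
\end{equation}
which a Monte Carlo estimate with $\mathbf{x}_j$ drawn uniformly from $\Omega$ approximates as
\begin{equation}
\mathcal{E}(u_\theta)
\approx
\frac{|\Omega|}{N}
\sum_{j=1}^{N}
\left(
\tfrac{1}{2}\left|\nabla u_\theta(\mathbf{x}_j)\right|^2
-
f(\mathbf{x}_j)\,u_\theta(\mathbf{x}_j)
\right).
\end{equation}
Only first derivatives of $u_\theta$ appear, whereas the strong-form residual $-\Delta u_\theta-f$ requires second derivatives. The Dirichlet condition is not enforced by the energy itself and must be handled separately, for example through a boundary penalty or a hard constraint.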
Recent work has extended this idea to neural operators. For example, Variational
PINO (VINO)~\cite{eshaghi2025variational} trains a neural operator by minimizing
the PDE energy, achieving strong performance without labeled solution data.
\subsubsection{Weak formulations}
Another route is to enforce the PDE in an \emph{integrated} or \emph{weak} sense
rather than pointwise. Instead of requiring the strong-form residual to vanish at
individual collocation points, weak-form methods require the residual to vanish
when tested against a set of test functions. For a set of test functions
$\{v_k\}_{k=1}^{K}$, this can be written as
\begin{equation}
\mathcal{R}_{\theta}(v_k)
:=
\int_{\mathcal{D}}
r_\theta(\mathbf{z})\,v_k(\mathbf{z})\,d\mathbf{z}
\approx 0,
\qquad k=1,\dots,K,
\label{eq:weak_residual}
\end{equation}
where $\mathcal{R}_{\theta}(v_k)$ denotes the weak residual associated with the
test function $v_k$. In practice, weak formulations often integrate the PDE by
parts, which transfers derivatives from the neural solution $u_\theta$ to the
test functions. This reduces the derivative order required from the neural network
and can improve stability for irregular or non-smooth solutions.
Variational Physics-Informed Neural Networks (VPINNs)~\cite{kharazmi2019vpinn}
optimize a loss over such weak residuals:
\begin{equation}
\mathcal{L}_{\mathrm{weak}}
=
\frac{1}{K}
\sum_{k=1}^{K}
\left|
\mathcal{R}_{\theta}(v_k)
\right|^2.
\label{eq:vpinn_loss}
\end{equation}
Relative to standard PINNs, the key difference is that the PDE is enforced in an
averaged integral sense rather than pointwise.
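To illustrate the integration-by-parts step on a simple case, consider a Poisson-type residual $r_\theta=-\Delta u_\theta-f$ on a spatial domain $\Omega$, with test functions $v_k$ that vanish on $\partial\Omega$ (both the model problem and the source term $f$ are assumptions made for this sketch). Then
\begin{equation}
\begin{aligned}
\mathcal{R}_{\theta}(v_k)
&=
\int_{\Omega}
\left(-\Delta u_\theta-f\right)v_k\,d\mathbf{x}
\\
&=
\int_{\Omega}
\nabla u_\theta\cdot\nabla v_k\,d\mathbf{x}
-
\int_{\Omega}
f\,v_k\,d\mathbf{x}.
\end{aligned}
\end{equation}
The boundary term $\int_{\partial\Omega}(\partial_n u_\theta)\,v_k\,ds$ drops out because $v_k=0$ on $\partial\Omega$, so the weak residual involves only first derivatives of $u_\theta$.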
hp-VPINNs~\cite{kharazmi2021hpvpinn} retain the same weak-form principle, but
apply it locally over a partition of the domain
$\mathcal{D}=\bigcup_{e=1}^{N_{\mathrm{sd}}}\mathcal{D}_e$. The corresponding
local weak-form loss can be written as
\begin{equation}
\mathcal{L}_{\mathrm{hp}}
=
\frac{1}{N_{\mathrm{sd}}K}
\sum_{e=1}^{N_{\mathrm{sd}}}
\sum_{k=1}^{K}
\left|
\mathcal{R}_{\theta}^{(e)}(v_k^{(e)})
\right|^2,
\label{eq:hpvpinn_loss}
\end{equation}
where
\begin{equation}
\mathcal{R}_{\theta}^{(e)}(v_k^{(e)})
:=
\int_{\mathcal{D}_e}
r_\theta(\mathbf{z})\,v_k^{(e)}(\mathbf{z})\,d\mathbf{z}.
\label{eq:local_weak_residual}
\end{equation}
Here, $\mathcal{D}_e$ denotes the $e$-th subdomain, $N_{\mathrm{sd}}$ is the
number of subdomains, and $v_k^{(e)}$ is a local test function on $\mathcal{D}_e$.
This local formulation makes refinement more flexible: $h$-refinement subdivides
the domain more finely, while $p$-refinement increases the polynomial order of the
local test space. As a result, hp-VPINNs can better resolve multi-scale or
spatially heterogeneous solutions.
A related line of work studies the choice of test space and residual norm. For
example, Robust VPINNs~\cite{rojas2024robust} address the sensitivity of classical
VPINNs to the test basis by minimizing residuals in a dual norm, leading to
improved stability.
\subsubsection{Adversarial/Minimax formulations}
Weak formulations can also be cast as saddle-point problems. A representative
example is the Weak Adversarial Network (WAN)~\cite{zang2020weak}. Instead of
choosing a fixed set of test functions, WAN parameterizes both the solution and
the test function with neural networks: $u_\theta$ for the solution and
$\varphi_\eta$ for the test function. The method then solves a minimax problem of
the form
\begin{equation}
\min_{\theta}
\max_{\eta}
\;
\mathcal{J}(\theta,\eta),
\label{eq:wan_minimax}
\end{equation}
where $\mathcal{J}(\theta,\eta)$ measures the weak residual induced by the test
network $\varphi_\eta$. Intuitively, the solution network $u_\theta$ tries to
minimize the residual, while the test network $\varphi_\eta$ acts as an adversary
that searches for regions or directions where the current solution still violates
the PDE. Therefore, rather than enforcing the residual against a fixed test basis,
WAN adaptively learns test functions that expose the remaining error.
This adversarial weak-form perspective is especially useful when hand-designed
test functions are insufficient or when the PDE is high-dimensional or non-smooth.
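One way to make $\mathcal{J}$ concrete, sketched here under assumed notation, starts from the operator-norm view of the weak residual. Writing $\langle\mathcal{A}[u_\theta],\varphi_\eta\rangle$ for the weak residual of $u_\theta$ tested against $\varphi_\eta$ (the bracket and the operator symbol $\mathcal{A}$ are introduced only for this illustration), a normalized objective of the form
\begin{equation}
\mathcal{J}(\theta,\eta)
=
\log\left|\langle\mathcal{A}[u_\theta],\varphi_\eta\rangle\right|^2
-
\log\left\|\varphi_\eta\right\|_2^2
\end{equation}
is invariant to rescaling of the test network, so the adversary can increase $\mathcal{J}$ only by changing the shape of $\varphi_\eta$, not its amplitude. A boundary-condition penalty on $u_\theta$ is typically added to the minimization over $\theta$.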
% \subsubsection{Conservative/integral constraints}
% Some methods focus on enforcing physical conservation laws explicitly. Instead of
% only minimizing a local PDE residual, these methods impose integral constraints
% that encode global or local conservation. For example, MUSA-PINN~\cite{zhang2026musa}
% enforces mass or momentum conservation over control volumes by using flux-balance
% integrals derived from the divergence theorem. Such constraints can be written
% abstractly as
% \begin{equation}
% \mathcal{C}_m(u_\theta)
% =
% c_m,
% \qquad m=1,\dots,M,
% \label{eq:conservation_constraints}
% \end{equation}
% where $\mathcal{C}_m(\cdot)$ denotes a conserved physical quantity, such as total
% mass or energy, and $c_m$ is its prescribed value.
% Other approaches impose conservation through projection. For example,
% PINN-Proj~\cite{baez2024guaranteeing} projects the neural output onto a constraint
% manifold that satisfies the desired conservation laws:
% \begin{equation}
% \tilde{u}_\theta
% =
% \Pi_{\mathcal{C}}(u_\theta),
% \label{eq:pinn_projection}
% \end{equation}
% where $\Pi_{\mathcal{C}}$ denotes projection onto the physically admissible set
% $\mathcal{C}$. By construction, the projected solution $\tilde{u}_\theta$ satisfies
% the chosen conservation constraints, reducing the drift in conserved quantities
% that can occur when conservation is enforced only through soft penalties.
% These integral and projection-based constraints complement weak-form methods: weak
% forms enforce the PDE in an averaged sense, while conservative formulations ensure
% that selected physical invariants are respected more directly.
% Some methods focus on enforcing integral conservation laws explicitly. For instance, MUSA-PINN~\cite{zhang2026musa} imposes mass/momentum conservation over control volumes by enforcing flux-balance integrals via the divergence theorem. Other approaches project the solution onto physically conserved manifolds. Baez et al. propose PINN-Proj~\cite{baez2024guaranteeing}, which project the solution onto physically conserved manifolds, and guarantees exact conservation of chosen integrals (like total mass or energy) by projecting the neural output onto a subspace satisfying the conservation law. This ``hard constraint'' approach eliminates drift in conserved quantities, whereas standard PINNs only enforce them in expectation. Such integral constraints complement weak-form ideas by ensuring global physical laws are honored exactly.
\subsubsection{Summary}
Together, these developments broaden the ``formulation'' stage of PINN research. A network can be taught to respect a PDE by driving a pointwise residual to zero (classical PINN~\cite{raissi2019pinn}), by minimizing an energy integral (Deep Ritz~\cite{yu2018deepritz}, VINO~\cite{eshaghi2025variational}), by enforcing weighted integral constraints (VPINN~\cite{kharazmi2019vpinn}, hp-VPINN~\cite{kharazmi2021hpvpinn}, WF-PINN~\cite{wang2025wf}), or by solving a minimax problem (WAN~\cite{zang2020weak}). Each alternative has its own advantages: variational forms lower the derivative order required of the network, weak forms improve stability on irregular or non-smooth solutions, and adversarial forms learn test functions that adapt to the remaining error. Together they offer a toolkit for physics-informed learning beyond the original PINN objective.
\subsection{Diagnosis: why naive composite losses fail in PINN}
Subsequent work showed that the challenge of data/loss-embedded physics lies not
only in formulating the objective, but also in optimizing it reliably. For naive
composite PINN losses, failures can arise from several intertwined sources:
imbalanced gradients across loss terms, uneven convergence dynamics, ill-conditioned
residual optimization, and representation bias in the neural network itself.
\subsubsection{Loss imbalance and uneven convergence}
For the composite PINN loss in Eq.~\ref{eq:pinn_loss_compact}, Wang \etal{}~\cite{wang2021gradientpathologies} showed that the PDE, boundary, initial, and data terms can induce highly imbalanced gradients, so that some objectives
dominate training while others make little progress. To diagnose this imbalance, they compared the gradient magnitudes contributed by different loss terms and proposed adaptively balancing each non-PDE term $\mathcal{L}_i \in \{\mathcal{L}_{\mathrm{BC}}, \mathcal{L}_{\mathrm{IC}}, \mathcal{L}_{\mathrm{data}}\}$ against the PDE term:
\begin{equation}
\hat{\lambda}_i
=
\frac{
\max \bigl|\nabla_{\theta}\mathcal{L}_{\mathrm{PDE}}\bigr|
}{
\operatorname{mean}\bigl|\nabla_{\theta}\mathcal{L}_{i}\bigr|
},
\qquad
\lambda_i \leftarrow (1-\alpha)\lambda_i+\alpha \hat{\lambda}_i,
\label{eq:grad_pathology}
\end{equation}
where $\alpha\in(0,1)$ is a smoothing factor. This observation reveals that PINN training can fail even when each individual loss term is well defined, because the composite objective may provide poorly balanced optimization signals.
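A small numerical illustration, with made-up gradient statistics, shows how the update behaves. Suppose that at some iteration $\max|\nabla_\theta\mathcal{L}_{\mathrm{PDE}}|=2.0$ and $\operatorname{mean}|\nabla_\theta\mathcal{L}_{\mathrm{BC}}|=0.01$, with current weight $\lambda_b=1$ and $\alpha=0.1$. Then
\begin{equation}
\hat{\lambda}_b
=
\frac{2.0}{0.01}
=
200,
\qquad
\lambda_b
\leftarrow
0.9\cdot 1+0.1\cdot 200
=
20.9.
\end{equation}
The boundary weight grows by a factor of about twenty in one update. If the gradient statistics stayed fixed, the gap $\lambda_b-\hat{\lambda}_b$ would shrink by a factor of $1-\alpha$ at every subsequent update.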
A complementary perspective comes from the neural tangent kernel (NTK) analysis. Wang \etal{}~\cite{wang2022and} showed that different components of the PINN objective can converge at substantially different rates during training. This suggests that the imbalance is not only a matter of manually chosen scalar weights or instantaneous gradient magnitudes, but is also tied to the spectrum of the training dynamics induced by the PDE operator and the neural parameterization. In other words, gradient imbalance is a local symptom of a broader convergence-rate mismatch among the physics and data constraints.
\subsubsection{Ill-conditioned residual optimization}
Krishnapriyan \etal{}~\cite{krishnapriyan2021failuremodes} further showed that
failures on harder PDEs often arise not from limited expressivity, but from
optimization difficulty and the brittleness of strong-form residual minimization.
Their analysis can be viewed through objectives of the form
\begin{equation}
\min_{\theta}\;
\mathcal{L}_{u}
+
\lambda_{r}\mathcal{L}_{\mathrm{PDE}},
\label{eq:failure_modes}
\end{equation}
where $\mathcal{L}_{u}$ is shorthand for the supervision terms on the solution,
including boundary, initial, and observed data terms. Their key observation is
that simply increasing the PDE weight $\lambda_r$ does not necessarily improve
training: while a larger $\lambda_r$ enforces physics more strongly, it can also
make the optimization problem more ill-conditioned. Related loss-landscape
analyses similarly show that differential operators in the residual term can
produce poorly conditioned objectives, making PINN training sensitive to optimizer
choice and hyperparameter settings~\cite{rathore2024challenges}.
\subsubsection{Representation and frequency bias}
Another diagnosis concerns the representation bias of the neural network itself.
Standard fully connected networks tend to learn smooth, low-frequency components
more easily than high-frequency or multi-scale structures. Wang \etal{}~\cite{wang2021eigenvector}
connected this behavior to the eigenspectrum of the
limiting NTK and showed that conventional PINNs can struggle when the target
solution contains sharp spatial or temporal variations. Thus, even when the PDE
residual is correctly specified, the neural parameterization and its optimization
dynamics may bias training away from the physically relevant solution.
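The link between the NTK spectrum and frequency bias can be sketched in an idealized linearized regime, under assumptions not stated above: gradient-flow training in training time $\tau$, a kernel matrix $\mathbf{K}$ that stays constant during training, and a pure regression loss on targets $\mathbf{y}$. Writing $\mathbf{u}(\tau)$ for the network predictions at the training points and $\mathbf{K}=\sum_k \mu_k\,\mathbf{q}_k\mathbf{q}_k^{\top}$ for the eigendecomposition of the kernel (all symbols introduced here for illustration), the training error evolves approximately as
\begin{equation}
\mathbf{u}(\tau)-\mathbf{y}
\approx
\sum_k
e^{-\mu_k\tau}\,
\mathbf{q}_k\mathbf{q}_k^{\top}
\bigl(\mathbf{u}(0)-\mathbf{y}\bigr),
\end{equation}
so the error component along $\mathbf{q}_k$ decays at rate $\mu_k$. Eigenvectors with large eigenvalues, which for standard MLPs tend to be smooth and low-frequency, are therefore fitted first, while components with small eigenvalues converge slowly.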
\subsubsection{Takeaway}
Together, these diagnoses show that naive composite PINN losses can fail for
several intertwined reasons: different loss terms may generate imbalanced or
conflicting gradients, the residual objective may be ill-conditioned, and the
neural parameterization may favor smooth low-frequency solutions over the
multi-scale structures required by the PDE. These observations motivate the
remedy strategies discussed next, which aim to rebalance, resample, schedule, or
better optimize the physics-informed objective.
\noindent\textbf{Curriculum regularization.} One such remedy was explored alongside the diagnosis itself. To make the objective in Eq.~\ref{eq:failure_modes} more manageable, Krishnapriyan \etal{}~\cite{krishnapriyan2021failuremodes} studied curriculum regularization, which schematically replaces the target PDE loss by a sequence of progressively harder PDE losses,
\begin{equation}
\min_{\theta}\;
\mathcal{L}_{u}
+
\lambda_{r}\mathcal{L}_{\mathrm{PDE}}^{(s)},
\qquad s=1,\dots,S,
\label{eq:curriculum_pde}
\end{equation}
where $s$ indexes the curriculum stage, $S$ is the total number of stages, and $\mathcal{L}_{u}$ again collects the boundary, initial, and observed data terms. Intuitively, the curriculum does not change the overall formulation, but makes the PDE part of the objective easier to optimize in early stages.
\subsection{Remedies}
The above failure modes have motivated a broad family of remedies, summarized in Table~\ref{tab:pinn_remedies}. To connect these methods with the failure modes discussed above, we organize them according to which part of the PINN pipeline they modify: loss balancing and optimization, residual sampling and curriculum design, neural representation and architecture, constraint enforcement, and domain decomposition. This taxonomy also highlights a useful distinction: some methods directly reshape the composite optimization objective, while others improve the sampling strategy, the neural trial space, the enforcement of physics constraints, or the scalability of the solver.
\subsubsection{Loss balancing and optimization}
\textbf{Loss balancing.} A first class of remedies addresses the composite PINN objective itself. Because the PDE residual, boundary conditions, initial conditions, and data terms can have very different magnitudes and gradient scales, fixed loss weights may cause some objectives to dominate training while others make little progress. Gradient-flow analyses therefore proposed adaptive weighting rules based on the gradient statistics of different loss terms \cite{wang2021gradientpathologies}. Related NTK-based analyses further showed that different components of the PINN loss can converge at different rates, motivating dynamic weights that balance the training dynamics of multiple physics constraints \cite{wang2022and}. More recent loss-balancing methods such as ReLoBRaLo formulate this issue as a multi-objective balancing problem and adjust weights according to relative training progress \cite{bischof2025multi}.
Self-Adaptive PINNs~\cite{mcclenny2023self} address the same general issue from a point-wise residual-weighting perspective. Instead of assigning a fixed penalty to each collocation point, they introduce trainable adaptive weights:
\begin{equation}
\mathcal{L}_{\mathrm{PDE}}
=
\frac{1}{N_r}
\sum_{j=1}^{N_r} w_j\, \left\|r_\theta(\mathbf{x}_j,t_j)\right\|_2^2,
\label{eq:adaptive_weight_compact}
\end{equation}
where $w_j$ is the adaptive importance weight for the residual at collocation point $(\mathbf{x}_j,t_j)$; setting all $w_j=1$ recovers Eq.~\ref{eq:pde_loss}. The network parameters are updated to minimize the loss, while the weights are updated to maximize it, so they grow on hard points with large residuals. As a result, the method automatically allocates more optimization effort to regions where the PDE is most strongly violated.
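The resulting training procedure can be sketched as a saddle-point problem. The step sizes $\eta_\theta$ and $\eta_w$ below are assumed symbols, $\mathbf{w}$ collects the weights $w_j$, and the plain gradient updates stand in for whatever optimizer is used in practice:
\begin{equation}
\begin{aligned}
&\min_{\theta}\max_{\mathbf{w}}\;\mathcal{L}(\theta,\mathbf{w}),
\\
&\theta \leftarrow \theta-\eta_\theta\,\nabla_\theta\mathcal{L},
\qquad
w_j \leftarrow w_j+\eta_w\,\frac{\partial\mathcal{L}}{\partial w_j}.
\end{aligned}
\end{equation}
Because $\mathcal{L}_{\mathrm{PDE}}$ is linear in $w_j$, the ascent direction $\partial\mathcal{L}/\partial w_j$ is proportional to the squared residual at $(\mathbf{x}_j,t_j)$, so weights grow fastest where the residual is largest.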
\noindent\textbf{Optimization.} Beyond weighting, optimizer design is also central to PINN training. Recent loss-landscape studies show that PINN objectives can be highly ill-conditioned, partly because differential operators amplify certain directions in parameter space~\cite{rathore2024challenges}. This helps explain why second-order or quasi-second-order optimizers such as L-BFGS~\cite{liu1989limited}, NysNewton-CG~\cite{rathore2024challenges}, and SOAP-style preconditioning~\cite{wanggradient,vyas2025soap} can substantially improve training stability. Schematically, such methods precondition the gradient update as
\begin{equation}
\theta_{t+1}
\approx
\theta_t - \eta H^{-1} g_t,
\label{eq:preconditioned_update}
\end{equation}
where $\theta_t$ denotes the model parameters, $g_t$ is the total gradient, $\eta$ is the learning rate, and $H$ denotes a curvature matrix or its approximation. Intuitively, curvature-aware preconditioning rescales poorly conditioned directions and can implicitly reduce conflicts among the gradients induced by different loss terms. These methods correspond to the first block of Table~\ref{tab:pinn_remedies}, which focuses on improving how the composite PINN objective is weighted and optimized.
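A two-parameter quadratic gives a minimal illustration of why preconditioning helps; the objective below is a toy assumption, not a PINN loss. For $f(\theta)=\tfrac{1}{2}\theta^{\top}H\theta$ with $H=\operatorname{diag}(100,1)$, plain gradient descent $\theta_{t+1}=\theta_t-\eta H\theta_t$ is stable only for $\eta<2/100$, so the second coordinate contracts by a factor $1-\eta>0.98$ per step at best. The preconditioned update with $\eta=1$ instead gives
\begin{equation}
\theta_{t+1}
=
\theta_t-H^{-1}H\theta_t
=
\mathbf{0},
\end{equation}
reaching the minimizer in a single step regardless of the condition number $\kappa(H)=100$.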
\subsubsection{Residual sampling and causal curricula}
\noindent\textbf{Residual sampling.} A second class of remedies changes the distribution and order of physics supervision. In standard PINNs, collocation points are often sampled uniformly from the spatio-temporal domain. However, uniform sampling can waste many residual points in regions that are already well learned, while undersampling difficult regions with large PDE violations. Residual-based adaptive refinement methods, including RAR, RAD, and RAR-D, therefore update the sampling distribution according to the current residual \cite{wu2023comprehensive}:
\begin{equation}
p(\mathbf{x},t)
\propto
\phi\!\left(\left|r_\theta(\mathbf{x},t)\right|\right),
\label{eq:rad_sampling}
\end{equation}
where $p(\mathbf{x},t)$ denotes the sampling density and $\phi(\cdot)$ is a monotone function of the residual magnitude. This shifts collocation points toward regions where the current PINN violates the governing equation most strongly. Region-optimized PINNs further refine this idea by optimizing the spatial allocation of residual points more explicitly \cite{wu2024ropinn}. In this sense, adaptive sampling improves where physics is enforced, rather than changing the PDE loss itself.
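One commonly used choice of $\phi$, in the spirit of the RAD scheme, is
\begin{equation}
p(\mathbf{x},t)
\propto
\frac{\left|r_\theta(\mathbf{x},t)\right|^{k}}
{\mathbb{E}\!\left[\left|r_\theta(\mathbf{x},t)\right|^{k}\right]}
+
c,
\end{equation}
where $k\ge 0$ and $c\ge 0$ are hyperparameters (symbols introduced here for illustration) and the expectation is taken over the domain, in practice estimated from a pool of candidate points. As a toy example with three candidate points and residual magnitudes $0.1$, $0.1$, and $1.0$, taking $k=1$ and $c=0$ gives sampling probabilities $1/12$, $1/12$, and $10/12$; taking $c=1$ gives $5/24$, $5/24$, and $7/12$, so a larger $c$ pulls the distribution back toward uniform.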
\noindent\textbf{Causality-aware sampling.} For time-dependent problems, another important issue is temporal ordering. If residuals from all time steps are optimized simultaneously, errors from early times can propagate forward and make long-time prediction difficult. Causality-aware training addresses this problem by decomposing the temporal domain into chunks and weighting later chunks according to the accuracy of earlier ones \cite{wang2024respecting}:
\begin{equation}
\mathcal{L}_{\mathrm{PDE}}(\theta)
=
\frac{1}{N_t}
\sum_{i=1}^{N_t}
\omega_i\, \mathcal{L}_{\mathrm{PDE}}^{(i)}(\theta),
\label{eq:causal_loss}
\end{equation}
where $N_t$ is the number of temporal chunks, $\mathcal{L}_{\mathrm{PDE}}^{(i)}$ is the residual loss on the $i$-th time slab, and $\omega_i$ is a causal weight. The weights are designed so that later times receive significant penalty only after earlier-time residuals have been sufficiently reduced. Curriculum-based methods such as CoPINN extend this idea by explicitly organizing training from easier to harder residual constraints \cite{duan2025copinn}. Together, the second block of Table~\ref{tab:pinn_remedies} summarizes methods that improve where, when, and in what order residual supervision is imposed.
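A common way to realize such weights, sketched here with an assumed causality parameter $\epsilon>0$, is
\begin{equation}
\omega_1=1,
\qquad
\omega_i
=
\exp\!\left(
-\epsilon
\sum_{k=1}^{i-1}
\mathcal{L}_{\mathrm{PDE}}^{(k)}(\theta)
\right),
\quad i=2,\dots,N_t.
\end{equation}
The weight $\omega_i$ is close to zero while the accumulated residual of earlier slabs is large, and approaches one as those residuals are driven toward zero. For example, with $\epsilon=1$ and earlier-slab losses summing to $5$, $\omega_i=e^{-5}\approx 0.0067$; once the sum drops to $0.1$, $\omega_i=e^{-0.1}\approx 0.90$. The weights are typically treated as constants when differentiating the loss with respect to $\theta$.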
\subsubsection{Representation and architecture}
A third class of remedies addresses PINN failures from the perspective of neural representation. Standard coordinate-based MLPs often suffer from spectral bias, which makes them learn low-frequency components more easily than high-frequency or multi-scale structures. This is problematic for PDEs with sharp gradients, oscillatory solutions, boundary layers, or multi-scale dynamics. Fourier feature embeddings directly target this limitation by reshaping the coordinate representation and mitigating eigenvector bias in multi-scale PDEs \cite{wang2021eigenvector}. Similarly, sinusoidal activations provide a neural representation better suited for high-frequency implicit functions \cite{sitzmann2020implicit}, while locally adaptive activation functions introduce learnable activation slopes to accelerate convergence \cite{jagtap2020locally}. These methods do not directly modify the physics loss, but they make the neural trial space better matched to the target solution.
More recent architectural remedies redesign the PINN backbone itself. SPINN uses separable network structures to improve efficiency, particularly through more efficient forward-mode automatic differentiation \cite{cho2023separable}. PINNsformer instead introduces a Transformer-based architecture to model sequential dependencies in physics-informed learning \cite{zhao2024pinnsformer}. These methods correspond to the representation and architecture block of Table~\ref{tab:pinn_remedies}: they are most useful when the difficulty comes not only from loss imbalance or sampling, but also from a mismatch between a simple MLP and the structure of the PDE solution.
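To illustrate the separable structure, one way to write such an ansatz (a sketch in notation introduced here; the rank $R$ and the per-axis networks $f^{(i)}$ are illustrative symbols) is
\begin{equation}
u_\theta(x_1,\dots,x_d)
=
\sum_{r=1}^{R}
\prod_{i=1}^{d}
f^{(i)}_{r}(x_i),
\end{equation}
where each $f^{(i)} : \mathbb{R} \to \mathbb{R}^{R}$ is a small network acting on a single coordinate. On a tensor-product grid with $n$ points per axis, the per-axis networks require only $n d$ evaluations in total to produce all $n^d$ grid values, which indicates where the efficiency gain of a separable design can come from.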
\subsubsection{Constraint enforcement}
A fourth class of remedies modifies how boundary, initial, and physical constraints are imposed. In standard PINNs, boundary and initial conditions are usually enforced as soft penalty terms in the loss. This introduces additional loss-balancing difficulty: if the penalty is too small, the constraints may be violated; if it is too large, the PDE residual may be under-optimized. Classical hard-constrained neural trial functions address this issue by constructing solutions that satisfy prescribed constraints by design \cite{lagaris1998artificial}. A typical form is
\begin{equation}
u_\theta(\mathbf{x},t)
=
g(\mathbf{x},t)
+
d(\mathbf{x},t) N_\theta(\mathbf{x},t),
\label{eq:hard_constraint_ansatz}
\end{equation}
where $g(\mathbf{x},t)$ satisfies the prescribed constraint, $d(\mathbf{x},t)$ vanishes on the constrained boundary, and $N_\theta$ is the trainable neural network. Since the constraint is built into the solution form, the optimizer no longer needs to enforce it only through a soft penalty weight. Modern PINN libraries and formulations further implement such hard constraints
using approximate distance functions and geometry-aware output transformations \cite{lu2021deepxde}. Recent work also studies soft and hard boundary constraints for specific PDE families such as advection--diffusion equations \cite{li2024physical}.
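As a simple worked instance of Eq.~(\ref{eq:hard_constraint_ansatz}), consider a one-dimensional problem on $x \in [0,1]$ with Dirichlet data $u(0) = a$ and $u(1) = b$ (the constants $a$ and $b$ are introduced here only for illustration). One admissible choice is
\begin{equation}
g(x) = a(1-x) + b\,x,
\qquad
d(x) = x(1-x),
\end{equation}
so that $u_\theta(0) = a$ and $u_\theta(1) = b$ hold for every value of $\theta$, because $d$ vanishes at both endpoints. Analogously, an initial condition $u(\mathbf{x},0) = u_0(\mathbf{x})$ can be built in by taking $g(\mathbf{x},t) = u_0(\mathbf{x})$ and $d(\mathbf{x},t) = t$. These choices are not unique; they are meant only to show how the ansatz removes the corresponding penalty term from the loss.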
Variational and weak-form PINNs provide another way to improve constraint and residual enforcement. Instead of directly minimizing the point-wise strong-form PDE residual, these methods enforce the governing equation against test functions in an integral form. hp-VPINNs combine this variational formulation with hp-refinement and domain decomposition, improving the connection between PINNs and classical finite-element or Galerkin methods \cite{kharazmi2021hpvpinn}. Thus, the constraint-enforcement block of Table~\ref{tab:pinn_remedies} captures two related strategies: satisfying constraints by construction and replacing strong-form residuals with weak-form or variational objectives.
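As a schematic example of such an objective (the operator $\mathcal{N}$, source $f$, test functions $v_k$, and count $K$ are generic notation assumed here, not the exact formulation of any one method), for a PDE $\mathcal{N}[u] = f$ on $\Omega$ the weak-form loss can be written as
\begin{equation}
\mathcal{L}_{\mathrm{var}}(\theta)
=
\frac{1}{K}
\sum_{k=1}^{K}
\left|
\int_{\Omega}
\big(\mathcal{N}[u_\theta](\mathbf{x}) - f(\mathbf{x})\big)\, v_k(\mathbf{x})
\, d\mathbf{x}
\right|^2 .
\end{equation}
For the one-dimensional Poisson problem $-u'' = f$ on $(0,1)$ with test functions that vanish at both endpoints, integration by parts turns each term into $\int_0^1 u_\theta'\, v_k' \, dx - \int_0^1 f\, v_k \, dx$, so only first derivatives of the network are needed. This reduction in differentiation order is one reason weak forms can be attractive in practice.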
\subsubsection{Domain decomposition and scalability}
Finally, a fifth class of remedies improves PINNs by localizing the learning problem. Instead of fitting a single global network over the entire spatio-temporal domain, domain-decomposition methods divide the domain into subregions and train local networks coupled through interface, conservation, or partition-of-unity constraints. Conservative PINNs impose interface flux continuity for conservation laws \cite{jagtap2020conservative}, while XPINNs generalize this idea to flexible space-time domain decomposition for nonlinear PDEs \cite{jagtap2020extended}. FBPINNs further introduce overlapping subdomains and partition-of-unity weighting to make the decomposition more scalable and localized \cite{moseley2023finite}.
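As a sketch of the partition-of-unity construction (the window functions $w_j$, local networks $N_{\theta_j}$, and subdomain count $J$ are illustrative notation introduced here), the global approximation over overlapping subdomains $\Omega_1,\dots,\Omega_J$ can be written as
\begin{equation}
u_\theta(\mathbf{x})
=
\sum_{j=1}^{J}
w_j(\mathbf{x})\, N_{\theta_j}(\mathbf{x}),
\qquad
w_j(\mathbf{x}) \ge 0,
\quad
\sum_{j=1}^{J} w_j(\mathbf{x}) = 1,
\end{equation}
where each $w_j$ is supported on $\Omega_j$. Because $w_j$ vanishes outside its subdomain, each local network only has to represent the solution on $\Omega_j$, and the smooth overlap of the windows blends neighboring networks into a single global function. Under these assumptions no additional interface loss terms are needed.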
Recent extensions improve the scalability and adaptivity of this decomposition view. Multilevel FBPINNs introduce hierarchical decompositions to improve global communication across subdomains \cite{dolean2024multilevel}, while AB-PINNs use residual-driven adaptive bases to dynamically allocate decomposition capacity \cite{botvinick2025ab}. These methods correspond to the final block of Table~\ref{tab:pinn_remedies}. They are especially useful for heterogeneous, multi-scale, or long-time problems where a single global PINN is difficult to optimize.
\subsubsection{Summary}
Overall, the remedies in Table~\ref{tab:pinn_remedies} show that PINN performance is determined not only by whether the correct physical equations are included in the objective, but also by whether the resulting learning problem is numerically trainable. Loss balancing and second-order optimization improve how competing objectives are minimized; adaptive sampling and causal curricula improve where and when residuals are enforced; representation and architectural methods improve what functions the network can express; hard constraints and weak forms improve how physics is encoded; and domain decomposition improves scalability to complex physical systems.
\begin{table*}[t]
\centering
\small
\setlength{\tabcolsep}{6pt}
\renewcommand{\arraystretch}{1.08}
\caption{Representative remedies for PINN failure modes, organized by whether
they modify the loss and optimizer, residual supervision, neural representation,
constraint enforcement, or domain decomposition.}
\label{tab:pinn_remedies}
\resizebox{\textwidth}{!}{%
\begin{tabular}{p{0.11\textwidth} p{0.32\textwidth} p{0.15\textwidth} p{0.72\textwidth}}
\toprule
\textbf{Type} & \textbf{Method} & \textbf{Venue / Year} & \textbf{Key Contribution} \\
\midrule
\multirow{6}{=}{\centering Loss balancing and optimization}
& Gradient-flow weighting~\cite{wang2021gradientpathologies}
& SISC 2021
& Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms. \\
& NTK-based weighting~\cite{wang2022and}
& JCP 2022
& Balances different physics constraints through neural tangent kernel training dynamics. \\
& SA-PINNs~\cite{mcclenny2023self}
& JCP 2023
& Learns adaptive residual weights to emphasize difficult collocation points. \\
& Loss-landscape / NysNewton-CG~\cite{rathore2024challenges}
& ICML 2024
& Studies PINN ill-conditioning and improves training with second-order optimization. \\
& ReLoBRaLo~\cite{bischof2025multi}
& CMAME 2025
& Relative loss balancing with random lookback for multi-objective PINN training. \\
& SOAP / gradient alignment~\cite{wanggradient}
& NeurIPS 2025
& Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives. \\
\midrule
\multirow{4}{=}{\centering Residual sampling and curriculum}
& RAR / RAD / RAR-D~\cite{wu2023comprehensive}
& CMAME 2023
& Residual-based adaptive refinement and distribution-based collocation sampling. \\
& RoPINN~\cite{wu2024ropinn}
& NeurIPS 2024
& Region-optimized residual sampling for more efficient collocation point selection. \\
& Causal PINN training~\cite{wang2024respecting}
& CMAME 2024
& Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs. \\
& CoPINN~\cite{duan2025copinn}
& ICML 2025
& Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals. \\
\midrule
\multirow{5}{=}{\centering Representation and architecture}
& Fourier features / eigenvector bias~\cite{wang2021eigenvector}
& CMAME 2021
& Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias. \\
& Adaptive activation functions~\cite{jagtap2020locally}
& Proc. R. Soc. A 2020
& Learnable activation slopes and slope-recovery terms for faster convergence. \\
& SIREN~\cite{sitzmann2020implicit}
& NeurIPS 2020
& Sinusoidal activations for representing high-frequency implicit functions. \\
& SPINN~\cite{cho2023separable}
& NeurIPS 2023
& Separable network structure for efficient forward-mode automatic differentiation. \\
& PINNsformer~\cite{zhao2024pinnsformer}
& ICLR 2024
& Transformer-based architecture for modeling sequential dependencies in PINNs. \\
\midrule
\multirow{3}{=}{\centering Constraint enforcement}
% & Hard boundary ansatz~\cite{lagaris1998artificial}
% & IEEE TNN 1998
% & Constructs neural trial solutions that satisfy boundary conditions by design. \\
& Approximate distance functions~\cite{lu2021deepxde}
& SIAM Review 2021
& Implements hard constraints using distance functions and geometry-aware output transformations. \\
& hp-VPINN~\cite{kharazmi2021hpvpinn}
& CMAME 2021
& Variational weak-form PINNs with hp-refinement and domain decomposition. \\
& Hard initial/boundary constraints~\cite{li2024physical}
& CMA 2024
& Enforces prescribed initial and boundary conditions through constrained solution forms. \\
\midrule
\multirow{5}{=}{\centering Domain decomposition and scalability}
& cPINN~\cite{jagtap2020conservative}
& CMAME 2020
& Conservative domain decomposition with interface flux continuity for conservation laws. \\
& XPINN~\cite{jagtap2020extended}
& CiCP 2020
& General space-time domain decomposition for heterogeneous PDE problems. \\
& FBPINN~\cite{moseley2023finite}
& ACOM 2023
& Overlapping subdomains with partition-of-unity weighting for localized training. \\
& Multilevel FBPINN~\cite{dolean2024multilevel}
& CMAME 2024
& Hierarchical domain decomposition for improved global communication and scalability. \\
& AB-PINN~\cite{botvinick2025ab}
& arXiv 2025
& Adaptive residual-driven decomposition for dynamically allocating subdomains. \\
\bottomrule
\end{tabular}%
}
\end{table*}
% \subsubsection{Remedies I: weighting and optimization}
% A first line of remedies aims to repair conflicts within the composite loss. Self-Adaptive PINNs~\cite{mcclenny2023sapinn} introduce trainable weights over collocation points and optimize them jointly with the network parameters:
% \begin{equation}
% \mathcal{L}_{\mathrm{PDE}}
% =
% \sum_j w_j\, r_\theta(\mathbf{x}_j,t_j)^2,
% \label{eq:adaptive_weight_compact}
% \end{equation}
% where $w_j$ is a trainable adaptive importance weight for the residual at collocation point $(\mathbf{x}_j,t_j)$. Unlike fixed reweighting schemes, SA-PINNs train the network parameters to minimize the loss while driving the weights to increase on hard points, effectively seeking a saddle point. As a result, the method is \emph{self-adaptive}: regions with persistently large residuals automatically receive larger penalties and attract more optimization effort.
% More recently, Gradient Alignment in PINNs~\cite{wang2025gradientalignment} argued that optimizer choice is itself central to PINN training. They showed that first-order methods often struggle with the composite objective because gradients from the PDE and data/constraint terms can point in conflicting directions. Their main insight is that (quasi) second-order optimizers are better suited to this setting because curvature-based preconditioning updates
% \begin{equation}
% w_{t+1}\approx w_t-\eta H^{-1}g_t,
% \label{eq:preconditioned_update}
% \end{equation}
% where $w_t$ denotes the model parameters at iteration $t$, $g_t$ is the total gradient, $\eta$ is the learning rate, and $H$ denotes the Hessian. Intuitively, such updates can implicitly align competing gradients through curvature information, making the composite PINN objective easier to optimize. In particular, they identified SOAP as a practical quasi-Newton method that consistently outperforms standard first-order training on challenging PINN benchmarks.
% \subsubsection{Remedies II: sampling and causality}
% A second remedy line reorganizes \emph{physics supervision} itself, namely where and when the PDE residual is enforced. Wu \etal{}~\cite{wu2023adaptivesampling} showed that the residual points used in $\mathcal{L}_{\mathrm{PDE}}$ are not merely implementation details, but a central part of the learning problem. Their key idea is to replace uniform residual sampling by residual-informed sampling, schematically
% \begin{equation}
% p(\mathbf{x},t)\propto \phi\!\left(\left|r_\theta(\mathbf{x},t)\right|\right),
% \label{eq:rad_sampling}
% \end{equation}
% where $p(\mathbf{x},t)$ denotes the sampling density of residual points and $\phi(\cdot)$ is a nonlinear function of the PDE residual. Intuitively, this shifts collocation points toward regions where the current PINN violates the equation most strongly. In this sense, adaptive sampling improves \emph{where} physics is enforced, rather than changing the loss itself.
% For time-dependent problems, causality-aware training~\cite{wang2024causality} argues that residual losses should also respect temporal order. Instead of penalizing all times uniformly, they reformulate the residual objective as a weighted sum over temporal chunks,
% \begin{equation}
% \mathcal{L}_{\mathrm{PDE}}(\theta)
% =
% \frac{1}{N_t}\sum_{i=1}^{N_t} \omega_i\, \mathcal{L}_{\mathrm{PDE}}^{(i)}(\theta),
% \label{eq:causal_loss}
% \end{equation}
% where $N_t$ is the number of temporal chunks, $\mathcal{L}_{\mathrm{PDE}}^{(i)}(\theta)$ is the residual loss associated with the $i$-th time slab, and $\omega_i$ is a temporal weight. The weights are designed so that later times receive large weight only after earlier-time residuals have been sufficiently reduced. Thus, this method improves \emph{when} physics is enforced: it keeps the same residual objective, but schedules it in a way that respects causal temporal evolution.
% \noindent\textbf{Summary.}
% Taken together, these works define a clear progression from \emph{formulation}, to \emph{diagnosis}, to \emph{remedy} in data and loss-embedded physics. The field began by asking how physical laws should enter the loss, then showed that naive composite objectives can be numerically brittle, and finally developed methods that improve how these objectives are balanced, sampled, and optimized. This progression highlights that successful physics-informed learning depends not only on writing the right equations into the loss, but also on making the resulting objective trainable in practice.
%-------------------------------------------------------------------------
%\bibliographystyle{eg-alpha}
\bibliographystyle{eg-alpha-doi}
\bibliography{egbibsample}
%-------------------------------------------------------------------------
\newpage
\end{document}