#set math.equation(numbering: "1") = Writing Plan \1. Introduction \2. Theoretical Foundations of Physics in Visual Computing \3. Physics-Informed and Physics-Embedded Neural Methods You can pick one of the following topics - Data and Loss-embedded physics: PDE residual losses, initial-value and boundary-value constraints, other soft constraints, etc. --Han - Architecture embedded physics: hard-coded invariances, physically parameterized layers, analytic kernels inside networks, etc.-- David - Operator embedded physics: differentiable renderers, wave propagation and light transport operators, neural and fourier neural operators, etc. --Andrea - System embedded physics: hardware in the loop, ONNs, etc. --zhen - Applications of PINNs --Ana \4. Failure Modes and Misconceptions (Where do these methods work and do not work) \5. Open Problems and Future Directions \6. Discussion and Conclusion = Data and Loss-embedded Physics == Background Physics-Informed Neural Networks (PINNs) combine traditional physics-based simulation with deep learning. Classical methods such as the Finite Element Method (FEM)~@courant1994variational and Finite Volume Method (FVM)~@leveque2002finite@patankar2018numerical solve physical equations by first dividing the physical domain into many small computational cells, like covering the space with a fine grid. These methods are accurate and reliable, but they can be expensive for complex geometries, moving boundaries, high-dimensional problems, or repeated simulations in design. Pure deep learning can be faster, but if it only learns from data, it may violate basic physical laws such as conservation of mass, momentum, or energy, making it unreliable when data are limited or test cases differ from training examples. PINNs address this by incorporating physical laws directly into neural network training. Instead of only fitting observed data, the model is also penalized when its predictions do not satisfy the governing differential equations. 
As a result, PINNs can learn solutions that are both data-efficient and physically meaningful. They also avoid the need for a predefined computational grid: rather than solving only on a fixed grid, PINNs learn a continuous function over space and time and use automatic differentiation to check the physical equations at sampled points. This makes them useful for irregular shapes, changing domains, limited measurements, and inverse problems where hidden physical parameters need to be estimated. == Formulation: embedding physics into the objective Data and loss-embedded physics is a foundational paradigm for incorporating physical knowledge into neural computation by encoding governing equations and physical constraints directly into the training objective. In this setting, a neural network $u_theta\(upright(bold(x))\,t\)$ approximates the unknown physical field $u\(upright(bold(x))\,t\)$, where $theta$ denotes the learnable parameters, $upright(bold(x)) in Omega$ denotes the spatial coordinate in the domain $Omega$, and $t in\[0\,T\]$ denotes time. Consider a general time-dependent nonlinear PDE of the form $ partial_t u\(upright(bold(x))\,t\)+ cal(N)\[u\]\(upright(bold(x))\,t\)= 0\,quad upright(bold(x)) in Omega\,quad t in\[0\,T\]\, $ where $cal(N)\[dot.op\]$ denotes a possibly nonlinear spatial differential operator. PINNs~@raissi2019pinn define a physics residual by substituting the neural approximation $u_theta$ into the governing equation: $ r_theta\(upright(bold(x))\,t\):= partial_t u_theta\(upright(bold(x))\,t\)+ cal(N)\[u_theta\]\(upright(bold(x))\,t\). $ The governing equation is satisfied at a point $\(upright(bold(x))\,t\)$ when $r_theta\(upright(bold(x))\,t\)= 0$. Therefore, the PDE residual is penalized over a set of collocation points ${\(upright(bold(x))_j\,t_j\)}_(j = 1)^(N_r)$: $ cal(L)_(upright(P D E)) = 1 / N_r sum_(j = 1)^(N_r) ∥r_theta \( upright(bold(x))_j \, t_j \)∥_2^2 . 
$ The full training objective then combines the equation loss with supervision on the solution itself: $ cal(L) = underbrace(lambda_r cal(L)_(upright(P D E)), upright("equation / physics loss")) + #h(0em) underbrace((lambda_b cal(L)_(upright(B C)) + lambda_i cal(L)_(upright(I C)) + lambda_d cal(L)_(upright(d a t a))), upright("data / constraint loss")) . $ <eq:pinn_loss_compact>

Here, $cal(L)_(upright(B C))$, $cal(L)_(upright(I C))$, and $cal(L)_(upright(d a t a))$ measure violations of boundary conditions, initial conditions, and observed data, respectively, while $lambda_r\,lambda_b\,lambda_i\,lambda_d$ are scalar balancing weights.

== Alternative formulations

The original PINN formulation drives the #emph[strong-form] PDE residual $r_theta\(upright(bold(x))\,t\)$ toward zero #emph[pointwise]. Many subsequent variants reformulate this objective to improve stability, reduce derivative requirements, or better match different classes of physical systems. For notation, we write $upright(bold(z)) =\(upright(bold(x))\,t\)$ and let $cal(D) = Omega times\[0\,T\]$ denote the space-time domain. These alternatives can be roughly grouped into the following categories.

=== Variational/energy formulation

Some PDEs admit an energy or variational principle, where the true solution is characterized as the minimizer of an integral functional, such as the Dirichlet energy. For example, the Deep Ritz method~@yu2018deepritz considers PDEs with variational formulations, in which the solution satisfies $ u^(*) = arg min_(u in cal(V)) cal(E)\(u\)\, $ where $cal(V)$ denotes the admissible function space and $cal(E)\(dot.op\)$ denotes the corresponding energy functional. Instead of optimizing directly over the infinite-dimensional space $cal(V)$, Deep Ritz parameterizes the solution with a neural network $u_theta$ and solves $ theta^(*) = arg min_theta cal(E)\(u_theta\). $ Similar to PINNs, energy-based methods still use a neural network to approximate the solution field itself.
However, the training signal comes from minimizing a global energy functional $cal(E)$ rather than penalizing pointwise PDE residuals. In practice, the integral in $cal(E)$ can be estimated by Monte Carlo sampling over the physical domain, making the objective compatible with standard stochastic gradient optimization. Compared with residual-based PINNs, variational formulations often require lower-order derivatives of the network output and can more naturally preserve physical structures encoded by the energy, such as force balance, symmetry, or conservation-related constraints, without introducing separate penalty losses. Recent work has extended this idea to neural operators. For example, Variational PINO (VINO)~@eshaghi2025variational trains a neural operator by minimizing the PDE energy, achieving strong performance without labeled solution data. === Weak formulations Another route is to enforce the PDE in an #emph[integrated] or #emph[weak] sense rather than pointwise. Instead of requiring the strong-form residual to vanish at individual collocation points, weak-form methods require the residual to vanish when tested against a set of test functions. For a set of test functions ${ v_k }_(k = 1)^K$, this can be written as $ cal(R)_theta\(v_k\):= integral_(cal(D)) r_theta\(upright(bold(z))\)thin v_k\(upright(bold(z))\)thin d upright(bold(z)) approx 0\,#h(2em) k = 1\,dots.h\,K\, $ where $cal(R)_theta\(v_k\)$ denotes the weak residual associated with the test function $v_k$. In practice, weak formulations often integrate the PDE by parts, which transfers derivatives from the neural solution $u_theta$ to the test functions. This reduces the derivative order required from the neural network and can improve stability for irregular or non-smooth solutions. Variational Physics-Informed Neural Networks (VPINNs)~@kharazmi2019vpinn optimize a loss over such weak residuals: $ cal(L)_(upright(w e a k)) = 1 / K sum_(k = 1)^K lr(|cal(R)_theta \( v_k \)|)^2 . 
$ Relative to standard PINNs, the key difference is that the PDE is enforced in an averaged integral sense rather than pointwise. hp-VPINNs~@kharazmi2021hpvpinn retain the same weak-form principle, but apply it locally over a partition of the domain $cal(D) = union.big_(e = 1)^(N_(upright(s d))) cal(D)_e$. The corresponding local weak-form loss can be written as $ cal(L)_(upright(h p)) = frac(1, N_(upright(s d)) K) sum_(e = 1)^(N_(upright(s d))) sum_(k = 1)^K lr(|cal(R)_theta^(\(e\)) \( v_k^(\(e\)) \)|)^2\, $ where $ cal(R)_theta^(\(e\))\(v_k^(\(e\))\):= integral_(cal(D)_e) r_theta\(upright(bold(z))\)thin v_k^(\(e\))\(upright(bold(z))\)thin d upright(bold(z)) . $ Here, $cal(D)_e$ denotes the $e$-th subdomain, $N_(upright(s d))$ is the number of subdomains, and $v_k^(\(e\))$ is a local test function on $cal(D)_e$. This local formulation makes refinement more flexible: $h$-refinement subdivides the domain more finely, while $p$-refinement increases the polynomial order of the local test space. As a result, hp-VPINNs can better resolve multi-scale or spatially heterogeneous solutions. A related line of work studies the choice of test space and residual norm. For example, Robust VPINNs~@rojas2024robust address the sensitivity of classical VPINNs to the test basis by minimizing residuals in a dual norm, leading to improved stability. === Adversarial/Minimax formulations Weak formulations can also be cast as saddle-point problems. A representative example is the Weak Adversarial Network (WAN)~@zang2020weak. Instead of choosing a fixed set of test functions, WAN parameterizes both the solution and the test function with neural networks: $u_theta$ for the solution and $phi_eta$ for the test function. The method then solves a minimax problem of the form $ min_theta max_eta #h(0em) cal(J)\(theta\,eta\)\, $ where $cal(J)\(theta\,eta\)$ measures the weak residual induced by the test network $phi_eta$. 
Intuitively, the solution network $u_theta$ tries to minimize the residual, while the test network $phi_eta$ acts as an adversary that searches for regions or directions where the current solution still violates the PDE. Therefore, rather than enforcing the residual against a fixed test basis, WAN adaptively learns test functions that expose the remaining error. This adversarial weak-form perspective is especially useful when hand-designed test functions are insufficient or when the PDE is high-dimensional or non-smooth.

=== Summary

Together, these developments broaden the "formulation" stage of PINN research. They demonstrate that one can teach a network to respect a PDE either by driving a pointwise residual to zero (classical PINN~@raissi2019pinn), by minimizing an energy integral (Deep Ritz~@yu2018deepritz, VINO~@eshaghi2025variational), by enforcing weighted integral constraints (VPINN~@kharazmi2019vpinn, hp-VPINN~@kharazmi2021hpvpinn, WF-PINN~@wang2025wf, etc.), or even by solving a minimax problem (WAN~@zang2020weak). Each alternative has its own advantages: variational forms lower the derivative order required from the network, weak forms improve stability on irregular solutions, and adversarial forms learn the test functions instead of fixing them in advance. The literature continues to evolve these ideas, offering a rich toolkit for physics-informed learning beyond the original PINN objective.

== Diagnosis: why naive composite losses fail in PINNs

Subsequent work showed that the challenge of data/loss-embedded physics lies not only in formulating the objective, but also in optimizing it reliably. For naive composite PINN losses, failures can arise from several intertwined sources: imbalanced gradients across loss terms, uneven convergence dynamics, ill-conditioned residual optimization, and representation bias in the neural network itself.
=== Loss imbalance and uneven convergence For the composite PINN loss in Eq.~@eq:pinn_loss_compact, Wang et al.~@wang2021gradientpathologies showed that the PDE, boundary, initial, and data terms can induce highly imbalanced gradients, so that some objectives dominate training while others make little progress. To diagnose this imbalance, they compared the gradient magnitudes contributed by different loss terms and proposed adaptively balancing each non-PDE term $cal(L)_i in { cal(L)_(upright(B C))\,cal(L)_(upright(I C))\,cal(L)_(upright(d a t a)) }$ against the PDE term: $ hat(lambda)_i = frac(max #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_(upright(P D E)) #scale(x: 120%, y: 120%)[\|], "mean" #scale(x: 120%, y: 120%)[\|] nabla_theta cal(L)_i #scale(x: 120%, y: 120%)[\|])\,#h(2em) lambda_i arrow.l\(1 - alpha\)lambda_i + alpha hat(lambda)_i\, $ where $alpha in\(0\,1\)$ is a smoothing factor. This observation reveals that PINN training can fail even when each individual loss term is well defined, because the composite objective may provide poorly balanced optimization signals. A complementary perspective comes from the neural tangent kernel (NTK) analysis. Wang et al.~@wang2022and showed that different components of the PINN objective can converge at substantially different rates during training. This suggests that the imbalance is not only a matter of manually chosen scalar weights or instantaneous gradient magnitudes, but is also tied to the spectrum of the training dynamics induced by the PDE operator and the neural parameterization. In other words, gradient imbalance is a local symptom of a broader convergence-rate mismatch among the physics and data constraints. === Ill-conditioned residual optimization Krishnapriyan et al.~@krishnapriyan2021failuremodes further showed that failures on harder PDEs often arise not from limited expressivity, but from optimization difficulty and the brittleness of strong-form residual minimization. 
Their analysis can be viewed through objectives of the form $ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))\, $ where $cal(L)_u$ is shorthand for the supervision terms on the solution, including boundary, initial, and observed data terms. Their key observation is that simply increasing the PDE weight $lambda_r$ does not necessarily improve training: while a larger $lambda_r$ enforces physics more strongly, it can also make the optimization problem more ill-conditioned. Related loss-landscape analyses similarly show that differential operators in the residual term can produce poorly conditioned objectives, making PINN training sensitive to optimizer choice and hyperparameter settings~@rathore2024challenges. === Representation and frequency bias Another diagnosis concerns the representation bias of the neural network itself. Standard fully connected networks tend to learn smooth, low-frequency components more easily than high-frequency or multi-scale structures. Wang et al. ~@wang2021eigenvector connected this behavior to the eigenspectrum of the limiting NTK and showed that conventional PINNs can struggle when the target solution contains sharp spatial or temporal variations. Thus, even when the PDE residual is correctly specified, the neural parameterization and its optimization dynamics may bias training away from the physically relevant solution. === Takeaway Together, these diagnoses show that naive composite PINN losses can fail for several intertwined reasons: different loss terms may generate imbalanced or conflicting gradients, the residual objective may be ill-conditioned, and the neural parameterization may favor smooth low-frequency solutions over the multi-scale structures required by the PDE. These observations motivate the remedy strategies discussed next, which aim to rebalance, resample, schedule, or better optimize the physics-informed objective. 
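The gradient-statistics balancing rule from the loss-imbalance discussion above can be illustrated with a small Python sketch. It assumes that the gradients of each loss term with respect to all parameters are already available as flat lists of floats and that the gradient of the balanced term is not identically zero. The function name `update_loss_weight` and the toy gradient values are hypothetical and are not taken from the cited work.

```python
from statistics import mean


def update_loss_weight(lam, grad_pde, grad_term, alpha=0.1):
    """One smoothed update of the weight on a non-PDE loss term.

    lam       : current weight lambda_i
    grad_pde  : gradient of L_PDE w.r.t. all parameters (flat list)
    grad_term : gradient of L_i w.r.t. all parameters (flat list, not all zero)
    alpha     : smoothing factor in (0, 1)
    """
    max_pde = max(abs(g) for g in grad_pde)       # max |grad L_PDE|
    mean_term = mean(abs(g) for g in grad_term)   # mean |grad L_i|
    lam_hat = max_pde / mean_term                 # target weight
    return (1.0 - alpha) * lam + alpha * lam_hat  # moving average of the weight


# Toy numbers (made up): the PDE gradient is far larger than the BC gradient.
grad_pde = [2.0, -8.0, 4.0]
grad_bc = [0.02, -0.04, 0.06]

lam_bc = 1.0
for _ in range(3):
    lam_bc = update_loss_weight(lam_bc, grad_pde, grad_bc, alpha=0.5)
print(lam_bc)  # the boundary weight grows toward max|grad L_PDE| / mean|grad L_BC|
```

In an actual training loop the two gradient lists would be recomputed from the current network every few iterations, so the weight tracks the changing gradient scales rather than a fixed ratio.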
The ill-conditioning identified by Krishnapriyan et al.~@krishnapriyan2021failuremodes also suggested an early remedy. They explored curriculum regularization, which schematically replaces the target PDE loss by a sequence of progressively harder PDE losses, $ min_theta #h(0em) cal(L)_u + lambda_r cal(L)_(upright(P D E))^(\(s\))\,#h(2em) s = 1\,dots.h\,S\, $ where $s$ indexes the curriculum stage, $S$ is the total number of stages, and the final stage $s = S$ recovers the target PDE loss. Intuitively, the curriculum does not change the overall formulation, but makes the PDE part of the objective easier to optimize in early stages, for instance by starting from a small PDE coefficient and increasing it gradually.

== Remedies

The above failure modes have motivated a broad family of remedies, summarized in Table~@tab:pinn_remedies. To connect these methods with the failure modes discussed above, we organize them according to which part of the PINN pipeline they modify: loss balancing and optimization, residual sampling and curriculum design, neural representation and architecture, constraint enforcement, and domain decomposition. This taxonomy also highlights a useful distinction: some methods directly reshape the composite optimization objective, while others improve the sampling strategy, the neural trial space, the enforcement of physics constraints, or the scalability of the solver.

=== Loss balancing and optimization

#strong[Loss balancing.] A first class of remedies addresses the composite PINN objective itself. Because the PDE residual, boundary conditions, initial conditions, and data terms can have very different magnitudes and gradient scales, fixed loss weights may cause some objectives to dominate training while others make little progress. Gradient-flow analyses therefore proposed adaptive weighting rules based on the gradient statistics of different loss terms @wang2021gradientpathologies. Related NTK-based analyses further showed that different components of the PINN loss can converge at different rates, motivating dynamic weights that balance the training dynamics of multiple physics constraints @wang2022and.
More recent loss-balancing methods such as ReLoBRaLo formulate this issue as a multi-objective balancing problem and adjust weights according to relative training progress @bischof2025multi. Self-Adaptive PINNs~@mcclenny2023self address the same general issue from a point-wise residual-weighting perspective. Instead of assigning a fixed penalty to each collocation point, they introduce trainable adaptive weights: $ cal(L)_(upright(P D E)) = sum_j w_j thin r_theta\(upright(bold(x))_j\,t_j\)^2\, $ where $w_j$ is the adaptive importance weight for the residual at collocation point $\(upright(bold(x))_j\,t_j\)$. The network parameters are updated to minimize this loss, while the weights $w_j$ are updated to increase it, so they grow at hard points with large residuals. As a result, the method automatically allocates more optimization effort to regions where the PDE is most strongly violated.

#strong[Optimization.] Beyond weighting, optimizer design is also central to PINN training. Recent loss-landscape studies show that PINN objectives can be highly ill-conditioned, partly because differential operators amplify certain directions in parameter space~@rathore2024challenges. This helps explain why second-order or quasi-second-order optimizers such as L-BFGS~@liu1989limited, NysNewton-CG~@rathore2024challenges, and SOAP-style preconditioning @wanggradient@vyas2025soap can substantially improve training stability. Schematically, such methods precondition the gradient update as $ theta_(k + 1) approx theta_k - eta H^(- 1) g_k\, $ where $k$ indexes the optimization step, $theta_k$ denotes the model parameters, $g_k$ is the total gradient, $eta$ is the learning rate, and $H$ denotes a curvature matrix or its approximation. Intuitively, curvature-aware preconditioning rescales poorly conditioned directions and can implicitly reduce conflicts among the gradients induced by different loss terms.
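The effect of curvature-aware preconditioning can be seen on a toy problem. The sketch below is a minimal illustration, assuming a two-parameter quadratic loss with a diagonal Hessian whose entries differ by a factor of 1000. This is only a stand-in for an ill-conditioned objective, not an actual PINN, and all function names are hypothetical.

```python
def grad(theta, h):
    """Gradient of the quadratic loss 0.5 * sum(h_i * theta_i**2)."""
    return [hi * ti for hi, ti in zip(h, theta)]


def loss(theta, h):
    """Quadratic loss with diagonal curvature h."""
    return 0.5 * sum(hi * ti ** 2 for hi, ti in zip(h, theta))


def step(theta, h, lr, precondition):
    """One update theta <- theta - lr * H^{-1} g, or theta - lr * g if not preconditioned."""
    g = grad(theta, h)
    if precondition:
        # H is diagonal here, so applying H^{-1} is an elementwise division.
        g = [gi / hi for gi, hi in zip(g, h)]
    return [ti - lr * gi for ti, gi in zip(theta, g)]


h = [1.0, 1000.0]       # curvatures with condition number 1000
theta_gd = [1.0, 1.0]
theta_pc = [1.0, 1.0]
for _ in range(50):
    # Plain gradient descent needs lr < 2/1000 to stay stable in the stiff direction,
    # which makes progress along the flat direction very slow.
    theta_gd = step(theta_gd, h, lr=1e-3, precondition=False)
    # The preconditioned step rescales both directions and tolerates a large lr.
    theta_pc = step(theta_pc, h, lr=0.5, precondition=True)

print(loss(theta_gd, h), loss(theta_pc, h))
```

After the same number of steps, the preconditioned iterate is many orders of magnitude closer to the minimum, because the step size is no longer dictated by the stiffest direction.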
These methods correspond to the first block of Table~@tab:pinn_remedies, which focuses on improving how the composite PINN objective is weighted and optimized. === Residual sampling and causal curricula #strong[Residual sampling.] A second class of remedies changes the distribution and order of physics supervision. In standard PINNs, collocation points are often sampled uniformly from the spatio-temporal domain. However, uniform sampling can waste many residual points in regions that are already well learned, while undersampling difficult regions with large PDE violations. Residual-based adaptive refinement methods, including RAR, RAD, and RAR-D, therefore update the sampling distribution according to the current residual @wu2023comprehensive: $ p\(upright(bold(x))\,t\)prop phi.alt #h(-1em) (lr(|r_theta \( upright(bold(x)) \, t \)|))\, $ where $p\(upright(bold(x))\,t\)$ denotes the sampling density and $phi.alt\(dot.op\)$ is a monotone function of the residual magnitude. This shifts collocation points toward regions where the current PINN violates the governing equation most strongly. Region-optimized PINNs further refine this idea by optimizing the spatial allocation of residual points more explicitly @wu2024ropinn. In this sense, adaptive sampling improves where physics is enforced, rather than changing the PDE loss itself. #strong[Causality-aware sampling.] For time-dependent problems, another important issue is temporal ordering. If residuals from all time steps are optimized simultaneously, errors from early times can propagate forward and make long-time prediction difficult. 
Causality-aware training addresses this problem by decomposing the temporal domain into chunks and weighting later chunks according to the accuracy of earlier ones @wang2024respecting: $ cal(L)_(upright(P D E))\(theta\)= 1 / N_t sum_(i = 1)^(N_t) omega_i thin cal(L)_(upright(P D E))^(\(i\))\(theta\)\, $ where $N_t$ is the number of temporal chunks, $cal(L)_(upright(P D E))^(\(i\))$ is the residual loss on the $i$-th time slab, and $omega_i$ is a causal weight. The weights are designed so that later times receive significant penalty only after earlier-time residuals have been sufficiently reduced. Curriculum-based methods such as CoPINN extend this idea by explicitly organizing training from easier to harder residual constraints @duan2025copinn. Together, the second block of Table~@tab:pinn_remedies summarizes methods that improve where, when, and in what order residual supervision is imposed. === Representation and architecture A third class of remedies addresses PINN failures from the perspective of neural representation. Standard coordinate-based MLPs often suffer from spectral bias, which makes them learn low-frequency components more easily than high-frequency or multi-scale structures. This is problematic for PDEs with sharp gradients, oscillatory solutions, boundary layers, or multi-scale dynamics. Fourier feature embeddings directly target this limitation by reshaping the coordinate representation and mitigating eigenvector bias in multi-scale PDEs @wang2021eigenvector. Similarly, sinusoidal activations provide a neural representation better suited for high-frequency implicit functions @sitzmann2020implicit, while locally adaptive activation functions introduce learnable activation slopes to accelerate convergence @jagtap2020locally. These methods do not directly modify the physics loss, but they make the neural trial space better matched to the target solution. More recent architectural remedies redesign the PINN backbone itself. 
SPINN uses separable network structures to improve efficiency, particularly through more efficient forward-mode automatic differentiation @cho2023separable. PINNsformer instead introduces a Transformer-based architecture to model sequential dependencies in physics-informed learning @zhao2024pinnsformer. These methods correspond to the representation and architecture block of Table~@tab:pinn_remedies: they are most useful when the difficulty comes not only from loss imbalance or sampling, but also from a mismatch between a simple MLP and the structure of the PDE solution. === Constraint enforcement A fourth class of remedies modifies how boundary, initial, and physical constraints are imposed. In standard PINNs, boundary and initial conditions are usually enforced as soft penalty terms in the loss. This introduces additional loss-balancing difficulty: if the penalty is too small, the constraints may be violated; if it is too large, the PDE residual may be under-optimized. Classical hard-constrained neural trial functions address this issue by constructing solutions that satisfy prescribed constraints by design @lagaris1998artificial. A typical form is $ u_theta\(upright(bold(x))\,t\)= g\(upright(bold(x))\,t\)+ d\(upright(bold(x))\,t\)N_theta\(upright(bold(x))\,t\)\, $ where $g\(upright(bold(x))\,t\)$ satisfies the prescribed constraint, $d\(upright(bold(x))\,t\)$ vanishes on the constrained boundary, and $N_theta$ is the trainable neural network. Since the constraint is built into the solution form, the optimizer no longer needs to enforce it only through a soft penalty weight. Modern PINN libraries and formulations further implement such hard constraints using approximate distance functions and geometry-aware output transformations @lu2021deepxde. Recent work also studies soft and hard boundary constraints for specific PDE families such as advection--diffusion equations @li2024physical. 
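As a minimal sketch of the hard-constrained trial function above, consider a 1D problem on the unit interval with prescribed Dirichlet values `a` at `x = 0` and `b` at `x = 1`. The choices `g(x) = a + (b - a) * x` and `d(x) = x * (1 - x)`, as well as the stand-in `raw_network`, are illustrative assumptions rather than the construction of any cited paper.

```python
import math


def raw_network(x, params):
    """Stand-in for the trainable network N_theta (one tanh unit plus a bias)."""
    w1, b1, w2, b2 = params
    return w2 * math.tanh(w1 * x + b1) + b2


def hard_constrained_u(x, params, a=1.0, b=3.0):
    """Trial function u(x) = g(x) + d(x) * N_theta(x) on [0, 1]."""
    g = a + (b - a) * x   # satisfies g(0) = a and g(1) = b
    d = x * (1.0 - x)     # vanishes at both boundary points
    return g + d * raw_network(x, params)


# Whatever the parameters are, the boundary values are reproduced exactly,
# while the interior values still depend on the network.
for params in [(0.3, -1.2, 2.0, 0.5), (5.0, 0.1, -7.0, 4.0)]:
    print(hard_constrained_u(0.0, params),
          hard_constrained_u(0.5, params),
          hard_constrained_u(1.0, params))
```

Because the boundary values hold for every parameter setting, the boundary loss term and its weight can be dropped from the objective in this setting, leaving only the PDE residual and any data terms to balance.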
Variational and weak-form PINNs provide another way to improve constraint and residual enforcement. Instead of directly minimizing the point-wise strong-form PDE residual, these methods enforce the governing equation against test functions in an integral form. hp-VPINNs combine this variational formulation with hp-refinement and domain decomposition, improving the connection between PINNs and classical finite-element or Galerkin methods @kharazmi2021hpvpinn. Thus, the constraint-enforcement block of Table~@tab:pinn_remedies captures two related strategies: satisfying constraints by construction and replacing strong-form residuals with weak-form or variational objectives. === Domain decomposition and scalability Finally, a fifth class of remedies improves PINNs by localizing the learning problem. Instead of fitting a single global network over the entire spatio-temporal domain, domain-decomposition methods divide the domain into subregions and train local networks coupled through interface, conservation, or partition-of-unity constraints. Conservative PINNs impose interface flux continuity for conservation laws @jagtap2020conservative, while XPINNs generalize this idea to flexible space-time domain decomposition for nonlinear PDEs @jagtap2020extended. FBPINNs further introduce overlapping subdomains and partition-of-unity weighting to make the decomposition more scalable and localized @moseley2023finite. Recent extensions improve the scalability and adaptivity of this decomposition view. Multilevel FBPINNs introduce hierarchical decompositions to improve global communication across subdomains @dolean2024multilevel, while AB-PINNs use residual-driven adaptive bases to dynamically allocate decomposition capacity @botvinick2025ab. These methods correspond to the final block of Table~@tab:pinn_remedies. They are especially useful for heterogeneous, multi-scale, or long-time problems where a single global PINN is difficult to optimize. 
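The partition-of-unity idea behind overlapping decompositions can be sketched in a few lines. The example below is a simplified illustration, assuming a 1D domain, Gaussian-shaped window functions, and hand-written stand-ins for the local networks. The window shape and all names are hypothetical and are not the specific construction used in FBPINNs.

```python
import math


def windows(x, centers, width):
    """Normalized window weights that sum to one at every x (partition of unity)."""
    raw = [math.exp(-((x - c) / width) ** 2) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def blended_solution(x, local_models, centers, width):
    """Global prediction u(x) = sum_e w_e(x) * u_e(x) built from local models u_e."""
    w = windows(x, centers, width)
    return sum(wi * model(x) for wi, model in zip(w, local_models))


centers = [0.0, 0.5, 1.0]
# Stand-ins for three local networks, each meant to be accurate near its own center.
local_models = [lambda x: 0.0 * x, lambda x: x, lambda x: 1.0 + 0.0 * x]

for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    w = windows(x, centers, width=0.2)
    u = blended_solution(x, local_models, centers, width=0.2)
    print(x, [round(wi, 3) for wi in w], round(u, 3))
```

Each local model only needs to be accurate where its window is non-negligible, which is what lets the subproblems be trained in a localized way while the blended prediction stays continuous across subdomain overlaps.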
=== Summary

Overall, the remedies in Table~@tab:pinn_remedies show that PINN performance is determined not only by whether the correct physical equations are included in the objective, but also by whether the resulting learning problem is numerically trainable. Loss balancing and second-order optimization improve how competing objectives are minimized; adaptive sampling and causal curricula improve where and when residuals are enforced; representation and architectural methods improve what functions the network can express; hard constraints and weak forms improve how physics is encoded; and domain decomposition improves scalability to complex physical systems.

#figure(
  [
    #show table.cell: set text(size: 6pt)
    #set table.hline(stroke: (dash: "solid", thickness: 0.5pt))
    #table(
      columns: (1fr, auto, auto, auto),
      stroke: none,
      align: left + horizon,
      inset: 2pt,
      table.header[*Type*][*Method*][*Venue / Year*][*Key Contribution*],
      table.hline(),
      table.cell(rowspan: 6)[*Loss balancing and optimization*],
      [Gradient-flow weighting~@wang2021gradientpathologies], [SISC 2021], [Adaptive loss weighting based on gradient statistics across PDE, boundary, and data terms.],
      [NTK-based weighting~@wang2022and], [JCP 2022], [Balances different physics constraints through neural tangent kernel training dynamics.],
      [SA-PINNs~@mcclenny2023self], [JCP 2023], [Learns adaptive residual weights to emphasize difficult collocation points.],
      [Loss-landscape / NysNewton-CG~@rathore2024challenges], [ICML 2024], [Studies PINN ill-conditioning and improves training with second-order optimization.],
      [ReLoBRaLo~@bischof2025multi], [CMAME 2025], [Relative loss balancing with random lookback for multi-objective PINN training.],
      [SOAP / gradient alignment~@wanggradient], [NeurIPS 2025], [Uses quasi-second-order preconditioning to improve gradient alignment in composite PINN objectives.],
      table.hline(),
      table.cell(rowspan: 4)[*Residual sampling and curriculum*],
      [RAR / RAD / RAR-D~@wu2023comprehensive], [CMAME 2023], [Residual-based adaptive refinement and distribution-based collocation sampling.],
      [RoPINN~@wu2024ropinn], [NeurIPS 2024], [Region-optimized residual sampling for more efficient collocation point selection.],
      [Causal PINN training~@wang2024respecting], [CMAME 2024], [Causality-aware temporal segmentation and residual reweighting for time-dependent PDEs.],
      [CoPINN~@duan2025copinn], [ICML 2025], [Cognitive easy-to-hard curriculum training for progressively enforcing difficult residuals.],
      table.hline(),
      table.cell(rowspan: 5)[*Representation and architecture*],
      [Fourier features / eigenvector bias~@wang2021eigenvector], [CMAME 2021], [Multi-scale coordinate embeddings to mitigate spectral and eigenvector bias.],
      [Adaptive activation functions~@jagtap2020locally], [Proc. R. Soc. A 2020], [Learnable activation slopes and slope-recovery terms for faster convergence.],
      [SIREN~@sitzmann2020implicit], [NeurIPS 2020], [Sinusoidal activations for representing high-frequency implicit functions.],
      [SPINN~@cho2023separable], [NeurIPS 2023], [Separable network structure for efficient forward-mode automatic differentiation.],
      [PINNsformer~@zhao2024pinnsformer], [ICLR 2024], [Transformer-based architecture for modeling sequential dependencies in PINNs.],
      table.hline(),
      table.cell(rowspan: 3)[*Constraint enforcement*],
      [Approximate distance functions~@lu2021deepxde], [SIAM Review 2021], [Implements hard constraints using distance functions and geometry-aware output transformations.],
      [hp-VPINN~@kharazmi2021hpvpinn], [CMAME 2021], [Variational weak-form PINNs with hp-refinement and domain decomposition.],
      [Hard initial/boundary constraints~@li2024physical], [CMA 2024], [Enforces prescribed initial and boundary conditions through constrained solution forms.],
      table.hline(),
      table.cell(rowspan: 5)[*Domain decomposition and scalability*],
      [cPINN~@jagtap2020conservative], [CMAME 2020], [Conservative domain decomposition with interface flux continuity for conservation laws.],
      [XPINN~@jagtap2020extended], [CCP 2020], [General space-time domain decomposition for heterogeneous PDE problems.],
      [FBPINN~@moseley2023finite], [ACOM 2023], [Overlapping subdomains with partition-of-unity weighting for localized training.],
      [Multilevel FBPINN~@dolean2024multilevel], [CMAME 2024], [Hierarchical domain decomposition for improved global communication and scalability.],
      [AB-PINN~@botvinick2025ab], [arXiv 2025], [Adaptive residual-driven decomposition for dynamically allocating subdomains.],
    )
  ]
) <tab:pinn_remedies>

#bibliography("part-1.bib")