Deriving Backpropagation from Wave Equations
Neural Networks as Wave Propagation Systems
Macheng Shen & OpenClaw | March 9, 2026
Abstract:
We re-examine neural networks from physical first principles, viewing them as discrete layered waveguide media. In this framework, the forward pass corresponds to wave propagation, and backpropagation corresponds to error waves reflected from boundary mismatches. This perspective not only reinterprets the classical gradient descent algorithm but also reveals the physical constraints on network architecture design and provides a unified wave-dynamics framework for understanding phenomena like vanishing gradients and exploding gradients.
1. Setup: Neural Networks as Layered Media
1.1 Physical Model
Imagine a neural network as \(L\) layers of waveguide media:
- Each layer \(l \in \{1, 2, \ldots, L\}\) is a "medium slab"
- Waves propagate, refract, and undergo nonlinear interactions between layers
- Weight matrix \(\mathbf{W}_l\) describes the propagation characteristics of the medium (impedance/coupling strength)
1.2 Activation Dynamics
Let \(\mathbf{x}_l(t) \in \mathbb{R}^{n_l}\) be the wave amplitude (activation value) at layer \(l\) at time \(t\). Inter-layer propagation follows the dynamical equation:
\[
\frac{\partial \mathbf{x}_l}{\partial t} = \sigma(\mathbf{W}_l \mathbf{x}_{l-1}) - \gamma_l \mathbf{x}_l \tag{1}
\]
where:
- \(\mathbf{W}_l \in \mathbb{R}^{n_l \times n_{l-1}}\) is the propagation matrix for layer \(l\)
- \(\gamma_l > 0\) is the damping coefficient (energy loss rate)
- \(\sigma: \mathbb{R} \to \mathbb{R}\) is the nonlinear activation function (corresponding to nonlinear medium response, such as optical Kerr effect)
1.3 Steady-State Solution and Standard Forward Pass
At steady state (\(\partial \mathbf{x}_l / \partial t = 0\)), equation (1) reduces to:
\[
\mathbf{x}_l = \frac{1}{\gamma_l} \sigma(\mathbf{W}_l \mathbf{x}_{l-1})
\]
Without loss of generality, set \(\gamma_l = 1\) (or absorb it into the weights), yielding the standard forward pass:
Forward Pass (Steady-State Wave Propagation)
\[
\mathbf{x}_l = \sigma(\mathbf{W}_l \mathbf{x}_{l-1}), \quad l = 1, 2, \ldots, L \tag{2}
\]
Physical interpretation: The forward pass is the steady-state distribution of waves propagating from input layer \(\mathbf{x}_0\) layer-by-layer to output layer \(\mathbf{x}_L\).
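As a sanity check, the relaxation dynamics of Eq. (1) can be integrated numerically and compared against the steady state of Eq. (2). A minimal sketch, assuming \(\sigma = \tanh\) and a single layer driven by a fixed upstream activation (all sizes and seeds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def relax_layer(W, x_prev, gamma=1.0, dt=0.1, steps=500):
    """Euler-integrate dx/dt = sigma(W @ x_prev) - gamma * x (Eq. 1)."""
    x = np.zeros(W.shape[0])
    drive = np.tanh(W @ x_prev)          # nonlinear medium response (sigma = tanh)
    for _ in range(steps):
        x += dt * (drive - gamma * x)
    return x

W = rng.normal(size=(4, 3))
x_prev = rng.normal(size=3)

x_relaxed = relax_layer(W, x_prev)       # dynamical steady state of Eq. (1)
x_forward = np.tanh(W @ x_prev)          # Eq. (2) with gamma = 1

print(np.max(np.abs(x_relaxed - x_forward)))  # essentially zero
```

The Euler step is stable here because \(dt \cdot \gamma < 2\); each step contracts the mismatch toward the fixed point, so the dynamics settle exactly onto the forward pass.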
2. Loss Function: Impedance Mismatch at the Boundary
2.1 Boundary Condition
At output layer \(L\), we have:
- Desired output (target boundary condition): \(\mathbf{y}^* \in \mathbb{R}^{n_L}\)
- Actual output (wave state at boundary): \(\mathbf{x}_L\)
2.2 Loss as Boundary Mismatch
The loss function measures the degree to which the boundary condition is satisfied:
Loss Function (Boundary Mismatch)
\[
\mathcal{L} = \frac{1}{2} \|\mathbf{x}_L - \mathbf{y}^*\|^2 \tag{3}
\]
Physical meaning: In wave theory, unsatisfied boundary conditions lead to the generation of reflected waves. The loss function measures the intensity of this "reflection."
3. Backpropagation: Backward-Propagating Error Waves
3.1 Time-Reversal Symmetry
Many wave equations satisfy time-reversal symmetry: if \(\mathbf{x}(t)\) is a solution, then so is \(\mathbf{x}(-t)\). This motivates the core idea:
Core Idea
Backpropagation is a time-reversed forward pass, carrying "error waves" (reflected waves).
3.2 Definition of Error Wave
Define the error wave (or adjoint wave) at layer \(l\):
\[
\boldsymbol{\delta}_l \equiv \frac{\partial \mathcal{L}}{\partial \mathbf{x}_l} \tag{4}
\]
This is the gradient of the loss function with respect to the activation at layer \(l\), corresponding to the amplitude of the reflected wave in wave dynamics.
3.3 Boundary Condition (Output Layer)
From equations (3) and (4), the initial condition for the error wave at the output layer is:
\[
\boldsymbol{\delta}_L = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_L} = \mathbf{x}_L - \mathbf{y}^* \tag{5}
\]
Physical interpretation: This is the initial amplitude of the "reflected wave" generated by boundary mismatch.
3.4 Backpropagation Equation
Using the chain rule:
\[
\boldsymbol{\delta}_{l-1} = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_{l-1}} = \left(\frac{\partial \mathbf{x}_l}{\partial \mathbf{x}_{l-1}}\right)^T \boldsymbol{\delta}_l
\]
From the forward pass equation (2), we have:
\[
\frac{\partial \mathbf{x}_l}{\partial \mathbf{x}_{l-1}} = \text{diag}(\sigma'(\mathbf{z}_l)) \cdot \mathbf{W}_l
\]
where \(\mathbf{z}_l = \mathbf{W}_l \mathbf{x}_{l-1}\) is the linear combination before activation. Therefore:
Backpropagation (Reflected Wave Propagation)
\[
\boldsymbol{\delta}_{l-1} = \mathbf{W}_l^T \cdot \text{diag}(\sigma'(\mathbf{z}_l)) \cdot \boldsymbol{\delta}_l \tag{6}
\]
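Equations (2), (5), and (6) assemble into a short forward/backward sweep. In this sketch (\(\sigma = \tanh\); layer sizes and seeds are arbitrary), the pre-activations \(\mathbf{z}_l\) are stored on the way out and the reflected wave is propagated on the way back:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = np.tanh
sigma_prime = lambda z: 1.0 - np.tanh(z) ** 2

def forward(Ws, x0):
    """Eq. (2): forward wave propagation; keep pre-activations z_l for later."""
    xs, zs = [x0], []
    for W in Ws:
        zs.append(W @ xs[-1])
        xs.append(sigma(zs[-1]))
    return xs, zs

def backward(Ws, zs, delta_L):
    """Eq. (6): propagate the reflected wave from the output boundary inward."""
    deltas = [delta_L]
    for W, z in zip(reversed(Ws), reversed(zs)):
        deltas.append(W.T @ (sigma_prime(z) * deltas[-1]))
    return deltas[::-1]              # deltas[l] = dL/dx_l

sizes = [3, 5, 4, 2]
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
x0, y_star = rng.normal(size=3), rng.normal(size=2)

xs, zs = forward(Ws, x0)
delta_L = xs[-1] - y_star            # Eq. (5): reflected wave at the boundary
deltas = backward(Ws, zs, delta_L)
```

The innermost wave `deltas[0]` agrees with a finite-difference gradient of Eq. (3) with respect to \(\mathbf{x}_0\), which is an easy correctness check.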
4. Physical Interpretation
Equation (6) contains profound physical meaning:
4.1 Backward Propagation Matrix \(\mathbf{W}_l^T\)
- Forward: Wave propagates from layer \(l-1\) to layer \(l\) via \(\mathbf{W}_l\)
- Backward: Reflected wave propagates from layer \(l\) back to layer \(l-1\) via \(\mathbf{W}_l^T\)
This corresponds to the reciprocity theorem in waveguide theory: the propagation matrix for the reflected wave is the transpose (or conjugate transpose) of the forward matrix.
4.2 Nonlinear Modulation \(\sigma'(\mathbf{z}_l)\)
- \(\sigma'\) describes the "differential response" of the medium
- The reflected wave passes through the same nonlinear medium but only "experiences" its linearized version
- This is analogous to small-signal approximation: when reflected wave amplitude is small, nonlinear terms can be linearized
4.3 Error Wave Energy Decay
If \(|\sigma'(\mathbf{z}_l)| < 1\) (e.g., sigmoid in saturation region), then:
\[
\|\boldsymbol{\delta}_{l-1}\| \leq \|\mathbf{W}_l^T\| \cdot \|\sigma'(\mathbf{z}_l)\|_{\infty} \cdot \|\boldsymbol{\delta}_l\|
\]
Physical Origin of Vanishing Gradient
The reflected wave decays during propagation. If the medium is highly absorptive (\(\sigma'\) small), wave energy is rapidly lost, leading to a vanishing gradient.
Conversely, if \(|\sigma'(\mathbf{z}_l)| > 1\) or weight norms are too large, the reflected wave amplifies, leading to exploding gradient.
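The decay is easy to observe numerically. In this sketch (a 30-layer sigmoid medium with weights at a standard \(1/\sqrt{n}\) scale; all choices are illustrative), the reflected-wave norm shrinks by many orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n, depth = 32, 30
Ws = [rng.normal(scale=1.0 / np.sqrt(n), size=(n, n)) for _ in range(depth)]

# Forward pass (Eq. 2), keeping the pre-activations.
x, zs = rng.normal(size=n), []
for W in Ws:
    zs.append(W @ x)
    x = sigmoid(zs[-1])

# Reflected wave (Eq. 6): sigma'(z) = s(1 - s) <= 1/4, so the medium absorbs.
delta = rng.normal(size=n)
norms = [np.linalg.norm(delta)]
for W, z in zip(reversed(Ws), reversed(zs)):
    s = sigmoid(z)
    delta = W.T @ (s * (1.0 - s) * delta)
    norms.append(np.linalg.norm(delta))

print(norms[0], norms[-1])   # the wave is absorbed layer by layer
```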
5. Weight Updates: Adjusting Medium Impedance
5.1 Weight Gradient
The gradient with respect to weights is:
\[
\frac{\partial \mathcal{L}}{\partial \mathbf{W}_l} = \left(\boldsymbol{\delta}_l \odot \sigma'(\mathbf{z}_l)\right) \mathbf{x}_{l-1}^T \tag{7}
\]
Physical meaning: This is the outer product (correlation) of the incident wave \(\mathbf{x}_{l-1}\) and the reflected wave \(\boldsymbol{\delta}_l\), modulated by the local medium response \(\sigma'(\mathbf{z}_l)\).
In optics, this is analogous to holographic interference fringes: the coherent superposition of two waves records phase and amplitude information.
5.2 Gradient Descent
Weight Update (Medium Adjustment)
\[
\mathbf{W}_l \leftarrow \mathbf{W}_l - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{W}_l} \tag{8}
\]
Physical interpretation: Gradually adjust the propagation characteristics (impedance/refractive index) of each layer's medium so that wave transmission from input to output becomes smoother, minimizing boundary reflection.
This is similar to adaptive optics: measuring reflection/distortion and adjusting mirror shape in real-time to optimize the optical path.
6. Wave-Dynamics Reinterpretation of Classical Phenomena
| Phenomenon | Traditional Explanation | Wave-Dynamics Explanation |
| --- | --- | --- |
| Vanishing gradient | Product of derivatives < 1 | Reflected wave decays in an absorptive medium |
| Exploding gradient | Product of derivatives > 1 | Reflected wave amplifies/resonates in a gain medium |
| ResNet skip connections | Alleviate vanishing gradients | Provide a low-loss "bypass waveguide" |
| Batch Normalization | Stabilizes training | Impedance matching; reduces inter-layer reflection |
| ReLU vs. sigmoid | No gradient saturation | ReLU absorbs less of the reflected wave |
| Spectral bias | — | Impedance too high at certain frequencies |
7. New Testable Predictions
Based on the wave propagation framework, we can make the following testable predictions:
Prediction 1: Eigenmodes and Resonance
Proposition: For a trained network, there exists a set of eigenmodes that can propagate with minimal loss. The essence of training is to make target input-output patterns become eigenmodes of the network.
Experimental verification: Perform modal analysis (eigendecomposition of Jacobian) on trained networks and observe whether modes corresponding to dominant eigenvalues align with principal components of training data.
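Such a modal analysis reduces to an SVD of the end-to-end Jacobian. A sketch of the mechanics (untrained random weights here, purely for illustration; `jacobian` is a hypothetical helper, not a standard API):

```python
import numpy as np

rng = np.random.default_rng(4)

n, depth = 6, 3
Ws = [rng.normal(scale=1.0 / np.sqrt(n), size=(n, n)) for _ in range(depth)]
x0 = rng.normal(size=n)

def jacobian(Ws, x0):
    """End-to-end Jacobian dx_L/dx_0 of the forward map (Eq. 2) at x0."""
    J, x = np.eye(len(x0)), x0
    for W in Ws:
        z = W @ x
        J = np.diag(1.0 - np.tanh(z) ** 2) @ W @ J   # chain rule, layer by layer
        x = np.tanh(z)
    return J

J = jacobian(Ws, x0)
U, s, Vt = np.linalg.svd(J)
print(s)   # singular values: modes with s near/above 1 propagate with little loss
```

The right singular vectors (rows of `Vt`) are the candidate eigenmodes to compare against the principal components of the training data.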
Prediction 2: Frequency Response Analysis
Proposition: Well-performing networks should have flat frequency response (low impedance) at task-relevant frequencies, while failing networks exhibit high-impedance peaks at critical frequencies.
Experimental verification: Measure the network's transfer function for input signals at different frequencies and plot frequency response curves.
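One way to implement such a measurement: treat a small network as a sliding-window (nonlinear FIR) filter, drive it with pure tones, and read the gain at each probe frequency from the output spectrum. The network, window length, and probe frequencies below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(5)

d = 16                                   # window of the signal the network sees
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(8, d))
W2 = rng.normal(scale=0.5, size=(1, 8))
net = lambda v: (W2 @ np.tanh(W1 @ v))[0]   # scalar output per window

def transfer_gain(k, T=512):
    """Drive the sliding-window network with a unit tone; read gain at bin k."""
    t = np.arange(T + d)
    u = np.sin(2 * np.pi * k * t / T)
    y = np.array([net(u[i:i + d]) for i in range(T)])
    return np.abs(np.fft.rfft(y - y.mean()))[k] / (T / 2)

gains = {k: transfer_gain(k) for k in (3, 11, 37)}
print(gains)   # peaks/dips across k reveal frequency-dependent 'impedance'
```

Sweeping `k` over the full band yields the frequency-response curve the proposition refers to.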
Prediction 3: Nonlinearity Strength and Expressivity
Proposition: Stronger nonlinearity corresponds to richer wave frequency mixing (e.g., second harmonic generation), thereby enhancing network expressivity. Linear networks cannot generate new frequencies.
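Harmonic generation is directly measurable. The sketch below drives a linear map and a tanh nonlinearity with the same pure tone: the linear medium rescales the spectrum but adds no new frequencies, while the tanh medium produces an odd (third) harmonic, analogous to harmonic generation in a nonlinear optical medium:

```python
import numpy as np

T = 1024
t = np.arange(T)
k = 5
u = 0.8 * np.sin(2 * np.pi * k * t / T)        # pure tone at bin k

spectrum = lambda y: np.abs(np.fft.rfft(y)) / (T / 2)

linear = spectrum(2.0 * u)         # linear 'network': gain only, no mixing
kerr = spectrum(np.tanh(2.0 * u))  # nonlinear medium: odd harmonics appear

print(linear[3 * k], kerr[3 * k])  # third harmonic: ~0 vs clearly nonzero
```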
8. Conclusion and Outlook
We have re-derived neural network forward and backward propagation from a wave dynamics perspective, revealing the following core insights:
- Forward pass = wave propagation: Information propagates as waves through layered media to the output boundary
- Loss = boundary mismatch: Loss function measures the degree of unsatisfied output boundary condition
- Backpropagation = reflected wave: Error waves generated by boundary mismatch propagate backward toward the input
- Weight update = impedance adjustment: Adjust medium propagation characteristics based on correlation between incident and reflected waves
This framework not only provides a unified explanation for classical phenomena like vanishing/exploding gradients, ResNet, and BatchNorm, but also lays theoretical foundations for physically implemented neural networks (optical, acoustic, spin-wave).
Next Steps:
- Guide neural network architecture design using waveguide design principles
- Experimentally measure frequency response and eigenmodes of real networks
- Explore energy efficiency advantages of physical wave computing