Deriving Backpropagation from Wave Equations

Neural Networks as Wave Propagation Systems
Macheng Shen & OpenClaw | March 9, 2026

Abstract: We re-examine neural networks from physical first principles, viewing them as discrete layered waveguide media. In this framework, the forward pass corresponds to wave propagation, and backpropagation corresponds to error waves reflected from boundary mismatches. This perspective not only reinterprets the classical gradient descent algorithm but also reveals the physical constraints on network architecture design and provides a unified wave-dynamics framework for understanding phenomena like vanishing gradients and exploding gradients.

1. Setup: Neural Networks as Layered Media

1.1 Physical Model

Imagine a neural network as \(L\) layers of waveguide media: each layer is a slab with its own propagation characteristics (impedance/refractive index), and the weight matrix \(\mathbf{W}_l\) sets how the wave couples from slab \(l-1\) into slab \(l\).

1.2 Activation Dynamics

Let \(\mathbf{x}_l(t) \in \mathbb{R}^{n_l}\) be the wave amplitude (activation value) at layer \(l\) at time \(t\). Inter-layer propagation follows the dynamical equation:

\[ \frac{\partial \mathbf{x}_l}{\partial t} = \sigma(\mathbf{W}_l \mathbf{x}_{l-1}) - \gamma_l \mathbf{x}_l \tag{1} \]

where:

- \(\sigma(\cdot)\) is the nonlinear activation function, applied elementwise;
- \(\mathbf{W}_l\) is the coupling (weight) matrix between layers \(l-1\) and \(l\);
- \(\gamma_l > 0\) is a damping coefficient modeling energy dissipation within layer \(l\).
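As a sanity check on the steady-state claim in the next subsection, the dynamics of equation (1) can be integrated numerically until they relax. The sketch below is minimal and self-contained; the layer sizes, weights, and input are illustrative placeholders, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(z):
    return np.tanh(z)

# Hypothetical single-layer setup: sizes and weights are illustrative only.
n0, n1 = 4, 3
W1 = rng.normal(scale=0.5, size=(n1, n0))
x0 = rng.normal(size=n0)

# Integrate dx1/dt = sigma(W1 x0) - gamma * x1 by forward Euler (eq. 1).
gamma, dt = 1.0, 0.01
x1 = np.zeros(n1)
for _ in range(5000):
    x1 += dt * (sigma(W1 @ x0) - gamma * x1)

# At steady state, x1 should match the standard forward pass (eq. 2).
print(np.allclose(x1, sigma(W1 @ x0), atol=1e-6))  # True
```

The linear damping term makes the fixed point globally attracting for a frozen input, which is why the forward pass can be read as the steady-state wave field.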

1.3 Steady-State Solution and Standard Forward Pass

At steady state (\(\partial \mathbf{x}_l / \partial t = 0\)), equation (1) reduces to:

\[ \mathbf{x}_l = \frac{1}{\gamma_l} \sigma(\mathbf{W}_l \mathbf{x}_{l-1}) \]

Without loss of generality, set \(\gamma_l = 1\) (or absorb it into the weights), yielding the standard forward pass:

Forward Pass (Steady-State Wave Propagation)
\[ \mathbf{x}_l = \sigma(\mathbf{W}_l \mathbf{x}_{l-1}), \quad l = 1, 2, \ldots, L \tag{2} \]

Physical interpretation: The forward pass is the steady-state distribution of waves propagating from input layer \(\mathbf{x}_0\) layer-by-layer to output layer \(\mathbf{x}_L\).

2. Loss Function: Impedance Mismatch at the Boundary

2.1 Boundary Condition

At output layer \(L\), we impose the boundary condition \(\mathbf{x}_L = \mathbf{y}^*\), where \(\mathbf{y}^*\) is the target output.

2.2 Loss as Boundary Mismatch

The loss function measures the degree to which the boundary condition is satisfied:

Loss Function (Boundary Mismatch)
\[ \mathcal{L} = \frac{1}{2} \|\mathbf{x}_L - \mathbf{y}^*\|^2 \tag{3} \]

Physical meaning: In wave theory, unsatisfied boundary conditions lead to the generation of reflected waves. The loss function measures the intensity of this "reflection."

3. Backpropagation: Backward-Propagating Error Waves

3.1 Time-Reversal Symmetry

Many wave equations satisfy time-reversal symmetry: if \(\mathbf{x}(t)\) is a solution, then \(\mathbf{x}(-t)\) is also a solution. This motivates the central idea:

Core Idea
Backpropagation is the forward pass run in reverse time, carrying the "error" (reflected) wave.

3.2 Definition of Error Wave

Define the error wave (or adjoint wave) at layer \(l\):

\[ \boldsymbol{\delta}_l \equiv \frac{\partial \mathcal{L}}{\partial \mathbf{x}_l} \tag{4} \]

This is the gradient of the loss function with respect to the activation at layer \(l\), corresponding to the amplitude of the reflected wave in wave dynamics.

3.3 Boundary Condition (Output Layer)

From equations (3) and (4), the initial condition for the error wave at the output layer is:

\[ \boldsymbol{\delta}_L = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_L} = \mathbf{x}_L - \mathbf{y}^* \tag{5} \]

Physical interpretation: This is the initial amplitude of the "reflected wave" generated by boundary mismatch.

3.4 Backpropagation Equation

Using the chain rule:

\[ \boldsymbol{\delta}_{l-1} = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_{l-1}} = \left(\frac{\partial \mathbf{x}_l}{\partial \mathbf{x}_{l-1}}\right)^T \boldsymbol{\delta}_l \]

From the forward pass equation (2), we have:

\[ \frac{\partial \mathbf{x}_l}{\partial \mathbf{x}_{l-1}} = \text{diag}(\sigma'(\mathbf{z}_l)) \cdot \mathbf{W}_l \]

where \(\mathbf{z}_l = \mathbf{W}_l \mathbf{x}_{l-1}\) is the linear combination before activation. Therefore:

Backpropagation (Reflected Wave Propagation)
\[ \boldsymbol{\delta}_{l-1} = \mathbf{W}_l^T \cdot \text{diag}(\sigma'(\mathbf{z}_l)) \cdot \boldsymbol{\delta}_l \tag{6} \]
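Equations (2), (5), and (6) can be exercised directly in code. The sketch below is a minimal illustration; the network sizes, weights, input, and target are hypothetical, and the helper names (`sigma_prime`, `deltas`) are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigma(z):
    return np.tanh(z)

def sigma_prime(z):
    return 1.0 - np.tanh(z) ** 2

# Hypothetical 3-layer network; sizes and data are illustrative.
sizes = [4, 5, 5, 2]
Ws = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes, sizes[1:])]

x = rng.normal(size=sizes[0])
y_star = rng.normal(size=sizes[-1])

# Forward pass (eq. 2), caching pre-activations z_l.
xs, zs = [x], []
for W in Ws:
    zs.append(W @ xs[-1])
    xs.append(sigma(zs[-1]))

# Reflected wave at the boundary (eq. 5), then backward recursion (eq. 6).
delta = xs[-1] - y_star
deltas = [delta]
for W, z in zip(reversed(Ws), reversed(zs)):
    delta = W.T @ (sigma_prime(z) * delta)
    deltas.append(delta)
deltas.reverse()  # deltas[l] = dL/dx_l for l = 0, ..., L
```

A finite-difference check of `deltas[0]` against the loss of eq. (3) confirms that the recursion computes the same gradients as direct differentiation.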

4. Physical Interpretation

Equation (6) admits a direct physical reading:

4.1 Backward Propagation Matrix \(\mathbf{W}_l^T\)

This corresponds to the reciprocity theorem in waveguide theory: the propagation matrix for the reflected wave is the transpose (or conjugate transpose) of the forward matrix.

4.2 Nonlinear Modulation \(\sigma'(\mathbf{z}_l)\)

The derivative of the activation acts as a local transmission coefficient: at each unit it scales how much of the reflected wave passes back through. Where the activation saturates, \(\sigma' \approx 0\) and the medium is strongly absorptive; in the linear regime the wave passes through nearly undamped.

4.3 Error Wave Energy Decay

If \(|\sigma'(\mathbf{z}_l)| < 1\) (e.g., sigmoid in saturation region), then:

\[ \|\boldsymbol{\delta}_{l-1}\| \leq \|\mathbf{W}_l^T\| \cdot \|\sigma'(\mathbf{z}_l)\|_{\infty} \cdot \|\boldsymbol{\delta}_l\| \]

Physical Origin of Vanishing Gradient
The reflected wave decays during propagation. If the medium is highly absorptive (\(\sigma'\) small), wave energy is rapidly lost, leading to vanishing gradient.

Conversely, if \(|\sigma'(\mathbf{z}_l)| > 1\) or weight norms are too large, the reflected wave amplifies, leading to exploding gradient.
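The decay is easy to observe numerically. The sketch below pushes a unit-norm error wave backward through a deep stack of random sigmoid layers (depth, width, and weight scale are illustrative choices, not from the text) and records its norm per eq. (6).

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# 30 identical-width layers; weights scaled so spectral norms are moderate.
n, depth = 50, 30
Ws = [rng.normal(scale=1.0 / np.sqrt(n), size=(n, n)) for _ in range(depth)]

# Propagate forward, caching pre-activations.
x = rng.normal(size=n)
zs = []
for W in Ws:
    zs.append(W @ x)
    x = sigmoid(zs[-1])

# Push a unit-norm error wave backward (eq. 6) and track its energy.
delta = rng.normal(size=n)
delta /= np.linalg.norm(delta)
norms = [1.0]
for W, z in zip(reversed(Ws), reversed(zs)):
    delta = W.T @ (sigmoid_prime(z) * delta)
    norms.append(np.linalg.norm(delta))

# sigmoid' <= 0.25 everywhere, so the wave loses energy at every layer.
print(norms[0], norms[-1])  # final norm is many orders of magnitude smaller
```

Replacing `sigmoid` with an identity-like activation or scaling the weights up reproduces the opposite regime, where the reflected wave amplifies layer by layer.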

5. Weight Updates: Adjusting Medium Impedance

5.1 Weight Gradient

Since \(\boldsymbol{\delta}_l\) in equation (4) is the gradient with respect to the activation \(\mathbf{x}_l\), the gradient with respect to the weights picks up the local modulation \(\sigma'(\mathbf{z}_l)\):

\[ \frac{\partial \mathcal{L}}{\partial \mathbf{W}_l} = \left( \sigma'(\mathbf{z}_l) \odot \boldsymbol{\delta}_l \right) \mathbf{x}_{l-1}^T \tag{7} \]

where \(\odot\) denotes the elementwise product.

Physical meaning: This is the outer product (correlation) of the incident wave \(\mathbf{x}_{l-1}\) and the locally modulated reflected wave \(\sigma'(\mathbf{z}_l) \odot \boldsymbol{\delta}_l\).

In optics, this is analogous to holographic interference fringes: the coherent superposition of two waves records phase and amplitude information.

5.2 Gradient Descent

Weight Update (Medium Adjustment)
\[ \mathbf{W}_l \leftarrow \mathbf{W}_l - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{W}_l} \tag{8} \]

Physical interpretation: Gradually adjust the propagation characteristics (impedance/refractive index) of each layer's medium so that the wave travels from input to output more smoothly, minimizing boundary reflection.

This is similar to adaptive optics: measuring reflection/distortion and adjusting mirror shape in real-time to optimize the optical path.
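Putting equations (2), (5), (6), (7), and (8) together gives a complete training step. The sketch below fits a single hypothetical input/target pair with a tiny tanh network; all sizes, data, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigma(z):
    return np.tanh(z)

def sigma_prime(z):
    return 1.0 - np.tanh(z) ** 2

# Hypothetical toy problem: one input/target pair, sizes illustrative.
sizes = [3, 8, 2]
Ws = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes, sizes[1:])]
x0 = rng.normal(size=3)
y_star = np.array([0.5, -0.3])

eta = 0.1
for step in range(2000):
    # Forward pass (eq. 2).
    xs, zs = [x0], []
    for W in Ws:
        zs.append(W @ xs[-1])
        xs.append(sigma(zs[-1]))
    # Reflected wave at the boundary (eq. 5), backward recursion (eq. 6),
    # weight gradient as incident-x-reflected correlation (eq. 7),
    # and "impedance" adjustment (eq. 8).
    delta = xs[-1] - y_star
    for l in reversed(range(len(Ws))):
        grad = (sigma_prime(zs[l]) * delta)[:, None] * xs[l][None, :]
        delta = Ws[l].T @ (sigma_prime(zs[l]) * delta)
        Ws[l] -= eta * grad

# Fresh forward pass: the boundary mismatch (eq. 3) should be near zero.
h = x0
for W in Ws:
    h = sigma(W @ h)
final_loss = 0.5 * np.sum((h - y_star) ** 2)
print(final_loss)  # boundary reflection shrinks toward zero
```

Note that each backward sweep uses the pre-update weights, matching the adjoint picture in which the reflected wave traverses the same medium as the incident wave.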

6. Wave-Dynamics Reinterpretation of Classical Phenomena

| Phenomenon | Traditional Explanation | Wave-Dynamics Explanation |
| --- | --- | --- |
| Vanishing gradient | Product of derivatives < 1 | Reflected wave decays in an absorptive medium |
| Exploding gradient | Product of derivatives > 1 | Reflected wave amplifies/resonates in a gain medium |
| ResNet skip connections | Alleviates vanishing gradient | Provides a low-loss "bypass waveguide" |
| Batch Normalization | Stabilizes training | Impedance matching; reduces inter-layer reflection |
| ReLU vs. Sigmoid | No gradient saturation | ReLU absorbs less of the reflected wave |
| Spectral bias | —— | Impedance too high at certain frequencies |

7. New Testable Predictions

Based on the wave propagation framework, we can make the following testable predictions:

Prediction 1: Eigenmodes and Resonance

Proposition: For a trained network, there exists a set of eigenmodes that can propagate with minimal loss. The essence of training is to make target input-output patterns become eigenmodes of the network.

Experimental verification: Perform modal analysis (eigendecomposition of Jacobian) on trained networks and observe whether modes corresponding to dominant eigenvalues align with principal components of training data.
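One way to set up the proposed modal analysis is sketched below: build the input-output Jacobian \(J = \prod_l \text{diag}(\sigma'(\mathbf{z}_l)) \mathbf{W}_l\) implied by eq. (2) and eigendecompose it. The network here is random rather than trained, so it only demonstrates the measurement, not the prediction; sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigma(z):
    return np.tanh(z)

def sigma_prime(z):
    return 1.0 - np.tanh(z) ** 2

# Hypothetical square network (random here; a trained one in practice).
n, depth = 6, 3
Ws = [rng.normal(scale=0.6, size=(n, n)) for _ in range(depth)]
x = rng.normal(size=n)

# Accumulate the input-output Jacobian alongside the forward pass.
J = np.eye(n)
h = x
for W in Ws:
    z = W @ h
    J = np.diag(sigma_prime(z)) @ W @ J
    h = sigma(z)

# "Modal analysis": eigenmodes of the end-to-end propagation operator.
eigvals, eigvecs = np.linalg.eig(J)
order = np.argsort(-np.abs(eigvals))
print(np.abs(eigvals[order]))  # low-loss modes have the largest |eigenvalue|
```

Comparing the dominant eigenvectors against principal components of the training data would be the actual test of the proposition.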

Prediction 2: Frequency Response Analysis

Proposition: Well-performing networks should have flat frequency response (low impedance) at task-relevant frequencies, while failing networks exhibit high-impedance peaks at critical frequencies.

Experimental verification: Measure the network's transfer function for input signals at different frequencies and plot frequency response curves.
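A minimal version of this measurement is sketched below for a hypothetical 1-D signal network whose layers convolve with a smoothing kernel and apply tanh (both the architecture and the probe frequencies are our assumptions). Sinusoids at several frequencies are injected and the gain at each probe frequency is read off the output spectrum.

```python
import numpy as np

# Hypothetical 1-D signal network: each "layer" convolves, then tanh.
kernel = np.array([0.25, 0.5, 0.25])  # smoothing kernel attenuates high freq.

def layer(x):
    return np.tanh(np.convolve(x, kernel, mode="same"))

T = 256
t = np.arange(T)
response = []
for freq in [2, 8, 32, 64]:
    x = 0.1 * np.sin(2 * np.pi * freq * t / T)  # small-amplitude probe tone
    y = x
    for _ in range(4):
        y = layer(y)
    # Empirical "transfer function": output/input magnitude at the probe bin.
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y)
    response.append(np.abs(Y[freq]) / np.abs(X[freq]))
print(response)  # gain falls with frequency: high "impedance" at high freq
```

For a trained network, the same probe-and-measure loop applied at task-relevant frequencies would test whether good generalization coincides with a flat passband.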

Prediction 3: Nonlinearity Strength and Expressivity

Proposition: Stronger nonlinearity corresponds to richer wave frequency mixing (e.g., second harmonic generation), thereby enhancing network expressivity. Linear networks cannot generate new frequencies.
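The harmonic-generation claim can be checked directly: pass a pure tone through a linear map and through a nonlinearity, and compare their spectra. The amplitudes and frequencies below are illustrative choices.

```python
import numpy as np

# A pure tone through a nonlinearity acquires harmonics;
# a linear map cannot create new frequencies.
T = 1024
t = np.arange(T)
x = np.sin(2 * np.pi * 8 * t / T)      # single frequency: bin 8

linear_out = 0.7 * x                   # linear "layer": scaling only
nonlin_out = np.tanh(2.0 * x)          # strongly driven nonlinearity

def spectrum(s):
    return np.abs(np.fft.rfft(s)) / T

lin, non = spectrum(linear_out), spectrum(nonlin_out)
# tanh is an odd function, so it generates odd harmonics (bins 24, 40, ...).
print(lin[24], non[24])  # linear: ~0; nonlinear: clearly nonzero
```

An even nonlinearity (e.g. squaring) would instead populate the even harmonics, the analog of second-harmonic generation mentioned above.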

8. Conclusion and Outlook

We have re-derived neural network forward and backward propagation from a wave dynamics perspective, revealing the following core insights:

  1. Forward pass = wave propagation: Information propagates as waves through layered media to the output boundary
  2. Loss = boundary mismatch: Loss function measures the degree of unsatisfied output boundary condition
  3. Backpropagation = reflected wave: Error waves generated by boundary mismatch propagate backward toward the input
  4. Weight update = impedance adjustment: Adjust medium propagation characteristics based on correlation between incident and reflected waves

This framework not only provides a unified explanation for classical phenomena like vanishing/exploding gradients, ResNet, and BatchNorm, but also lays theoretical foundations for physically implemented neural networks (optical, acoustic, spin-wave).

Next Steps:

