Absorbing boundary layers are usually specified in grid points rather than physical distance.
\[
L_{\mathrm{PML}} = N_{\mathrm{PML}}\Delta x
\]
Typical values in production wave-equation solvers are on the order of 20–40 grid cells per absorbing boundary, depending on absorber formulation and reflection tolerance requirements.
The \(f_{\max}^3\) scaling law describes the interior model scaling. Boundary padding slightly increases the total cell count beyond the interior \(f_{\max}^3\) scaling, so actual simulations should compute the cell count using the padded dimensions above rather than the asymptotic scaling rule alone.
One-line practical formula
For a uniform grid, if you know the model volume, minimum velocity, target frequency, and points per wavelength, collapse the spatial constant and work in km\(^3\):
\(10^9\) comes from \(1~\mathrm{km}^3 = 10^9~\mathrm{m}^3\), where \(V_{\mathrm{km}^3}\) is model volume in km\(^3\), \(v_{\min}\) is in m/s, and \(f_{\max}\) is in Hz.
Back-of-the-napkin size and padding correction shortcuts, assuming \(n_\lambda\approx 10\) and \(v_{\min}\approx 1500~\mathrm{m/s}\):
\[
N_{\mathrm{cell,true}}\approx (1.15\text{ to }1.30)N_{\mathrm{cell}}
\qquad\text{where}\qquad
N_{\mathrm{cell}}\approx 300\,V_{\mathrm{km}^3}f_{\max}^3
\]
\[
\text{so from } W_{\mathrm{prop}}\propto f_{\max}^4
\;\Rightarrow\;
W_{\mathrm{prop}}\propto N_{\mathrm{cell}}f_{\max}
\]
where \(v_{\max}\) is maximum propagation velocity, \(\Delta t\) is the time step, and \(\Delta x\) is spatial grid spacing.
Intuitively, it controls how small the time step must be relative to the grid spacing and maximum wave speed so that information does not travel farther than one grid cell per time step.
The numerical domain of dependence must contain the physical domain of dependence, and the stable CFL limit depends on dimension, stencil, and discretization.
If CFL forces a smaller \(\Delta t\), timesteps, FLOPs, and runtime all increase.
\[
\Delta t \le \frac{C_{\mathrm{CFL}}\Delta x}{v_{\max}}
\]
Production codes usually choose close to the limit for efficiency:
\[
\Delta t \approx \frac{C_{\mathrm{CFL}}\Delta x}{v_{\max}}
\]
Because the time step scales with grid spacing, \(\Delta t\propto \Delta x\). Since grid spacing decreases as maximum frequency increases, \(\Delta x\propto 1/f_{\max}\), so the number of time steps required for a fixed propagation time scales as \(N_t\propto f_{\max}\).
Section 6 — Number of Time Steps
\[
N_t=\frac{T_{\mathrm{prop}}}{\Delta t}
\]
Scaling:
\[
N_t\propto f_{\max}
\]
This is correct only because \(\Delta x\propto 1/f_{\max}\) and \(\Delta t\propto \Delta x\) from the CFL condition above.
To be specific, since the linear slope is non-trivial:
\(W_{\mathrm{prop}}\) counts total cell updates and is therefore dimensionless; the apparent time units in the constants cancel when combined with \(f_{\max}^4\).
These constants assume explicit time stepping and uniform grid spacing determined by minimum velocity and a fixed points-per-wavelength rule. Adaptive and anisotropic grids invalidate this simple constant-coefficient scaling model.
This multiplier depends on stencil order, physics, memory strategy, checkpointing method, imaging condition complexity, communication overhead, GPU occupancy, memory bandwidth behavior, and how your code defines “cell updates/s”.
Important: If \(P_{\mathrm{gpu}}\) already comes from an end-to-end RTM benchmark, do not apply a separate \(\alpha_{\mathrm{RTM}}\) multiplier or the cost will be double-counted.
Then:
\[
W_{\mathrm{rtm,shot}}=\alpha_{\mathrm{RTM}}N_{\mathrm{cell}}N_t
\qquad\text{if } P_{\mathrm{gpu}} \text{ is cell updates/s for one stencil propagation}
\]
where \(U\) is cluster utilization and \(H_{\mathrm{wall}}\) is campaign time target.
Because RTM shots are independent, total campaign compute is the sum of GPU-hours across all shots.
\(U\) is not a physics constant. It represents effective cluster utilization accounting for queueing delays, scheduling gaps, node failures, and workflow overhead. Typical planning values are 0.7–0.9.