SimulTrain Solution
May 2026
In the edge-cloud setting, the data sits at the edge while the compute sits in the cloud. The baseline is plain SGD, where every update incurs the full sequential edge-cloud round trip:

[ w_{t+1} = w_t - \eta \, \nabla \ell(w_t; x_t, y_t) ]

The SimulTrain solution replaces this with a forecast-corrected gradient, where the coefficient \alpha weights the drift between the cloud copy w^{(c)}_k and the edge copy w^{(e)}_k of the parameters:

[ \tilde\nabla_k = \nabla \ell(w^{(e)}_k; x_k) + \alpha \, (w^{(c)}_k - w^{(e)}_k) ]
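To make the update concrete, here is a minimal sketch in NumPy. The `grad_loss` callable, the toy least-squares usage, and the default values for `alpha` and `eta` are all illustrative assumptions, not values or code from this work:

```python
import numpy as np

def simultrain_step(w_edge, w_cloud, x_k, grad_loss, alpha=0.1, eta=0.01):
    """One forecast-corrected SGD step, following the equations above.

    grad_loss(w, x) returns the loss gradient at parameters w on batch x;
    alpha and eta are hypothetical defaults, not values from the paper.
    """
    g_local = grad_loss(w_edge, x_k)   # gradient at the (stale) edge copy
    drift = w_cloud - w_edge           # cloud/edge parameter drift
    g_tilde = g_local + alpha * drift  # forecast-corrected gradient
    return w_edge - eta * g_tilde      # plain SGD step with g_tilde

# Toy usage: least-squares gradient on a single synthetic batch.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
    grad = lambda w, batch: 2 * batch[0].T @ (batch[0] @ w - batch[1]) / len(batch[1])
    w_e, w_c = np.zeros(4), 0.01 * rng.normal(size=4)
    print(simultrain_step(w_e, w_c, (X, y), grad))
```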
SimulTrain sends activations, which are lower-dimensional than raw data but higher-dimensional than gradients. However, it enables bidirectional overlap, reducing the total bandwidth-time product by 65% compared to SyncSGD (a toy model of this overlap follows the results table below).

| Dataset | Centralized | SyncSGD | FedAvg (5 local steps) | SimulTrain |
|---------|-------------|---------|------------------------|------------|
| UCF-101 | 84.2%       | 83.9%   | 81.1%                  | 83.7%      |
| WISDM   | 91.5%       | 91.3%   | 88.9%                  | 91.1%      |
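As a rough illustration of why full-duplex overlap shrinks the bandwidth-time product, the toy model below compares sequential and overlapped transfers on the emulated 4G link from the experimental setup. The per-batch payload sizes are assumptions chosen for illustration; the 65% figure above is a measured result, not a prediction of this model:

```python
LINK_MBPS = 50.0   # emulated 4G link from the experimental setup below
RTT_S = 0.030      # 30 ms round-trip time

def transfer_s(payload_mbit: float) -> float:
    """Time to move one payload over the link, including latency."""
    return payload_mbit / LINK_MBPS + RTT_S

# Assumed per-batch payload sizes in megabits (illustrative only).
acts_mbit, grads_mbit = 8.0, 2.0

sequential = transfer_s(acts_mbit) + transfer_s(grads_mbit)      # half duplex
overlapped = max(transfer_s(acts_mbit), transfer_s(grads_mbit))  # full duplex
print(f"sequential: {sequential:.3f}s  overlapped: {overlapped:.3f}s")
```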
The key insight of SimulTrain is that the forward pass of one batch and the backward pass of a previous batch can overlap in time, provided parameter versions and gradients are carefully managed. This is analogous to CPU pipelining, but applied to distributed training across heterogeneous compute nodes.
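The sketch below shows how batch k's forward pass can run while batch k-1's backward pass is still in flight. The `forward` and `backward` callables and the two-slot version tag are hypothetical stand-ins; this illustrates the pipelining idea, not the SimulTrain scheduler itself:

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_epoch(batches, forward, backward, n_versions=2):
    """Overlap forward(k) with backward(k-1) via a background worker.

    forward(batch, version) returns activations computed at a tagged
    parameter version; backward(acts, version) applies the matching
    gradient. Both callables are assumed stand-ins for the edge and
    cloud halves of the model.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None                      # backward pass still in flight
        for k, batch in enumerate(batches):
            version = k % n_versions        # tag the parameter version
            acts = forward(batch, version)  # forward of batch k overlaps
            if pending is not None:         # ...backward of batch k-1
                pending.result()            # retire backward of batch k-1
            pending = pool.submit(backward, acts, version)
        if pending is not None:
            pending.result()                # drain the pipeline

# Toy usage with print-based stand-ins:
fwd = lambda batch, v: (batch, v)
bwd = lambda acts, v: print(f"backward batch={acts[0]} version={v}")
pipelined_epoch(range(4), fwd, bwd)
```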
Proof sketch: the forecast term cancels the first-order bias introduced by staleness; weight reconciliation prevents error accumulation; and the pipeline yields the same number of effective gradient steps per unit time (a Taylor-expansion sketch of the first claim follows the setup details below).

Hardware: Edge = Raspberry Pi 4 (4 GB RAM), Cloud = AWS g4dn.xlarge (NVIDIA T4). Network: emulated 4G (50 Mbps, 30 ms RTT) and 5G (300 Mbps, 10 ms RTT).
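One way to see the first-order cancellation claimed in the proof sketch is a standard Taylor expansion of the fresh gradient around the stale edge parameters (treating the scalar \alpha as a stand-in for the Hessian is an assumption of this sketch, not a claim from the analysis):

[ \nabla \ell(w^{(c)}_k; x_k) = \nabla \ell(w^{(e)}_k; x_k) + H_k \, (w^{(c)}_k - w^{(e)}_k) + O(\lVert w^{(c)}_k - w^{(e)}_k \rVert^2), \quad H_k = \nabla^2 \ell(w^{(e)}_k; x_k) ]

When \alpha I approximates H_k, the forecast term \alpha \, (w^{(c)}_k - w^{(e)}_k) reproduces the first-order correction, so the staleness bias cancels up to second-order terms.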