Training Methodology & Dynamics#
Combining biologically plausible Spiking Neural Networks (SNNs) with offline reinforcement learning poses unique optimization challenges. The SNN-DT achieves end-to-end learnability by pairing surrogate gradients with structured dataset preparation.
1. Offline Dataset Curation#
Following the Decision Transformer framework, we represent a trajectory as a sequence of states, actions, and rewards: \(\tau = \{(s_t, a_t, r_t)\}_{t=1}^{T}\).
Rather than conditioning on immediate rewards, each timestep is conditioned on the return-to-go \(G_t = \sum_{k=t}^{T} r_k\), the sum of rewards from step \(t\) onward. The trajectory is then unrolled into the autoregressive token sequence \((G_1, s_1, a_1, \dots, G_T, s_T, a_T)\), padded to a fixed context length as required. Conditioning on the return-to-go explicitly steers the modeled action distribution toward high-return outcomes.
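The dataset preparation above can be sketched in plain Python; the function names are illustrative, not the paper's API:

```python
# Sketch: return-to-go computation and token interleaving for a
# Decision Transformer-style offline dataset.

def returns_to_go(rewards):
    """Suffix sums: G_t = r_t + r_{t+1} + ... + r_T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def interleave(rtg, states, actions):
    """Build the (G_1, s_1, a_1, ..., G_T, s_T, a_T) token sequence."""
    seq = []
    for g, s, a in zip(rtg, states, actions):
        seq.extend([g, s, a])
    return seq
```

In practice the interleaved sequence is truncated or padded to the model's context window before batching.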
2. Leaky Integrate-and-Fire (LIF) Discretization#
The SNN-DT is built on simulated membrane dynamics. Discretizing the leaky integrate-and-fire equation with time step \(\Delta t\), a forward-Euler step gives the membrane potential update:

\[V[t+1] = V[t] + \frac{\Delta t}{\tau_m}\Big(-(V[t] - V_{\text{rest}}) + R\,I[t]\Big)\]

where \(\tau_m\) is the membrane time constant, \(R\) the input resistance, and \(I[t]\) the synaptic input current.
When \(V[t+1] \geq V_{\text{th}}\), the neuron emits a binary spike \(S[t+1] = 1\) and the membrane potential is hard-reset to \(V_{\text{rest}}\).
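A minimal simulation of this integrate-fire-reset loop, using a standard forward-Euler LIF update. The parameter values (`tau_m`, `R`, `v_th`, `v_rest`) are illustrative defaults, not the paper's settings:

```python
import numpy as np

def simulate_lif(I, dt=1.0, tau_m=10.0, R=5.0, v_th=1.0, v_rest=0.0):
    """Simulate a single LIF neuron over an input current sequence I."""
    v = v_rest
    spikes, potentials = [], []
    for i in I:
        # Forward-Euler integration of the leaky membrane dynamics.
        v = v + (dt / tau_m) * (-(v - v_rest) + R * i)
        if v >= v_th:
            spikes.append(1)
            v = v_rest  # hard reset after a spike
        else:
            spikes.append(0)
        potentials.append(v)
    return np.array(spikes), np.array(potentials)
```

Under a constant input, the membrane charges toward its steady-state value and fires once the threshold is crossed, after which the reset restarts integration.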
3. Surrogate Gradient Learning#
Because the Heaviside step \(S = \Theta(V - V_{\text{th}})\) is discontinuous, its derivative is zero almost everywhere and undefined at the threshold, so gradients in a standard computational graph vanish as soon as they reach a spiking layer.
SNN-DT uses a fast-sigmoid surrogate gradient to bypass the non-differentiable threshold. During the backward pass, the Heaviside step's derivative with respect to the potential offset \(u = V - V_{\text{th}}\) is replaced by:

\[\frac{\partial S}{\partial u} \approx \frac{1}{(1 + k\,|u|)^2}\]
With \(k = 10\), this yields a well-defined gradient that flows backward through the dense attention channels and into the phase-shifted sinusoidal positional representations, enabling full end-to-end regression to the target values.
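The forward/backward asymmetry can be sketched as two plain functions: the forward pass keeps the hard spike, while backpropagation substitutes the fast-sigmoid derivative. In a full implementation this pair would be wired into the autodiff framework (e.g. a custom autograd function); the sketch below just shows the math:

```python
def heaviside(u):
    """Forward pass: hard binary spike at the threshold offset u = V - V_th."""
    return 1.0 if u >= 0.0 else 0.0

def fast_sigmoid_grad(u, k=10.0):
    """Backward pass: surrogate derivative dS/du ~= 1 / (1 + k|u|)^2,
    substituted for the Heaviside's true derivative (zero almost everywhere)."""
    return 1.0 / (1.0 + k * abs(u)) ** 2
```

The surrogate peaks at 1 exactly at the threshold and decays smoothly on both sides, so neurons near threshold receive the strongest learning signal.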