Transformers are slow for inference
The State Space Model (SSM)
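A linear state space model maps a 1-D input signal x(t) to an output y(t) through an N-dimensional hidden state h(t). In the standard continuous-time formulation (omitting the skip/feedthrough term D, which is often folded in separately):

```latex
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t) \\
y(t)  &= C\,h(t)
\end{aligned}
```

Here A, B, C are learned parameters; A governs how the state evolves, B how the input enters the state, and C how the state is read out.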
Discretization
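To run the continuous system h'(t) = A h(t) + B x(t), y(t) = C h(t) on a discrete sequence, A and B are discretized with a step size Δ, commonly via the zero-order hold: Ā = exp(ΔA), B̄ = (ΔA)⁻¹(exp(ΔA) − I)·ΔB. A minimal sketch, assuming a diagonal A (stored as a 1-D array of its eigenvalues), with function names of my own choosing:

```python
import numpy as np

def discretize_zoh(A, B, dt):
    # Zero-order-hold discretization, specialized to diagonal A:
    # A_bar = exp(dt*A),  B_bar = (exp(dt*A) - 1)/A * B  (the dt cancels).
    A_bar = np.exp(dt * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_scan(A_bar, B_bar, C, x):
    # Discrete recurrence: h_t = A_bar * h_{t-1} + B_bar * x_t,  y_t = C . h_t
    h = np.zeros_like(A_bar)
    ys = []
    for x_t in x:
        h = A_bar * h + B_bar * x_t
        ys.append(float(C @ h))
    return np.array(ys)
```

With a stable A (negative eigenvalues) and a constant input, the discrete system settles to the same steady state as the continuous one, which is a quick sanity check on the discretization.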
Structured State Spaces for Sequences (S4) - Gu et al. (2021)
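Because the discretized SSM is linear and time-invariant, S4 can unroll the recurrence into a single convolution for training: y = x * K with kernel K[j] = C·Āʲ·B̄. A self-contained sketch of the two equivalent views, again assuming a diagonal Ā (S4 itself exploits structured matrices to build K efficiently; this is the naive O(L·N) materialization):

```python
import numpy as np

def ssm_conv(A_bar, B_bar, C, x):
    # Training-time view: materialize the kernel K[j] = C . A_bar^j . B_bar
    # and apply one causal convolution over the whole sequence in parallel.
    L = len(x)
    powers = A_bar[None, :] ** np.arange(L)[:, None]  # (L, N): A_bar^j per state
    K = powers @ (B_bar * C)                          # (L,) convolution kernel
    return np.convolve(x, K)[:L]                      # causal part only

def ssm_scan(A_bar, B_bar, C, x):
    # Inference-time view: the same map computed as a sequential recurrence,
    # O(1) state per step instead of attending over the whole history.
    h = np.zeros_like(A_bar)
    ys = []
    for x_t in x:
        h = A_bar * h + B_bar * x_t
        ys.append(float(C @ h))
    return np.array(ys)
```

The two functions compute the same output; the convolutional form parallelizes over the sequence length during training, while the recurrent form gives cheap autoregressive inference.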
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (S6) - Gu and Dao (2024)
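Mamba's selective SSM (S6) makes B, C, and the step size Δ functions of the current input, so the discretized Ā and B̄ vary per time step and the model can no longer be written as one fixed convolution; it must be computed as a (hardware-aware) scan. A minimal illustrative sketch for a scalar input stream; the weight names (w_B, w_C, w_dt) and the scalar projections are my own simplification, not Mamba's actual parameterization:

```python
import numpy as np

def selective_scan(x, A, w_B, w_C, w_dt):
    # A: (N,) diagonal continuous-time state matrix (negative entries).
    # w_B, w_C: (N,) weights making B_t and C_t depend on the input.
    # w_dt: scalar making the step size Delta_t depend on the input.
    N = len(A)
    h = np.zeros(N)
    ys = []
    for x_t in x:
        dt = np.log1p(np.exp(w_dt * x_t))   # softplus keeps Delta_t > 0
        B_t = w_B * x_t                     # input-dependent B
        C_t = w_C * x_t                     # input-dependent C
        A_bar = np.exp(dt * A)              # per-step ZOH discretization
        B_bar = (A_bar - 1.0) / A * B_t
        h = A_bar * h + B_bar * x_t
        ys.append(float(C_t @ h))
    return np.array(ys)
```

The input-dependent Δ acts as a gate: a large Δ lets the state absorb the current token, while Δ near zero makes Ā ≈ I and the token is effectively skipped, which is the selection mechanism the paper's title refers to.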
Transformers are Better than State Space Models at Copying - Jelassi et al. (2024)
Conclusion