Transformers are slow for inference

The State Space Model (SSM)
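
Presumably this section introduces the continuous-time state space model. The standard form, with hidden state $x(t)$, input $u(t)$, and output $y(t)$, is:

```latex
\begin{aligned}
x'(t) &= A\,x(t) + B\,u(t) \\
y(t)  &= C\,x(t) + D\,u(t)
\end{aligned}
```

The feedthrough term $D\,u(t)$ is often treated as a skip connection and omitted from the analysis, as in the S4 paper.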

Discretization
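
A minimal sketch of the zero-order-hold (ZOH) discretization used by S4 and Mamba, for the scalar (equivalently, diagonal) case; function and variable names here are illustrative, not from any particular codebase:

```python
import math

def discretize_zoh(A, B, delta):
    """Zero-order-hold discretization of a scalar SSM.

    Continuous dynamics x'(t) = A x(t) + B u(t), sampled with step size
    delta, become the recurrence x_k = Abar * x_{k-1} + Bbar * u_k with:
        Abar = exp(delta * A)
        Bbar = (exp(delta * A) - 1) / A * B   (for A != 0)
    """
    Abar = math.exp(delta * A)
    Bbar = (Abar - 1.0) / A * B
    return Abar, Bbar

def ssm_scan(A, B, C, delta, inputs):
    """Run the discretized SSM as a linear recurrence over an input sequence."""
    Abar, Bbar = discretize_zoh(A, B, delta)
    x, ys = 0.0, []
    for u in inputs:
        x = Abar * x + Bbar * u
        ys.append(C * x)
    return ys
```

This recurrent view is what makes SSM inference cheap: each new token costs O(1) state updates, with no growing cache.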

Structured State Spaces for Sequences (S4) - Gu et al. (2021)
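
A key idea in S4 is that, because the discretized recurrence is linear and time-invariant, it can be unrolled into a convolution: the output is the input convolved with the kernel $K = (C\bar{B},\, C\bar{A}\bar{B},\, C\bar{A}^2\bar{B},\, \dots)$, which allows parallel training. A toy scalar sketch (names are mine) showing the two views agree:

```python
def ssm_kernel(Abar, Bbar, C, L):
    """Unrolled SSM as a convolution kernel: K[k] = C * Abar**k * Bbar."""
    K, p = [], Bbar
    for _ in range(L):
        K.append(C * p)
        p *= Abar
    return K

def causal_conv(K, u):
    """y[k] = sum_j K[j] * u[k - j]  (causal convolution, same length as u)."""
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

def ssm_recurrence(Abar, Bbar, C, u):
    """Same model run step by step: x_k = Abar x_{k-1} + Bbar u_k, y_k = C x_k."""
    x, ys = 0.0, []
    for ut in u:
        x = Abar * x + Bbar * ut
        ys.append(C * x)
    return ys
```

S4's actual contribution is computing this kernel efficiently for structured (HiPPO-initialized) matrices $A$; the naive unrolling above is only to show the equivalence.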

Mamba: Linear-Time Sequence Modeling with Selective State Spaces (S6) - Gu and Dao (2024)
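
Mamba's selective SSM (S6) makes the discretization step size (and the $B$, $C$ projections) functions of the input, so the recurrence can decide per token how much state to retain or overwrite. A heavily simplified scalar sketch of that idea; parameter names and layout are illustrative, not Mamba's actual design:

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def selective_scan(u, w_delta, b_delta, B, C, A=-1.0):
    """Toy scalar 'selective' SSM: the step size delta depends on the input,
    so discretization (and hence state retention) varies per token.
    Large delta -> state is mostly overwritten; small delta -> mostly kept."""
    x, ys = 0.0, []
    for ut in u:
        delta = softplus(w_delta * ut + b_delta)  # input-dependent step size
        Abar = math.exp(delta * A)                # ZOH discretization, per token
        Bbar = (Abar - 1.0) / A * B
        x = Abar * x + Bbar * ut
        ys.append(C * x)
    return ys
```

Because the kernel now changes with the input, the convolutional view no longer applies; Mamba instead relies on a hardware-aware parallel scan to keep training efficient.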

Repeat After Me: Transformers are Better than State Space Models at Copying - Jelassi et al. (2024)

Conclusion