The need for dealing with sequential data

Recurrent neural networks

Long short-term memory (LSTM)

The encoder-decoder RNN architecture

Attention in encoder-decoder RNNs

This is the intuition behind attention, which is the core mechanism underlying the transformer architecture!
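To make that intuition concrete, here is a minimal sketch of attention in an encoder-decoder setting: each encoder hidden state is scored against the current decoder state, the scores are normalized into weights, and the weighted sum of encoder states forms the context vector. All array values and the dot-product scoring function are illustrative assumptions, not any particular model's parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: shift by the max before exponentiating
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical encoder hidden states: 4 input time steps, hidden size 3
encoder_states = np.array([
    [0.1, 0.0, 0.2],
    [0.4, 0.3, 0.1],
    [0.0, 0.5, 0.6],
    [0.2, 0.2, 0.0],
])

# Hypothetical decoder hidden state at the current output step
decoder_state = np.array([0.3, 0.1, 0.4])

# 1. Score each encoder state against the decoder state (dot product here;
#    additive scoring is another common choice)
scores = encoder_states @ decoder_state

# 2. Normalize the scores into attention weights that sum to 1
weights = softmax(scores)

# 3. The context vector is the attention-weighted sum of encoder states
context = weights @ encoder_states

print(weights)
print(context)
```

The decoder then consumes this context vector alongside its own state, so at every output step it can "look back" at whichever input positions are most relevant, rather than relying on a single fixed-size summary of the whole input.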