The need for dealing with sequential data

Recurrent neural networks

Long short-term memory (LSTM)

The encoder-decoder RNN architecture

Attention in encoder-decoder RNNs

This is the intuition behind attention, which is the core mechanism underlying the transformer architecture!
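To make that intuition concrete, here is a minimal sketch of attention in an encoder-decoder setting: each encoder hidden state is scored against the current decoder state, the scores are normalized into weights, and the weighted sum of encoder states forms the context vector. All array values and the dot-product scoring function are illustrative assumptions, not any particular model's parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: shift by the max before exponentiating
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical encoder hidden states: 4 input time steps, hidden size 3
encoder_states = np.array([
    [0.1, 0.0, 0.2],
    [0.4, 0.3, 0.1],
    [0.0, 0.5, 0.6],
    [0.2, 0.2, 0.0],
])

# Hypothetical decoder hidden state at the current output step
decoder_state = np.array([0.3, 0.1, 0.4])

# 1. Score each encoder state against the decoder state (dot product here;
#    additive scoring is another common choice)
scores = encoder_states @ decoder_state

# 2. Normalize the scores into attention weights that sum to 1
weights = softmax(scores)

# 3. The context vector is the attention-weighted sum of encoder states
context = weights @ encoder_states

print(weights)
print(context)
```

The decoder then consumes this context vector alongside its own state, so at every output step it can "look back" at whichever input positions are most relevant, rather than relying on a single fixed-size summary of the whole input.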