The need to handle sequential data
Recurrent neural networks
Long short-term memory (LSTM)
The encoder-decoder RNN architecture
Attention in encoder-decoder RNNs
This is the intuition behind attention, the notion underlying the transformer architecture!
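As a preview of that intuition: attention computes a weighted average of value vectors, where each weight reflects how similar a query is to the corresponding key. The sketch below (a minimal NumPy illustration; the function name, shapes, and scaling choice follow the standard scaled dot-product formulation, not anything stated above) shows the core computation.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Weight each value by the softmax-normalized similarity
    between its key and the query (scaled dot-product attention)."""
    d_k = queries.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(key dimension).
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax over the keys axis (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ values

# Tiny illustrative example: 2 queries attending over 3 key/value pairs.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)
```

The same mechanism, applied with learned projections and multiple heads, is what the transformer builds on.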