We recall that a transformer is usually composed of the following components:
- An input (with its associated embedding & encoding)
- A target (with its associated embedding & encoding)
- A stack of encoders and decoders
- An output (word probabilities)
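To make the flow between these components concrete, here is a minimal sketch in plain NumPy. The dimensions, the identity-style encoder/decoder stand-ins, and all function names are hypothetical illustrations, not the actual transformer layers (which apply attention and feed-forward sublayers); the point is only the data path: embed the input, encode it, embed the target, decode against the encoder output, then project to word probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (chosen for illustration only).
vocab_size, d_model, seq_len = 10, 8, 4

# Shared embedding table: maps token ids to d_model-dimensional vectors.
embedding = rng.normal(size=(vocab_size, d_model))

def embed(token_ids):
    return embedding[token_ids]

# Stand-ins for the encoder and decoder stacks. Real stacks apply
# attention and feed-forward layers; these just preserve the shapes.
def encoder_stack(x):
    return x

def decoder_stack(y, memory):
    # Toy use of the encoder output ("memory") by the decoder.
    return y + memory.mean(axis=0)

# Output projection to word probabilities via softmax.
W_out = rng.normal(size=(d_model, vocab_size))

def output_probs(y):
    logits = y @ W_out
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

src = rng.integers(0, vocab_size, seq_len)   # input token ids
tgt = rng.integers(0, vocab_size, seq_len)   # target token ids

memory = encoder_stack(embed(src))
decoded = decoder_stack(embed(tgt), memory)
probs = output_probs(decoded)
print(probs.shape)  # (4, 10): one distribution over the vocabulary per position
```

Each row of `probs` sums to 1, i.e. it is a probability distribution over the vocabulary for that target position.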
On this page, we go into detail on each of these components:
- The embedding and positional encoding layer
- Encoders and decoders
- Skip connections
- Self-attention vs. encoder-decoder attention
- Multi-head attention (and masking)
- Generating the output