We recall that a transformer is usually composed of the following components:
- An input (with its associated embedding & encoding)
- A target (with its associated embedding & encoding)
- A stack of encoders and decoders
- An output (word probabilities)
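To make the flow between these components concrete, here is a minimal sketch in plain NumPy. The dimensions, the identity-style encoder/decoder stand-ins, and all function names are hypothetical illustrations, not the actual transformer layers (which apply attention and feed-forward sublayers); the point is only the data path: embed the input, encode it, embed the target, decode against the encoder output, then project to word probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (chosen for illustration only).
vocab_size, d_model, seq_len = 10, 8, 4

# Shared embedding table: maps token ids to d_model-dimensional vectors.
embedding = rng.normal(size=(vocab_size, d_model))

def embed(token_ids):
    return embedding[token_ids]

# Stand-ins for the encoder and decoder stacks. Real stacks apply
# attention and feed-forward layers; these just preserve the shapes.
def encoder_stack(x):
    return x

def decoder_stack(y, memory):
    # Toy use of the encoder output ("memory") by the decoder.
    return y + memory.mean(axis=0)

# Output projection to word probabilities via softmax.
W_out = rng.normal(size=(d_model, vocab_size))

def output_probs(y):
    logits = y @ W_out
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

src = rng.integers(0, vocab_size, seq_len)   # input token ids
tgt = rng.integers(0, vocab_size, seq_len)   # target token ids

memory = encoder_stack(embed(src))
decoded = decoder_stack(embed(tgt), memory)
probs = output_probs(decoded)
print(probs.shape)  # (4, 10): one distribution over the vocabulary per position
```

Each row of `probs` sums to 1, i.e. it is a probability distribution over the vocabulary for that target position.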
On this page, we go into detail on each of these components:
- The embedding and positional encoding layer
- Encoders and decoders
- Skip connections
- Self-attention vs. encoder-decoder attention
- Multi-head attention (and masking)
- Generating the output