Now that you are familiar with how strings of text are tokenized and transformed into embeddings, and with the multi-head attention mechanism, we can delve into the GPT-2 model.
A GPT-2 model is composed of several elements:
In this practical, we explain and implement all of these elements.
Let us consider 2 training examples with 5 dimensions (features) each.
import torch

torch.manual_seed(123)
batch_example = torch.randn(2, 5)
print(batch_example)
<aside>
Expected output
tensor([[-0.1115, 0.1204, -0.3696, -0.2404, -1.1969],
[ 0.2093, -0.9724, -0.7550, 0.3239, -0.1085]])
</aside>
Pass the batch above through a linear layer with output dimension 6 followed by a ReLU activation, and print the output.
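One possible solution is sketched below; it continues from the snippet above (so `batch_example` and the random state set by `torch.manual_seed(123)` are assumed) and chains `torch.nn.Linear` with `torch.nn.ReLU` inside a `torch.nn.Sequential` module.

```python
# Sketch of one possible solution: a 5 -> 6 linear layer followed by ReLU.
# The exact output values depend on the layer's random initialization under the seed above.
layer = torch.nn.Sequential(torch.nn.Linear(5, 6), torch.nn.ReLU())
out = layer(batch_example)
print(out)
```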
<aside>
Expected output
tensor([[0.2260, 0.3470, 0.0000, 0.2216, 0.0000, 0.0000],
        [0.2133, 0.2394, 0.0000, 0.5198, 0.3297, 0.0000]],
       grad_fn=<ReluBackward0>)
</aside>