In January 2025, DeepSeek launched DeepSeek-R1, its first large reasoning model. The release sent "shock waves" through the industry because the model was open-source, cost-effective, and high-performing.

This short summary is based on the research paper by Guo et al. (2025).

What is DeepSeek-R1? Why did Nvidia’s market value drop by about US$600 billion?

First, creating DeepSeek-R1-Zero

First, the team initialised DeepSeek-R1-Zero from an existing LLM, DeepSeek-V3-Base (released in December 2024).

Group Relative Policy Optimisation

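Then, they trained DeepSeek-R1-Zero with a reinforcement learning technique called Group Relative Policy Optimisation (GRPO). For each prompt, the model samples a group of answers, a reward function scores each one, and every answer is judged relative to the others in its group.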
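
To make the "group relative" part concrete, here is a minimal Python sketch of the advantage calculation described by Guo et al. (2025): each answer's reward is normalised by the mean and standard deviation of the rewards in its group. The function name `group_relative_advantages`, the small `eps` term, the choice of population standard deviation, and the example rewards are all assumptions made for illustration. The full GRPO objective also includes a clipped policy ratio and a KL penalty, which are left out here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against its own group (hypothetical helper).

    rewards: one scalar reward per answer sampled for the same prompt.
    eps: assumed guard against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of four answers: two rewarded, two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In this toy group, the rewarded answers end up with a positive advantage and the unrewarded ones with a negative advantage, so the policy update pushes towards the former. No separate value model is needed, because the group itself serves as the baseline.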

The reward functions used were rule-based: