Zhu et al. (2025) proposed a new LLM architecture in which the model can loop in latent space, spending extra computation by iterating on its hidden states instead of generating more tokens.

[Figure: overview of the looped architecture. Source: https://arxiv.org/pdf/2510.25741]
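To make the core idea concrete, here is a minimal sketch of what "looping in latent space" can look like: one shared transformer block, applied several times to the hidden states before any new token is emitted. All names and sizes below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """One shared transformer block, re-applied several times to the hidden state."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # A single set of weights, reused at every iteration (weight tying
        # across depth): parameters stay fixed while compute grows with n_loops.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )

    def forward(self, h: torch.Tensor, n_loops: int) -> torch.Tensor:
        # "Looping in latent space": refine the hidden states n_loops times
        # without emitting a single new token.
        for _ in range(n_loops):
            h = self.block(h)
        return h

h = torch.randn(2, 16, 256)         # (batch, seq_len, d_model)
model = LoopedBlock()
print(model(h, n_loops=4).shape)    # same shape, more latent computation
```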
Since the model can learn how many iterations to run, they also proposed a new training objective so that this learned loop depth trains well!
The paper is available online: https://arxiv.org/abs/2510.25741
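One common way to make the loop count learnable is to attach an exit head that scores stopping after each iteration, and then train on the expected language-modeling loss under that exit distribution, with an entropy bonus so the model keeps exploring different depths. The sketch below follows that general recipe, not necessarily the paper's exact objective; `block`, `exit_head`, `lm_head`, and the `beta` coefficient are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def looped_training_loss(h, block, exit_head, lm_head, targets,
                         max_loops: int = 4, beta: float = 0.01):
    """Expected LM loss under a learned distribution over exit depths."""
    halt_logits, exit_losses = [], []
    for _ in range(max_loops):
        h = block(h)                                  # one more latent iteration
        halt_logits.append(exit_head(h.mean(dim=1)))  # (batch, 1) stop score
        logits = lm_head(h)                           # token predictions at this depth
        step_loss = F.cross_entropy(
            logits.flatten(0, 1), targets.flatten(), reduction="none"
        ).view(targets.shape).mean(dim=1)             # (batch,) loss at this depth
        exit_losses.append(step_loss)

    # Softmax over per-depth stop scores -> distribution over exit steps.
    # (A simplification of sequential ACT-style halting, for clarity.)
    p_exit = torch.softmax(torch.cat(halt_logits, dim=1), dim=1)  # (batch, max_loops)
    losses = torch.stack(exit_losses, dim=1)                      # (batch, max_loops)

    expected_loss = (p_exit * losses).sum(dim=1).mean()
    # Entropy bonus keeps the exit distribution from collapsing onto one depth.
    entropy = -(p_exit * (p_exit + 1e-9).log()).sum(dim=1).mean()
    return expected_loss - beta * entropy

# Illustrative usage with tiny hypothetical modules:
block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
exit_head = nn.Linear(64, 1)
lm_head = nn.Linear(64, 100)                 # vocab size 100, illustrative
h = torch.randn(2, 8, 64)
targets = torch.randint(0, 100, (2, 8))
print(looped_training_loss(h, block, exit_head, lm_head, targets))
```

Training on the expectation over all depths (rather than sampling a single exit) keeps the objective differentiable end to end; the entropy term is one simple way to stop the model from always halting at the same step.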
Using these intuitions, they proposed two models. Here are the performance comparisons:

[Figure: performance comparisons. Source: https://arxiv.org/pdf/2510.25741]
There is also a very intuitive video explanation on YouTube!