Zhu et al. (2025) proposed a new LLM architecture in which the model can loop in latent space, spending extra computation by iterating on its hidden states instead of generating more tokens.

[Figure: overview of the looped architecture. Source: https://arxiv.org/pdf/2510.25741]
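To make the core idea concrete, here is a minimal sketch of what "looping in latent space" can look like: one shared transformer block, applied several times to the hidden states before any new token is emitted. All names and sizes below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """One shared transformer block, re-applied several times to the hidden state."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # A single set of weights, reused at every iteration (weight tying
        # across depth): parameters stay fixed while compute grows with n_loops.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )

    def forward(self, h: torch.Tensor, n_loops: int) -> torch.Tensor:
        # "Looping in latent space": refine the hidden states n_loops times
        # without emitting a single new token.
        for _ in range(n_loops):
            h = self.block(h)
        return h

h = torch.randn(2, 16, 256)         # (batch, seq_len, d_model)
model = LoopedBlock()
print(model(h, n_loops=4).shape)    # same shape, more latent computation
```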
Since the model can learn how many iterations to run, they also proposed a new training objective so that this learned loop depth trains well!
The paper is available online: https://arxiv.org/abs/2510.25741
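One common way to make the loop count learnable is to attach an exit head that scores stopping after each iteration, and then train on the expected language-modeling loss under that exit distribution, with an entropy bonus so the model keeps exploring different depths. The sketch below follows that general recipe, not necessarily the paper's exact objective; `block`, `exit_head`, `lm_head`, and the `beta` coefficient are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def looped_training_loss(h, block, exit_head, lm_head, targets,
                         max_loops: int = 4, beta: float = 0.01):
    """Expected LM loss under a learned distribution over exit depths."""
    halt_logits, exit_losses = [], []
    for _ in range(max_loops):
        h = block(h)                                  # one more latent iteration
        halt_logits.append(exit_head(h.mean(dim=1)))  # (batch, 1) stop score
        logits = lm_head(h)                           # token predictions at this depth
        step_loss = F.cross_entropy(
            logits.flatten(0, 1), targets.flatten(), reduction="none"
        ).view(targets.shape).mean(dim=1)             # (batch,) loss at this depth
        exit_losses.append(step_loss)

    # Softmax over per-depth stop scores -> distribution over exit steps.
    # (A simplification of sequential ACT-style halting, for clarity.)
    p_exit = torch.softmax(torch.cat(halt_logits, dim=1), dim=1)  # (batch, max_loops)
    losses = torch.stack(exit_losses, dim=1)                      # (batch, max_loops)

    expected_loss = (p_exit * losses).sum(dim=1).mean()
    # Entropy bonus keeps the exit distribution from collapsing onto one depth.
    entropy = -(p_exit * (p_exit + 1e-9).log()).sum(dim=1).mean()
    return expected_loss - beta * entropy

# Illustrative usage with tiny hypothetical modules:
block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
exit_head = nn.Linear(64, 1)
lm_head = nn.Linear(64, 100)                 # vocab size 100, illustrative
h = torch.randn(2, 8, 64)
targets = torch.randint(0, 100, (2, 8))
print(looped_training_loss(h, block, exit_head, lm_head, targets))
```

Training on the expectation over all depths (rather than sampling a single exit) keeps the objective differentiable end to end; the entropy term is one simple way to stop the model from always halting at the same step.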
Using these intuitions, they proposed two models. Here are the performance comparisons:

[Figure: performance comparisons. Source: https://arxiv.org/pdf/2510.25741]
There is also a very intuitive video explanation on YouTube!