New AI Model Architecture “MoR” Aims to Succeed Transformers

[Infographic: the Transformer model compared with the new MoR architecture, shown as interconnected nodes]


By Stuart Kerr, Technology Correspondent

Published: 03 August 2025
Last Updated: 03 August 2025
Contact: liveaiwire@gmail.com | Twitter: @LiveAIWire
Author Bio: About Stuart Kerr

In a bold move that could redefine the backbone of artificial intelligence, Google DeepMind has unveiled a new model architecture known as the Mixture of Recursions, or MoR. Touted as a successor to Transformer models, MoR promises faster inference, reduced memory usage, and improved scalability—without compromising performance.

According to VentureBeat, MoR delivers up to 2× faster inference by restructuring how token dependencies are handled. Instead of processing inputs through fixed layers, MoR uses recursive feedback loops that adaptively allocate compute resources. This architecture introduces a paradigm shift for large language models, potentially lowering the cost of inference and training across the board.
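
To make that idea concrete, the sketch below shows one way a per-token recursion scheme can be wired up: a single Transformer-style block is reused at every depth, and a lightweight router scores each token to decide whether it deserves another pass. The names (SharedBlock, MoRSketch, the sigmoid router) and the soft-gating shortcut are illustrative assumptions made for this article, not DeepMind's published implementation.

```python
# Minimal, illustrative sketch of per-token adaptive recursion (PyTorch).
# All names and design choices here are hypothetical, not DeepMind's code.
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """One Transformer-style block whose weights are reused at every recursion step."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))

class MoRSketch(nn.Module):
    """Applies the shared block up to max_depth times; a light router scores,
    per token, whether another recursion step is worth the compute."""
    def __init__(self, dim: int, max_depth: int = 4):
        super().__init__()
        self.block = SharedBlock(dim)     # one set of weights, reused at every depth
        self.router = nn.Linear(dim, 1)   # per-token "recurse again?" score
        self.max_depth = max_depth

    def forward(self, x):
        for _ in range(self.max_depth):
            keep = torch.sigmoid(self.router(x))   # (batch, seq, 1), in [0, 1]
            # For clarity this sketch soft-gates every token; a real implementation
            # would skip low-scoring tokens outright, which is where the compute
            # and latency savings come from.
            x = keep * self.block(x) + (1 - keep) * x
        return x

# Usage: a batch of 2 sequences, 16 tokens each, 64-dimensional embeddings.
model = MoRSketch(dim=64)
out = model(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```

The key design point is that depth becomes a per-token decision rather than a fixed property of the network, which is what allows easy inputs to exit early while hard ones keep iterating.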

The MoR-ViT technical paper goes further, showing how these recursive pathways can be applied to vision models as well. Compared to standard ViTs, the MoR-based alternative achieves the same accuracy with fewer parameters and substantially faster inference—making it attractive for deployment in real-time systems and edge devices.

A Transformative Break from Transformers

In Invisible Infrastructure, we explored how foundational model architecture choices shape the unseen pipelines of digital life. MoR's release is another such inflection point. Rather than stacking layers in a fixed sequence, MoR routes computation dynamically based on context, deciding token by token how much processing each input receives.

Medium’s analysis describes it as a departure from Mixture of Experts (MoE) models, which also use selective routing but require vast ensemble sizes. MoR reduces this overhead while preserving adaptability. This balance may help prevent future compute bottlenecks, especially for startups and universities with limited hardware access.
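
As a rough illustration of the overhead gap the Medium piece describes, the back-of-the-envelope comparison below contrasts the parameters a sparse MoE layer must store (one feed-forward block per expert) with a single block that is simply reused at several recursion depths. The dimensions and expert count are assumptions chosen for illustration, not figures from DeepMind's benchmarks.

```python
# Back-of-the-envelope parameter comparison: a sparse MoE layer stores many
# expert FFNs but activates only a few per token, while a recursive design
# reuses one block at several depths. Numbers are illustrative assumptions,
# not measurements of MoR or of any production MoE model.
def ffn_params(dim: int, hidden_mult: int = 4) -> int:
    """Parameters in a standard two-layer feed-forward block (weights + biases)."""
    hidden = hidden_mult * dim
    return (dim * hidden + hidden) + (hidden * dim + dim)

dim = 1024
experts = 64          # hypothetical MoE ensemble size
recursion_depth = 4   # hypothetical: one shared block, applied up to 4 times

moe_params = experts * ffn_params(dim)   # every expert must be stored in memory
mor_params = ffn_params(dim)             # one block, reused at every depth

print(f"MoE layer stores     {moe_params / 1e6:6.1f}M FFN parameters")
print(f"Recursive layer uses {mor_params / 1e6:6.1f}M FFN parameters "
      f"(applied up to {recursion_depth}x at inference)")
```

The point of the sketch is not the exact figures but the shape of the trade-off: MoE buys adaptability by multiplying stored experts, while a recursive design buys it by spending variable depth on a single set of weights.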

Efficiency and Industry Implications

As reported by AIbase, internal DeepMind benchmarks reveal that MoR outperforms Transformers on most NLP tasks while slashing latency by 40%. For developers, this means faster inference at scale and more efficient multi-modal integration—crucial as models increasingly mix text, image, and audio inputs.

A related industry PDF dives into how MoR could alter the economics of data centers. With AI compute demand projected to surge, MoR's cost savings on power and hardware are not just a technical upgrade but a sustainability strategy. This echoes the concern raised in Algorithmic Hunger about AI's footprint in global infrastructure.

Meanwhile, the broader Compute at Scale report outlines how model architecture choices affect everything from chip design to cooling systems. MoR, with its reduced parameter count and modular design, offers a more hardware-aware path forward.

The End of Transformers?

While it's too soon to declare the end of Transformer dominance, MoR represents the strongest challenge yet. From AI Fights Disinformation to real-time language translation, MoR may soon touch every AI use case. Its capacity to deliver similar or superior outcomes with less energy and compute will appeal to both commercial and regulatory stakeholders.

In short, MoR may not just be a technical achievement—it could be a policy one too.


About the Author
Stuart Kerr is the Technology Correspondent for LiveAIWire. He writes about artificial intelligence, ethics, and how technology is reshaping everyday life.
