Google DeepMind has introduced Genie 3, the latest generation of its AI world models, designed to simulate rich, interactive 3D environments from simple text prompts.
Highlights
- Genie 3 generates interactive 3D environments from simple text prompts in real time (24 FPS, 720p), lasting several minutes — a leap from static simulations.
- “Promptable world events” allow users to alter environments dynamically with text — like using a game engine without code.
- Emergent memory enables temporal consistency and physical realism — the model looks back to determine what comes next.
- Genie 3 environments are training grounds for AI agents to explore, learn, and adapt through trial and error, much like human learning.
- No physics engine needed: Genie 3 learns physics by watching, not coding — using real-world video data to create believable simulations.
- Built from 200,000+ hours of gameplay and robotics footage, Genie evolved from 2D foundations to immersive 3D interaction.
- World models like Genie 3 are considered essential to AGI, enabling AI systems to form internal predictive models of their environments.
- DeepMind’s world modeling team, formed in 2025, is scaling this work for use across Gemini, Veo, and other Google AI projects.
- Potential real-world uses extend beyond gaming: from robotics to autonomous systems and industrial training simulations.
Touted by researchers as the first real-time, general-purpose interactive world model, Genie 3 marks a significant evolution in AI’s ability to learn through virtual environments — paving a possible path toward artificial general intelligence.
Open-Ended, Real-Time Worlds
Unlike earlier models limited by static rules or pre-defined environments, Genie 3 enables freeform world generation — from photorealistic scenes to imaginative environments — all running at 24 frames per second in 720p, with simulations now lasting several minutes instead of seconds.
This expansion builds on the capabilities of Genie 2, which allowed AI agents to interact with dynamic virtual worlds, and integrates physics-aware reasoning from Veo 3, DeepMind’s advanced video generation system.
Promptable World Events and Real-Time Interaction
One of Genie 3’s core features is its ability to dynamically modify environments using text-based prompts — a concept researchers describe as promptable world events.
Although still in research preview and not publicly available, this functionality points to a future where users can create and interact with digital worlds in real time, similar to manipulating scenes in a game engine — but without the need for code or manual design.
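To make the idea of a promptable world event concrete, here is a purely illustrative sketch: a text command mutates a running simulation's state mid-rollout. The command grammar (`add`/`remove`/`set`) and the dictionary-based world state are assumptions for illustration only; Genie 3 itself conditions its generation on the prompt rather than parsing commands like this.

```python
def apply_event(world, event):
    """Toy 'promptable world event': a short text command alters the
    world state while the simulation is running. The verb grammar here
    is invented for illustration, not Genie 3's actual mechanism."""
    verb, *rest = event.split()
    world = dict(world)  # leave the previous state untouched
    if verb == "add":
        world[rest[0]] = True
    elif verb == "remove":
        world.pop(rest[0], None)
    elif verb == "set":
        world[rest[0]] = " ".join(rest[1:])
    return world
```

For example, `apply_event({}, "add rain")` followed by `apply_event(world, "set weather stormy")` layers two prompted changes onto the same scene without restarting it — the property the researchers liken to editing a game-engine scene, but driven by text.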
Emergent Memory and Temporal Consistency
An unexpected and promising feature of Genie 3 is its emergent memory system. Though not explicitly programmed with memory, the model appears to retain prior frame data in order to maintain physical and temporal consistency.
This allows for more coherent simulations where objects move, fall, and interact over time in a believable way. “It has to look back at what was generated before to decide what’s going to happen next,” explained Shlomi Fruchter, Research Director at DeepMind.
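The "looking back" Fruchter describes is, at its core, autoregressive conditioning: each new frame is generated from a window of previously generated frames. The toy sketch below assumes frames are single numbers and invents a trivial stand-in for the learned model; it only illustrates the conditioning structure, not Genie 3's architecture.

```python
from collections import deque

def next_frame(history, action):
    """Toy stand-in for a learned world model: the next frame is a
    function of the recent frame history and the current action.
    A real model would emit pixels; this averages numbers."""
    return sum(history) / len(history) + action

def rollout(initial_frame, actions, context_len=4):
    """Autoregressive rollout: each frame is conditioned on a sliding
    window of earlier generated frames, which is what keeps the
    simulation temporally consistent over time."""
    history = deque([initial_frame], maxlen=context_len)
    frames = [initial_frame]
    for a in actions:
        f = next_frame(list(history), a)
        frames.append(f)
        history.append(f)  # the new frame becomes context for the next one
    return frames
```

Because every frame is fed back into the context window, a change made early in the rollout keeps influencing later frames — the behavior observers describe as emergent memory.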
From Simulations to Learning Environments
More than just visual showcases, Genie 3’s environments serve as training grounds for AI agents. These agents can engage in open-ended exploration, problem-solving, and trial-and-error learning — closely resembling how humans learn through real-world interaction.
While current agent capabilities remain relatively simple, researchers view Genie 3 as a foundational tool for training future AI systems capable of planning, adapting, and reasoning within complex environments.
No Physics Engine Required
Unlike traditional simulations that rely on fixed physics engines, Genie 3 learns physics implicitly — by observing how objects behave over time across training data.
This allows it to generate environments that are both physically plausible and adaptable, potentially enabling more flexible AI agents that can reason over longer durations and generalize to novel situations.
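Learning physics from observation rather than from a hand-coded engine can be illustrated with a deliberately simple case: recovering an object's acceleration purely from a sequence of observed positions, then rolling the recovered dynamics forward. The finite-difference estimator below is a textbook technique chosen for illustration; it is not how Genie 3 represents physics internally.

```python
def estimate_acceleration(positions, dt):
    """Recover acceleration from observed positions alone — no physics
    engine, just data. Uses the second finite difference of position."""
    accels = [
        (positions[i + 1] - 2 * positions[i] + positions[i - 1]) / dt ** 2
        for i in range(1, len(positions) - 1)
    ]
    return sum(accels) / len(accels)

def predict(pos, vel, accel, dt, steps):
    """Roll the learned dynamics forward to generate plausible motion."""
    out = []
    for _ in range(steps):
        vel += accel * dt
        pos += vel * dt
        out.append(pos)
    return out
```

Feed it positions sampled from a falling object and it recovers something close to gravitational acceleration, then extrapolates believable motion — the same observe-then-generate loop, writ small, that lets a video-trained model produce physically plausible scenes.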
From 2D Gameplay to Immersive 3D Worlds
The original Genie model was trained on over 200,000 hours of gameplay and robotics videos, primarily in 2D formats.
Despite lacking explicit labels, it learned fundamental physics and action dynamics through spatiotemporal transformers and a latent action model. Genie 3 builds on this architecture at a far larger scale, bringing those learnings into immersive, interactive 3D environments.
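A latent action model can be sketched in miniature: with no action labels in the training video, the "action" between two frames is whatever discrete code best explains the change. The scalar frames and hand-picked codebook below are assumptions for illustration; Genie's actual model quantizes learned embeddings with a far larger transformer.

```python
def infer_latent_action(prev_frame, next_frame, codebook):
    """Toy latent action model: pick the discrete code whose candidate
    delta best explains the observed change between two frames.
    No action labels are needed — only the frames themselves."""
    delta = next_frame - prev_frame
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - delta))

# Illustrative codebook of candidate deltas, e.g. "left", "no-op", "right".
codebook = [-1.0, 0.0, 1.0]
```

Once such codes are inferred across hours of footage, they double as a controllable action space: replaying code 2 ("right") produces rightward motion even though no one ever labeled it — the unsupervised trick that let the original Genie turn passive video into an interactive model.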
Why World Models Are Considered Core to AGI
DeepMind and other AI researchers argue that world models are essential for developing general intelligence.
A recent study showed that agents capable of generalizing across tasks must inherently learn predictive models of their environment. In this framework, simulation is not just a tool — it’s a prerequisite for building adaptable, embodied AI systems.
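The claim that general agents must learn predictive models can be made concrete with a tabular sketch: an agent fits a one-step transition model from raw experience, then plans entirely inside that learned model rather than in the real environment. The grid-free state/action encoding below is an assumption chosen to keep the example tiny.

```python
def learn_model(transitions):
    """Fit a tabular one-step predictive model from observed
    (state, action, next_state) experience: for each (state, action),
    keep the most frequently observed successor."""
    counts = {}
    for s, a, s2 in transitions:
        counts.setdefault((s, a), {}).setdefault(s2, 0)
        counts[(s, a)][s2] += 1
    return {k: max(v, key=v.get) for k, v in counts.items()}

def plan(model, start, goal, actions, max_depth=10):
    """Breadth-first search over the *learned* model to reach the goal —
    planning in imagination, without touching the real environment."""
    frontier = [(start, [])]
    seen = {start}
    for _ in range(max_depth):
        nxt = []
        for s, path in frontier:
            if s == goal:
                return path
            for a in actions:
                s2 = model.get((s, a))
                if s2 is not None and s2 not in seen:
                    seen.add(s2)
                    nxt.append((s2, path + [a]))
        frontier = nxt
    return None
```

The point of the sketch is the separation of concerns: once the predictive model exists, planning, replanning, and what-if reasoning all become searches over the model — which is why simulation is framed as a prerequisite rather than a convenience.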
DeepMind’s Simulation Strategy
In early 2025, DeepMind formed a dedicated world modeling team led by Tim Brooks, previously with OpenAI’s Sora project. This new group is focused on scaling simulation training with multimodal data and integrating it with larger systems like Gemini and Veo.
The initiative reflects Google’s broader strategic focus on embodied AI, using simulation as a cornerstone for training next-gen agents.
Beyond Gaming
Although Genie 3 has clear potential in gaming and virtual environments, its broader use cases include training real-world agents like robots, autonomous vehicles, and warehouse systems.
By simulating complex environments such as factories or ski slopes, Genie 3 could accelerate learning while minimizing physical risks and reducing training costs.