Tokyo-based research startup Sakana AI has open-sourced a novel algorithm designed to enable multiple AI models to work together on complex reasoning tasks.
Highlights
- AB-MCTS Algorithm: Adaptive Branching Monte Carlo Tree Search allows multiple AI models to collaborate during inference by dynamically choosing which model handles each step based on context and capability.
- Intelligent Model Switching: Uses Thompson Sampling to assign specific reasoning tasks to the best-suited model, enabling deeper or broader thinking as needed.
- Collaborative Gains: In ARC-AGI-2 benchmarks, model combinations using AB-MCTS outperformed individual models, solving 27.5% of tasks compared to 23% for o4-mini alone.
- Open Source Toolkit: Released under Apache 2.0 license, the TreeQuest toolkit includes full AB-MCTS implementation, model adapters, and benchmark scripts on GitHub.
- Evolutionary Roots: Builds on Sakana AI’s 2024 work in evolutionary model merging—shifting from model “creation” at training time to model “coordination” at runtime.
- Real-Time Efficiency: Enables smaller and mid-sized models to outperform larger ones through division of cognitive labor, boosting both accuracy and computational efficiency.
The method, called Adaptive Branching Monte Carlo Tree Search (AB-MCTS), offers a new approach to collaborative inference by dynamically selecting not only how to reason—deeper or broader—but also which model is best suited for each step of the problem.
AB-MCTS
Unlike traditional ensemble methods that rely on fixed voting mechanisms or average outputs, AB-MCTS selects from a pool of AI models at inference time, directing specific sub-tasks to the most suitable model based on its strengths.
This allows for real-time collaboration between models such as Gemini 2.5 Pro, o4-mini, and DeepSeek-R1, with the goal of enhancing performance, improving decision diversity, and optimizing resource usage.
The algorithm builds on Monte Carlo Tree Search (MCTS), long used in AI planning, by adding two key innovations:
- Adaptive Depth and Breadth Reasoning: AB-MCTS chooses whether to “think deeper” (refine current outputs) or “think wider” (explore new possibilities).
- Model-Level Selection: A Bayesian sampling strategy (specifically Thompson Sampling) determines which AI model to use at each decision branch, allowing for strategic model switching and task assignment.
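To make the model-selection idea concrete, here is a minimal, self-contained sketch of Beta-Bernoulli Thompson Sampling over a pool of candidate models. The model names, success rates, and reward simulation below are illustrative assumptions, not Sakana AI's actual TreeQuest implementation; the point is only to show how posterior sampling steers work toward the model that performs best.

```python
import random

class ThompsonSelector:
    """Beta-Bernoulli Thompson Sampling over a pool of candidate models."""

    def __init__(self, models):
        # One (successes, failures) count per model, starting at Beta(1, 1).
        self.stats = {m: [1, 1] for m in models}

    def pick(self):
        # Sample a plausible success rate from each model's posterior
        # and route this step to the model with the highest draw.
        draws = {m: random.betavariate(a, b) for m, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, model, success):
        # Fold the observed outcome back into the chosen model's posterior.
        a, b = self.stats[model]
        self.stats[model] = [a + success, b + (1 - success)]


selector = ThompsonSelector(["model_a", "model_b", "model_c"])

# Simulate 500 reasoning steps; "model_b" is (by assumption) the strongest.
true_rates = {"model_a": 0.3, "model_b": 0.7, "model_c": 0.5}
counts = {m: 0 for m in true_rates}
random.seed(0)
for _ in range(500):
    m = selector.pick()
    counts[m] += 1
    selector.update(m, int(random.random() < true_rates[m]))

# Most steps should be routed to the strongest model.
print(max(counts, key=counts.get))
```

Because the selector explores early and exploits later, it needs no hand-tuned schedule: uncertainty in the Beta posteriors naturally decides when to keep trying a weaker model and when to commit to a stronger one.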
Performance on ARC-AGI-2 Benchmark
The algorithm was evaluated using the ARC-AGI-2 benchmark, which tests complex reasoning across a variety of abstract tasks. In one test:
- o4-mini alone solved 23% of the tasks.
- When combined with Gemini 2.5 Pro and DeepSeek-R1-0528 via AB-MCTS, the system solved 27.5%, showcasing the benefits of distributed cognitive load and collaborative inference—even without scaling to massive model sizes.

This result demonstrates that an intelligent combination of smaller or mid-sized models can outperform a single, larger model in certain scenarios, especially where interpretability, adaptability, and computational efficiency are priorities.
Open-Source Release and Toolkit
Sakana AI has made AB-MCTS fully open source under the Apache 2.0 license, along with its associated tools:
- TreeQuest Toolkit: A complete implementation of AB-MCTS and its multi-LLM extension.
- Benchmark Scripts: Reproducible code for ARC-AGI-2 experiments.
- Model Configuration Files: For integrating different language models into the AB-MCTS framework.
Developers and researchers can access the codebase via Sakana AI’s GitHub repository.
Building on Evolutionary Model Merging
AB-MCTS represents a practical extension of Sakana AI’s earlier work on evolutionary model merging, a technique introduced in 2024 that explored combining model capabilities to create novel behaviors.
While that work focused on training-time integration (“mixing to create”), AB-MCTS brings the concept to inference time (“mixing to use”), allowing dynamic orchestration of models as if they were a team of specialists.
Features at a Glance
1. Real-Time Model Selection
Each reasoning step is assigned to the most appropriate model, optimizing both performance and compute usage.
2. Multi-Directional Search
Supports both refinement and exploration within a flexible search tree structure.
3. Strong Benchmark Performance
Outperforms single-model baselines on ARC-AGI-2, especially on nuanced reasoning tasks.
4. Full Open Source Access
Includes the TreeQuest implementation, model adapters, and full experiment documentation.
5. Foundation for Collective Intelligence
Suggests a paradigm shift from monolithic LLMs to collaborative AI teams working in tandem.
Sakana AI’s approach challenges the idea of “one model to rule them all.” Instead, it proposes a future where different models, each with distinct capabilities, contribute collaboratively—similar to how human teams divide labor based on expertise.