AI coding agents are rapidly becoming ubiquitous across the software industry, fundamentally changing how developers write, test, and debug code every day. While these tools excel at localized, self-contained tasks, applying them to massive, systemic codebase migrations requires an entirely new approach.
Google is already addressing this challenge by incorporating AI into many migration workflows: x86 to ARM (enabling workloads on Google Axion processors); int32 to int64 identifiers (to avoid running out of ids); JUnit3 to JUnit4 (for testing); and Joda-Time to java.time (a modern time library). However, AI model migration represents a whole new level of complexity that requires even more advanced methods for AI-assisted migration.
Translating a production-grade machine learning model from one framework to another, for example, from TensorFlow (TF) to JAX, is not a simple syntax update. It is a long-horizon task that requires untangling thousands of lines of code, managing complex states across multiple files, and preserving precise mathematical equivalence. Generic, single-agent coding assistants typically struggle under this weight — they frequently lose context over long workflows, hallucinate APIs, or fail to produce buildable code across an entire repository.
Google’s AI and Infrastructure team has pioneered a new approach to this industry-wide problem. The result is 6x faster model migration, a milestone Sundar Pichai highlighted in the recent Google Cloud Next keynote. In this post, we share how we deployed specialized, multi-agent AI systems to migrate some of Google’s largest-scale production models from TF to JAX.
Accelerating the transition from TF to JAX
For many teams at Google — and across the industry — the future of scalable machine learning is being built on JAX. Designed around a functional, stateless paradigm, JAX is heavily optimized for modern Tensor Processing Unit (TPU) infrastructure and XLA compilation, making it the bedrock of the modern AI stack.
Evolving to this future presents a monumental challenge. Thousands of production models are built on TensorFlow, a framework characterized by object-oriented, stateful layer initialization and static execution graphs. Manually migrating these models to JAX requires a fundamental rethinking of how layers interact, and how state is explicitly managed. Across large organizations, this type of migration alone represents hundreds (if not thousands) of software engineering (SWE) years — time better spent on researching new architectures and driving product innovation.
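The paradigm shift at the heart of this migration can be made concrete with a minimal sketch. The example below is illustrative, not production code: where a TensorFlow/Keras layer owns its weights as hidden internal state, the equivalent JAX code becomes a pure function whose parameters are created once and passed in explicitly on every call.

```python
import jax
import jax.numpy as jnp

def init_dense(key, in_dim, out_dim):
    """Explicitly create the parameters a stateful Dense layer would hide."""
    w_key, b_key = jax.random.split(key)
    return {
        "w": jax.random.normal(w_key, (in_dim, out_dim)) * 0.01,
        "b": jnp.zeros(out_dim),
    }

def dense_apply(params, x):
    """Pure function: the output depends only on its explicit inputs."""
    return x @ params["w"] + params["b"]

params = init_dense(jax.random.PRNGKey(0), 4, 2)
y = dense_apply(params, jnp.ones((3, 4)))
print(y.shape)  # (3, 2)
```

Untangling this implicit-to-explicit state conversion, layer by layer across thousands of lines, is precisely the rethinking the migration demands.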
Overcoming this challenge with AI started as an ambitious experiment within Google’s AI and Infrastructure team, but has evolved into a repeatable blueprint for addressing complex engineering problems across the company.
Moving beyond single-agent coding
Our early experiments with agentic code translation showed promise for simple models. However, when faced with the realities of a Google-scale migration — complex, production-grade models spanning multiple files and thousands of lines of code — generic, single-agent setups struggled. They could not balance high-level structural rules with low-level execution details, resulting in a variety of failures, such as overwriting critical files or skipping necessary functionality. To overcome these common challenges inherent to enterprise migrations, we developed a highly specialized multi-agent architecture that consists of:
- The Planner agent: Using deterministic, compiler-based static analysis, the Planner maps out the codebase’s entire dependency tree. It then works alongside other agents to break the migration down into a discrete, step-by-step plan, helping ensure the migration happens logically from the “leaf nodes” (layers without unmigrated dependencies) upward.
- The Orchestrator agent: This agent acts as the project manager. It dynamically groups plan steps into manageable chunks to keep the context window focused, injects the necessary domain knowledge, and handles failure recovery if a step doesn’t build.
- The Coder agent: Built as a reasoning and acting agent, the Coder is the workhorse. Integrated directly into our internal IDE tools, it has the ability to read files, write code, run builds, and execute unit tests. Crucially, it operates in a “test-and-fix” loop, self-correcting until it produces a compilable, verifiable component in the target language.
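The Planner’s leaf-first ordering amounts to a topological sort of the dependency tree. A minimal sketch of that ordering step, using a hypothetical dependency map (the real Planner derives it from compiler-based static analysis):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each layer lists the layers it depends on.
# Names are illustrative stand-ins for real model components.
deps = {
    "RankingModel": {"Tower", "Embedding"},
    "Tower": {"Dense"},
    "Embedding": set(),
    "Dense": set(),
}

# static_order() yields dependencies before dependents, so layers with
# no unmigrated dependencies ("leaf nodes") are migrated first.
plan = list(TopologicalSorter(deps).static_order())
print(plan)
```

This guarantees that by the time the Coder reaches any layer, everything it depends on has already been migrated and verified.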
Figure: Multi-agent AI system for complex code migrations. Process diagram describing the multi-agent system used to migrate legacy model code to JAX. Image generated with Gemini Nano Banana 2.
Scalable validation and dynamic Playbooks
Generative AI models are only as good as the context they are provided. Because source and target architectures rarely map 1-to-1, we engineered a scalable, hierarchical system of Playbooks.
These Playbooks range from general repository instructions to highly specific “golden examples” distilled from successful manual migrations. By feeding the Orchestrator a client-specific Playbook (for instance, one tailored to YouTube’s unique ranking model infrastructure), the system avoids generic hallucinations and strictly adheres to internal coding standards. This Playbook architecture is framework-agnostic, meaning it can be adapted to guide migrations between any two programming languages or frameworks.
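The hierarchical resolution described above can be sketched as a simple layered lookup, where client-specific guidance overrides general repository rules. All keys and strings below are hypothetical placeholders, not the actual Playbook schema:

```python
# General repository-level rules, applied to every migration.
BASE_PLAYBOOK = {
    "style": "Use the repository's JAX style guide for all new modules.",
    "testing": "Every migrated layer needs a numerical-equivalence test.",
}

# Client-specific overlays, e.g. tailored to a particular ranking model.
CLIENT_PLAYBOOKS = {
    "youtube_ranking": {
        "testing": "Compare against recorded reference activations.",
        "golden_examples": ["ranking_tower_migration.md"],
    },
}

def resolve_playbook(client=None):
    """More specific guidance overrides general rules; new keys are added."""
    playbook = dict(BASE_PLAYBOOK)
    playbook.update(CLIENT_PLAYBOOKS.get(client, {}))
    return playbook

pb = resolve_playbook("youtube_ranking")
print(pb["testing"])
```

Because the hierarchy, not the content, carries the structure, swapping in Playbooks for a different language pair leaves the system unchanged.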
Furthermore, we instituted rigorous quality metrics to ensure the generated code is actually production-ready:
- Quantitative verification: For each unit of code, we verify correctness mathematically. In the case of the TF-to-JAX migration, the system uses gradient ascent to search for the input that maximizes the error between the original TF layer and the new JAX layer, bounding the worst-case deviation and verifying functional equivalence.
- Qualitative evaluation: We also evaluate the migrated code against a set of qualitative standards. In the case of the TF-to-JAX migration, we deploy a blind-audit LLM Judge that scores the migrated code against a framework-agnostic architectural checklist, so that critical, domain-specific logic is completely captured.
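The quantitative check can be illustrated with a toy sketch: gradient ascent on the input drives it toward the worst-case discrepancy between a reference layer and its port. Both layers here are illustrative stand-ins (the port has a deliberately planted bug so the search finds something); the real system compares the original TF layer against the generated JAX layer.

```python
import jax
import jax.numpy as jnp

def reference_layer(x):      # stand-in for the original layer
    return jnp.tanh(2.0 * x)

def migrated_layer(x):       # stand-in for the port, with a subtle planted bug
    return jnp.tanh(1.99 * x)

def discrepancy(x):
    return jnp.sum((reference_layer(x) - migrated_layer(x)) ** 2)

grad_fn = jax.grad(discrepancy)
x = jnp.full(8, 0.1)
for _ in range(200):
    g = grad_fn(x)
    x = x + 0.1 * g / (jnp.linalg.norm(g) + 1e-12)  # normalized ascent step

# The search surfaces an input with a larger error than the starting point,
# exposing the planted bug; a faithful port would keep this near zero.
print(float(discrepancy(x)))
```

A discrepancy that stays at machine precision even at the adversarially found input is strong evidence of functional equivalence.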
Redefining migration velocity
By deploying this multi-agent system, we have dramatically altered the economics of software migration.
In our evaluations on real-world, highly complex YouTube models (featuring thousands of lines of code, hundreds of layers, and deep metric dependencies), the multi-agent system achieved a 6.4x to 8x speedup over performing the migration manually. What traditionally took several SWE-months can now be reduced to only a few weeks of AI-assisted code generation, followed by expert human review.
The system effectively handles the boilerplate, identifies target idioms, maps the dependencies, and generates the unit tests, allowing engineers to act as reviewers and architects rather than manual translators.
Looking ahead into the AI-assisted era
AI is transforming the pace of technological innovation. Without AI to accelerate large-scale migrations, organizations will find it increasingly difficult to adopt the latest breakthroughs and to maintain the security, reliability, and performance of their systems.
Our work migrating machine learning implementations from one ML framework to another demonstrates that by combining deterministic static analysis, strict testing loops, and specialized multi-agent architectures, we can safely automate some of the most complex software engineering challenges in the industry. A detailed description of the process is published in our technical paper.
This work is the result of collaboration across Google. We thank key contributors: Stoyan Nikolov, Niyati Parameswaran, Bernhard Konrad, Moritz Gronbach, Niket Kumar, Ann Yan, Varun Singh, Yaning Liang, Antoine Baudoux, Xevi Miró Bruix, Daniele Codecasa, Madhura Dudhgaonkar, Elian Dumitru, Alex Ivanov, Christopher Milne-O’Grady, Ahmed Omran, Ivan Petrychenko, Assaf Raman, Stefan Schnabl, Yurun Shen, Maxim Tabachnyk, Niranjan Tulpule, Amin Vahdat, and Jeff Zhou.