Today, we’re thrilled to announce that Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model yet, is now generally available.
Designed for ultra-low latency, high-volume tasks, and unmatched cost-efficiency, Flash-Lite is already transforming how applications are built at scale. Fast, iterative, and scalable, it joins our comprehensive suite of Pro and Flash models to provide the exact combination of intelligence, speed, and cost required for the most demanding production deployments.
Developers and enterprises have noted that the model provides the precision required for agentic tasks like tool calling and orchestration, coupled with the cost-efficiency needed to run automated pipelines at scale.
Here’s a look at how some of them have been driving value.
Software development and engineering
Engineering teams require models that can keep pace with real-time coding environments. With the GA of Gemini 3.1 Flash-Lite, developers are unlocking the instant responsiveness necessary for complex code completion, seamless UX design, and agentic developer tools.

“Integrating Gemini 3.1 Flash-Lite has transformed the responsiveness of our IDE AI assistant & Junie agent. The balance of high intelligence and minimal latency makes it the perfect model for real-time developer support.” — Vladislav Tankov, Director of AI at JetBrains
Customer experience and high-volume service
For enterprise customer service operations, handling massive volumes of interactions requires models that can scale affordably without sacrificing reasoning capabilities.

Gladly runs customer service for some of the most demanding retail brands in the world. The core of its text-channel AI agent runs on Flash-Lite. By handling millions of customer-facing model calls each week across channels like SMS, WhatsApp, and Instagram, Gladly achieved roughly 60% lower costs than comparable thinking-tier models on the same token mix.
The model powers every step of the agent lifecycle — from selecting tools and classifying playbooks to deciding when to escalate to a human — all while maintaining a p95 latency of around 1.8 seconds for full reply generation and sub-second p95 latency for classifiers and tool calls, alongside a ~99.6% success rate under heavy concurrent load.
Creative pipelines and gaming
In the fast-paced creative and gaming industries, multimodal capabilities and ultra-low latency are essential for keeping users engaged and content pipelines flowing. Flash-Lite is empowering platforms to process rich media and generate hyper-personalized environments.

Astrocade lets anyone create games by describing what they want in natural language. They integrated Flash-Lite to serve a rapidly growing global user base.
For every incoming game request, the model performs a multimodal safety check — analyzing both text and images — before the building agents even start their work. It further supports their global community through inline comment translation, allowing players in different countries to “riff” on the same game. And as part of their asset generation pipeline, it helps refine the final prompts to ensure consistently high-quality thumbnails.

The creative platform krea.ai has also seen positive results by using Flash-Lite as a prompt enhancer in their Nodes tool. By taking a user’s rough idea and expanding it into a full image-generation prompt, the model provides a level of detail that is “weirdly creative” for its price point.
These outputs move the needle on image production, providing a level of reliability and scale that was previously cost-prohibitive for sophisticated prompt engineering.
Financial services and data operations
In the world of finance and enterprise product development, efficiency is just as critical as accuracy. Gemini 3.1 Flash-Lite gives financial analysts and product managers the ideal balance of intelligence, low latency, and cost-effectiveness to run modeling and latency-sensitive applications.

OffDeal uses Flash-Lite to power “Archie,” an AI agent that investment bankers use for real-time research, data lookups, and task execution during Zoom calls. In these scenarios, bankers often need to surface financials mid-conversation. OffDeal found that Flash-Lite was the only model capable of meeting the response times needed for genuinely instant answers without forcing a tradeoff on quality.
Beyond live calls, they also use Flash-Lite as a triage layer for inbound and outbound email traffic. By answering structured questions about messages in parallel, such as whether an email is an automated response or in relation to an active deal, Flash-Lite determines which downstream AI agents get invoked and with what context.
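The fan-out pattern described above can be sketched in a few lines. This is a minimal illustration, not OffDeal’s actual implementation: `classify` is a hypothetical stand-in for a real low-latency model call, and the two question labels are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def classify(question: str, email_body: str) -> bool:
    # Placeholder: a real implementation would send `question` plus the
    # email to a fast model and parse a structured yes/no answer.
    keyword = {"is_automated": "auto-reply", "is_active_deal": "deal"}[question]
    return keyword in email_body.lower()

def triage(email_body: str) -> dict:
    # Ask every structured question about the email in parallel, so total
    # triage latency is bounded by the slowest single classifier call.
    questions = ["is_automated", "is_active_deal"]
    with ThreadPoolExecutor() as pool:
        answers = pool.map(lambda q: classify(q, email_body), questions)
    return dict(zip(questions, answers))

result = triage("Auto-reply: I am out of office until Monday.")
print(result)  # {'is_automated': True, 'is_active_deal': False}
```

The resulting answer map can then decide which downstream agents to invoke and with what context.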

For high-volume, latency-sensitive workflows on the financial operations platform Ramp, Flash-Lite has become a key component:
“Gemini is a core part of the model stack we use across applications at Ramp. As indicated in our benchmarks, we see Gemini lead the Pareto fronts in terms of costs, latency and intelligence—providing a great tradeoff between the three and making it well-suited for latency sensitive applications. Gemini 3.1 Flash-Lite has been especially valuable, powering many of our highest-volume, latency-sensitive features without compromising on quality.” – Anton Biryukov, Applied AI Engineer, Ramp

Market intelligence platform AlphaSense integrates Flash-Lite to deliver data insights:
“Gemini 3.1 Flash-Lite provides a great balance of speed, cost and performance, allowing AlphaSense to scale our advanced data processing and deliver high-quality intelligence across every layer of our data stack.” – Chris Ackerson, Senior Vice President of Product, AlphaSense
Get started
Read the docs for Gemini 3.1 Flash-Lite and learn about our latest pricing structure. Learn more about the Gemini Enterprise Agent Platform, the new standard for enterprise agent development.
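For orientation, a request to the model can be expressed as a plain REST call. The endpoint shape below follows the public Gemini API’s `generateContent` method; the exact model ID string is an assumption based on this post, so confirm it against the docs before use.

```python
import json

# Assumed model ID for this release; verify the exact string in the docs.
MODEL = "gemini-3.1-flash-lite"
URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

# Request body: one user turn with a quick classification task, the kind
# of low-latency, high-volume workload highlighted in this post.
payload = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "text": "Is this email an automated reply? Answer yes "
                            "or no: 'Out of office until Monday.'"
                }
            ],
        }
    ]
}

# POST this JSON to URL with an `x-goog-api-key` header to get a response.
print(json.dumps(payload, indent=2))
```

The same request can be made through the official client SDKs, which wrap this endpoint and handle authentication and response parsing.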