Gemini 3 Flash: Google's fast model that's winning the AI race

  • Gemini 3 Flash becomes the default model in the Gemini app and in Google Search's AI Mode.
  • It combines speed and advanced reasoning, outperforming Gemini 2.5 Pro in multiple benchmarks with lower latency.
  • It offers better performance than GPT-5.2 Extra High on various multimodal and multilingual tests, while maintaining a reasonable cost.
  • It is now available to users and developers in the EU through the app, the Gemini API, AI Studio, and Vertex AI.

Gemini 3 Flash AI model

Google's new generation of models takes an interesting turn with the arrival of Gemini 3 Flash, a variant that seeks to break the idea that speed is always at odds with intelligence. This model debuts as the main option in the Google ecosystem and is geared toward everyday users as well as businesses and developers.

With this move, Google is trying to consolidate its position in direct competition with OpenAI and other players in the sector, relying on a model that retains much of the reasoning capability of Gemini 3 Pro but delivers much faster responses, tighter resource consumption, and a price that makes it attractive for massive deployments, including in Europe.

What is Gemini 3 Flash and how does it fit into the Gemini family?

Gemini 3 Flash is the lighter, faster member of the Gemini 3 family, built on the same technological foundation as Gemini 3 Pro and Gemini 3 Deep Think but optimized to offer very low latency and lower cost without sacrificing advanced reasoning. In practice, it replaces Gemini 2.5 Flash as the benchmark fast model.

Google explains that this model is capable of modulating "how much it thinks" depending on the task: it can dedicate more reasoning steps when the request is complex, or reduce that effort for simple queries, resulting in a more efficient use of resources.

In real-world traffic, the company claims that Gemini 3 Flash consumes around 30% fewer tokens than Gemini 2.5 Pro while handling everyday tasks with high accuracy, which is relevant for those billed by token volume in intensive applications.

In addition to speed, it retains full multimodal AI capabilities: it can work with text, images, and video; analyze complex content; extract data; and answer demanding visual questions, making it a versatile option for many use cases.

Gemini 3 Flash: fast AI

Performance and benchmarks: how it compares to GPT-5.2 and other models

Benchmarks aren't everything, but they do offer a clear reference point for comparing models. Here, data published by Google and external analyses place Gemini 3 Flash in a very competitive position, especially striking considering it's a fast model.

In SimpleQA Verified, a test of verified knowledge questions, Gemini 3 Flash scores around 68.7%, well above the 38.0% of GPT-5.2 Extra High (the highest reasoning level in the GPT-5.2 family, "xhigh" in OpenAI's internal nomenclature). This gap positions it as a particularly strong option for factual knowledge queries.

In advanced multimodal reasoning (MMMU-Pro), Google's model achieves 81.2%, above both GPT-5.2 Extra High and other cutting-edge models such as Claude Sonnet 4.5. In Video-MMMU, geared toward video analysis, it also takes the lead with 86.9% versus 85.9% for GPT-5.2 Extra High, reinforcing its profile for complex audiovisual tasks.

Multilingual and culturally sensitive assessments are also among its strengths. In Global PIQA, which measures common-sense reasoning in more than 100 languages, it achieves 92.8% versus 91.2% for GPT-5.2 Extra High. Google emphasizes that Flash is especially optimized to capture nuances outside English, which is relevant for markets like Spain and the rest of Europe.

In tool use and agentic tasks, Gemini 3 Flash again takes the lead in Toolathlon, with 49.4% versus 46.3% for OpenAI's advanced model, and keeps a slight edge in the FACTS Benchmark Suite, with 61.9% versus 61.4%. In other words, it not only responds quickly but also shows consistency in workflows involving multiple tools.

Gemini 3 Flash benchmarks

Where it shines and where it lags behind in "pure" reasoning

Despite these results, it's important to qualify the picture. In tests most focused on extreme logical reasoning and high-level puzzles, GPT-5.2 Extra High continues to lead. For example, in ARC-AGI-2, a test of complex visual puzzles, the OpenAI model achieves 52.9% versus 33.6% for Gemini 3 Flash.

In environments where code execution is critical, the difference is smaller, but it still exists. In AIME 2025 with code execution, GPT-5.2 Extra High reaches 100%, while Gemini 3 Flash hovers around 99.7%, a small but measurable gap. In SWE-bench Verified, designed around real software-engineering tasks, OpenAI's model scores 80.0% versus Google's 78.0%.

Google's interpretation is that Flash is not intended to be the "absolute king" of pure reasoning, but to offer a different balance: professional-level reasoning very close to that of the larger models, with very low latency and a much more manageable cost.

Another figure the company highlights is its performance on high-level knowledge tests, such as GPQA Diamond, where it scores 90.4%, and Humanity's Last Exam, with 33.7% without tools. These results, according to Google, put it on par with larger frontier models, something unusual for a fast variant.

In practice, for most day-to-day tasks and typical business use cases, these differences at the extremes of reasoning take a back seat to speed and cost efficiency, which is where Flash aims to make a difference.

Gemini 3 Flash on Google Search

Integration with Google Search and the Gemini app

One of the most visible changes for users is that Gemini 3 Flash becomes the default engine both in Google Search's AI Mode and in the Gemini app itself, on desktop and mobile. In other words, when someone activates or uses AI search mode in Google, this model will be the one responding in the background in most cases.

In search results, this translates into more elaborate, faster answers to long queries or queries with multiple conditions. Google cites complex requests such as "night plans in a city for parents with young children," where the model must weigh several nuances at once and offer reasoned results.

The company claims that AI Mode with Gemini 3 Flash is better at capturing the nuances of each query, combining real-time information (including local data), and presenting visually easier-to-digest answers, with structured summaries and relevant links.

In Spain and the rest of Europe, the rollout of AI Mode is being done gradually, conditioned in part by data protection regulations and the requirements of the European regulatory framework. Even so, Google has made clear that the intention is to bring Flash to as many markets as possible, with regional adjustments where necessary.

In the Gemini app, the change is also evident: the model selector now shows the Gemini 3 family with three main options: "Quick" (Gemini 3 Flash), "Think" (a mode geared toward complex problems), and "Pro" (for advanced programming and math tasks). For most conversations, Quick will be selected by default.

Gemini 3 Flash app and mobile

Changes in the mobile experience and everyday use

Aside from the numbers, one of the areas where Gemini 3 Flash really shines is the feeling of immediacy: when using the Gemini app or AI Mode on a mobile device, the model responds with noticeably less latency, reducing waiting time even for complex queries.

Google has also adjusted how the AI interacts with screen content on Android. Previously, you had to press a "share screen" button with Gemini; now you just say something like "explain this to me" for the assistant to directly analyze what you are seeing and offer a contextual response, something already starting to appear on some devices in Spain.

Enhanced multimodal capabilities make it possible to upload videos, images, or large documents and ask Gemini 3 Flash for summaries, data extraction, or detailed explanations. You can even analyze videos in real time while they're playing, without waiting for them to finish.

In the entertainment sector, Google is pointing to uses such as video games with non-player characters that can converse coherently and without noticeable delays thanks to the low latency. This type of experience is especially sensitive to any lag, so a fast model is key.

For the average user, beyond the metrics, what will be noticeable is that the AI feels more fluid: it responds faster, maintains the flow of the conversation better, and handles longer requests without the wait becoming tedious. Perceived fluency is often a determining factor in whether people adopt these tools in their daily routine.

Availability in Spain, Europe and access for developers

Google is rolling out Gemini 3 Flash globally and presenting it as a model without country-specific restrictions on basic access from the Gemini app. This means users in Spain and the European Union can now use it as the default model in both the app and the web version, provided the Gemini service is available in their region.

Furthermore, the Flash line has long been the one most used by developers: the Gemini 2 and 2.5 Flash models already processed trillions of tokens across hundreds of thousands of applications. With Gemini 3 Flash, Google wants to eliminate the need to choose between speed and intelligence when designing new products.

For development environments, the model is being deployed via the Gemini API in Google AI Studio, the Gemini CLI, Android Studio, and other tools such as Google Antigravity. At the enterprise level, customers can access Gemini 3 Flash through Vertex AI, the Google Cloud enterprise platform.
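For developers exploring API access, a minimal sketch of how a generateContent request to the Gemini API might be addressed is shown below. Nothing is sent here; the snippet only builds the request URL and JSON body. The model identifier `gemini-3-flash` is an assumption for illustration — check the model list in AI Studio for the exact id.

```python
# Build (but do not send) a Gemini API generateContent request.
# The model id below is assumed; verify the real identifier in AI Studio.
import json

API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-3-flash"  # hypothetical id for illustration


def build_generate_request(prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a generateContent call."""
    url = f"{API_BASE}/models/{MODEL_ID}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body


url, body = build_generate_request("Summarize this article in two lines.")
print(url)
print(body)
```

In a real application you would send this body via POST with an API key header; the same request shape works from AI Studio keys or, at the enterprise level, through the Vertex AI endpoints.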

This approach points to a scenario in which Google's AI is integrated "into every corner of everyday digital life": from mobile and web applications to internal work tools, including customer service systems, assistants on e-commerce platforms, and specific solutions for the public sector in Europe.

In the specific case of Spain, Google has highlighted that the country is among the markets where Gemini's rollout is going especially fast, so local users and businesses can take advantage of these new capabilities sooner, always within the limits set by European regulation.

Pricing, token efficiency, and enterprise use

Gemini 3 Flash isn't the cheapest model in Google's catalog, but it positions itself as one of the most cost-effective. For those integrating the API into their services, the announced price is $0.50 per million input tokens and $3 per million output tokens, with audio input set at $1 per million tokens.

Compared to Gemini 2.5 Flash, there is a slight increase in rates (previously $0.30 per million input tokens and $2.50 per million output tokens), although Google maintains that the model compensates with its greater efficiency: it uses around 30% fewer tokens than Gemini 2.5 Pro in typical flows, which can mean overall savings on billing depending on the type of use.
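A quick back-of-the-envelope check shows how these rates interact with token efficiency. The token volumes below are made-up examples, and applying the ~30% reduction here is an optimistic assumption (the article states it versus Gemini 2.5 Pro, not 2.5 Flash):

```python
# Back-of-the-envelope API cost estimate using the per-million-token
# rates quoted in the article. Token volumes are hypothetical.

def api_cost(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Cost in dollars given per-million-token rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Rates from the article (USD per million input / output tokens).
FLASH_3 = (0.50, 3.00)   # Gemini 3 Flash
FLASH_25 = (0.30, 2.50)  # Gemini 2.5 Flash

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
base_in, base_out = 10_000_000, 2_000_000

old = api_cost(base_in, base_out, *FLASH_25)      # $8.00
same = api_cost(base_in, base_out, *FLASH_3)      # $11.00
# If the new model needed ~30% fewer tokens for the same work:
fewer = api_cost(int(base_in * 0.7), int(base_out * 0.7), *FLASH_3)  # $7.70

print(f"2.5 Flash rates:            ${old:.2f}")
print(f"3 Flash, same tokens:       ${same:.2f}")
print(f"3 Flash, 30% fewer tokens:  ${fewer:.2f}")
```

Whether the higher per-token rate is actually offset therefore depends on the token reduction a given workload really sees, which is worth measuring before migrating a high-volume integration.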

This combination of speed and reasonable cost makes Flash especially geared towards high-frequency workflows: in-app assistants, customer support systems, business process automation, big data analytics, or creative applications that require many API calls.

For many European companies weighing which model to choose, Flash's value proposition is attractive: it doesn't reach the absolute peaks of reasoning of Pro or some rivals, but it offers top-tier performance on most relevant metrics at an affordable cost.

Meanwhile, Google continues to offer Gemini 3 Pro as an option for demanding tasks: advanced interactive tools, complex visualizations, or high-quality image generation, leaving each organization leeway to combine models according to the sensitivity of its use cases.

With the arrival of Gemini 3 Flash, Google reinforces its strategy of bringing next-generation intelligence to more people and more products, betting on a model that shows speed and scalability don't have to be at odds with good reasoning. For users and businesses in Spain and Europe, the result is an AI that is more present in search engines, mobile devices, and work tools, with significantly reduced response times, improved multimodal capabilities, and a cost that allows for large-scale deployments without losing sight of the regulatory and trust requirements demanded by the European market.

Related article:
Google's AI Mode arrives in Spain: how it works and what it changes in search