The first week of December 2025 has felt like an AI season finale.
OpenAI is in an official “code red,” racing to ship a new GPT‑5.2 reasoning model. Elon Musk’s xAI is boasting that its Grok 4.20 bot just beat every other frontier model in a real‑money trading league. And Google DeepMind is rolling out Gemini 3 Deep Think while scrambling to fix bias problems in its Nano Banana Pro image model.
Here’s a detailed rundown of what actually happened between 1–7 December 2025, and what it signals about the current AI model race.
1. OpenAI: From “Code Red” to GPT‑5.2 and the Secretive “Garlic” Model
1.1 The “code red” memo and a refocus on ChatGPT’s core model
On 2 December 2025, OpenAI CEO Sam Altman sent staff an internal memo declaring a “code red” for ChatGPT. Major outlets including AP, The Guardian and The Wall Street Journal report that the company is pausing or slowing projects like advertising tests, shopping agents and other side initiatives to concentrate on core model quality. [1]
According to those reports, Altman’s priorities are:
- Faster responses and better reliability
- Stronger personalization
- A wider range of questions ChatGPT can safely and accurately answer
Behind the scenes, this is widely seen as a direct response to Google’s Gemini 3 models, which have impressed even OpenAI insiders and raised concerns that Google is closing the performance gap. [2]
A developer memo on OpenAI’s own forum even warned builders to “expect various errors during December 2025” as internal “construction” ramps up — essentially telling the ecosystem that stability might be bumpy while they retool systems at speed. [3]
1.2 GPT‑5.2: OpenAI’s first big move after the panic button
Multiple reports this week say the first tangible outcome of “code red” will be GPT‑5.2, a new reasoning‑focused model:
- The Verge reports that OpenAI is accelerating GPT‑5.2’s launch to around 9 December 2025, moving it up from later in the month. [4]
- The Times of India, citing the same underlying reporting, describes GPT‑5.2 as OpenAI’s first formal response to Gemini 3, claiming internal benchmarks where it now edges out Google’s model. [5]
- Ad‑tech–focused coverage notes that OpenAI has delayed its planned advertising experiments specifically so engineers can focus on this new reasoning model instead. [6]
OpenAI hasn’t officially published a technical card for GPT‑5.2 yet, but the reporting paints a consistent picture:
- It’s a frontier‑scale reasoning upgrade, not a small patch
- It’s designed to slot directly into ChatGPT (and likely the API)
- Internal tests reportedly show it ahead of Gemini 3 on some reasoning benchmarks, which is exactly the gap Google had just claimed in its favor
For users and developers, the message is clear: the next ChatGPT upgrade is not “just a quality‑of‑life update” — it’s a head‑to‑head model race against Gemini 3.
1.3 “Garlic”: the specialist LLM behind the scenes
There’s also a more mysterious character in this week’s OpenAI story: a codename “Garlic.”
- A paywalled report in The Information says OpenAI is developing a large language model internally nicknamed Garlic, positioned as a specialist model aimed at domains like biomedicine and healthcare rather than a general‑purpose chatbot. [7]
- A follow‑up piece in eWeek describes Garlic as being rushed alongside new ChatGPT features as part of the company’s broader response to Gemini 3 and Anthropic. [8]
According to those reports and summaries:
- Garlic is tuned for complex reasoning and coding, and tested against Gemini 3 and Anthropic’s latest Claude variants
- It may eventually surface under a different public name — potentially as a GPT‑5.x release in 2026
- Internally, it’s part of a shift towards domain‑optimized models that can justify premium pricing in regulated industries like health, finance and law
Nothing about Garlic is official yet, so it’s best treated as credible but unconfirmed — a reminder that not all of OpenAI’s model work shows up immediately in public products.
1.4 Training stack upgrades: Neptune and Thrive Holdings
OpenAI also made two big infrastructure and enterprise moves this week that directly affect how it trains and deploys its models:
- Acquisition of Neptune (Dec 4)
- Reuters reports that OpenAI agreed to acquire Neptune, a Poland‑born startup whose tools track and visualize AI training runs at scale. [9]
- OpenAI already relied heavily on Neptune to monitor training of large models like GPT; buying the company outright gives it deeper visibility into model behavior, faster iteration cycles and better debugging for safety issues like hallucinations or “reward hacking.”
- Equity stake in Thrive Holdings (Dec 1)
- Another Reuters piece details OpenAI’s investment in Thrive Holdings, a roll‑up for professional‑services firms like accounting and IT providers. [10]
- The plan is to embed OpenAI researchers inside Thrive companies to co‑design domain‑specific tools, effectively making real‑world enterprise workflows a training ground for future models.
Taken together, these moves show OpenAI trying to own more of its training stack and to couple its frontier models tightly to high‑value, industry‑specific data.
1.5 Safety, philanthropy and model honesty
OpenAI also tried to balance the arms‑race narrative with safety and social‑impact news:
- On 3 December, OpenAI published “How confessions can keep language models honest”, a research note on training models to explicitly “confess” when they break instructions or take shortcuts, even if the final answer looks correct. [11]
- External coverage explains that the method rewards models for admitting they guessed, cheated or violated guidelines in a separate “confession” step, rather than only scoring the original answer. [12]
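Based only on those public descriptions, the core idea can be caricatured as scoring honesty separately from answer quality. The sketch below is illustrative: the episode fields, weights and thresholds are invented for this example and do not reflect OpenAI's actual training setup.

```python
# Hypothetical sketch of a "confession"-style reward scheme: the model's
# answer and its separate confession are scored independently, so a silent
# rule violation is penalized even when the final answer looks correct.
# All field names and reward values are invented for illustration.

from dataclasses import dataclass

@dataclass
class Episode:
    answer_correct: bool   # did the final answer pass checks?
    violated_rules: bool   # did the model actually cut corners?
    confessed: bool        # did it admit to doing so in the confession step?

def reward(ep: Episode) -> float:
    """Score answer quality and honesty as separate terms."""
    answer_reward = 1.0 if ep.answer_correct else 0.0
    if ep.violated_rules:
        # Honesty is rewarded even when the answer looks fine;
        # a hidden violation is penalized.
        honesty_reward = 0.5 if ep.confessed else -1.0
    else:
        # No violation: a spurious confession is mildly discouraged.
        honesty_reward = -0.1 if ep.confessed else 0.0
    return answer_reward + honesty_reward

# A correct answer with an honest confession outscores a correct
# answer that hides the same violation.
assert reward(Episode(True, True, True)) > reward(Episode(True, True, False))
```

The point of the separate confession term is that the incentive to admit a shortcut survives even when the shortcut happened to produce a right-looking answer.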
The OpenAI Foundation also announced the first wave of its People‑First AI Fund grantees:
- AP reports that 208 nonprofits with modest annual budgets will share $40.5 million in unrestricted funds, with projects ranging from youth orchestras to local journalism initiatives. [13]
This sits awkwardly but importantly beside the “code red” drama: OpenAI is simultaneously:
- Stress‑testing ChatGPT for speed and accuracy
- Racing to ship GPT‑5.2 and, eventually, Garlic
- Funding community‑level work on AI literacy, opportunity and resilience
The subtext of the week: “Yes, we’re in a capabilities arms race — and we’re trying to prove we’re also responsible.”
2. xAI: Grok 4.20 Turns into a Real‑Money Trading “Mystery Model”
While OpenAI scrambled to upgrade ChatGPT, xAI spent the week bragging about a new twist on its Grok model family: Grok 4.20.
2.1 Grok 4.1: the baseline
A recent investor‑focused analysis describes Grok 4.1 — xAI’s current flagship — as a massive, 2.7‑trillion‑parameter multimodal model with a 256,000‑token context window, real‑time web access, and support for text, images and audio. [14]
That combination puts Grok in the same class as OpenAI’s GPT‑5.1 and Google’s Gemini 3 Pro: frontier‑scale, long‑context, multi‑modal and tuned for “agentic” workflows.
2.2 Grok 4.20 in Alpha Arena: trading league champion
This week, the buzz shifted to a variant dubbed Grok 4.20 (or Grok 4.2), tested in a live‑money stock‑trading competition called Alpha Arena:
- Tech and finance blogs report that 32 AI systems — various LLMs and prompting strategies — were each given $10,000 in real funds to trade autonomously on the Nasdaq over a defined period. [15]
- According to these reports, Grok 4.20 finished at the top of the leaderboard, with some coverage citing gains of roughly 47%, while rival entries, including models based on GPT‑5, Gemini and Claude, posted losses or markedly weaker returns. [16]
- A deeper breakdown from Geeky Gadgets says Grok 4.20 delivered an aggregate return of 12.11% over two weeks, with peak gains touching 50% in certain configurations — still far ahead of most competitors. [17]
The model reportedly operated under multiple “personas” (e.g., “Situational Awareness,” “Monk Mode,” “Max Leverage”), each emphasizing different risk profiles and trading styles. [18]
2.3 What’s new in Grok 4.20?
xAI hasn’t released a formal technical card for Grok 4.20, but from public write‑ups you can infer a few things:
- It leans heavily on multi‑source data fusion, combining market trends, technical indicators and news sentiment into its decisions. [19]
- The model appears tuned for risk‑aware adaptation, dialing up or down aggressiveness depending on volatility, and adjusting exit points as markets shift. [20]
- The trading setup uses Grok not just as a text generator but as an agentic decision system: it plans, revises and executes trades over time rather than just recommending one‑off actions.
In short, Grok 4.20 is less a new chatbot and more a domain‑specialized agent built on top of the Grok 4.x family.
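The risk-aware behavior the coverage describes can be caricatured as a small decision loop: shrink exposure as volatility rises, and ratchet exit points as a trade moves favorably. Everything below, including the function names and thresholds, is invented for illustration and reflects nothing about xAI's actual implementation.

```python
# Illustrative-only sketch of a risk-aware agent's two core adjustments:
# position sizing scaled by volatility, and a trailing stop that revises
# the exit point as the market shifts. Thresholds are arbitrary examples.

def position_size(base_size: float, volatility: float, max_vol: float = 0.05) -> float:
    """Shrink exposure linearly as volatility approaches a configured ceiling."""
    if volatility >= max_vol:
        return 0.0  # stand aside entirely in extreme conditions
    return base_size * (1.0 - volatility / max_vol)

def trailing_stop(entry: float, high_watermark: float, trail_pct: float = 0.03) -> float:
    """Ratchet the exit point upward as the trade moves in the agent's favor."""
    return max(entry * (1.0 - trail_pct), high_watermark * (1.0 - trail_pct))
```

The "personas" reported for Grok 4.20 (e.g. "Max Leverage" vs "Monk Mode") would, in a setup like this, amount to different parameter choices for the same loop.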
2.4 Caveats and controversies
It’s important to keep perspective:
- Alpha Arena is one benchmark, with rules and time windows that may not generalize to all markets. Some coverage emphasizes that other models took heavy losses partly because they were configured with aggressive, experimental strategies — not because they’re universally worse models. [21]
- Regulators and finance experts are already raising questions about AI‑driven trading, including the risk of market manipulation, unfair advantages for firms with access to elite models, and the need for updated oversight frameworks. [22]
On top of that, xAI’s broader ecosystem had a rougher moment this week: Grokipedia 0.2, the company’s AI‑curated alternative to Wikipedia, came under fire for chaotic editing rules and vulnerability to misinformation. Critics noted that Grok itself decides which user edits to accept, with little transparency or human review. [23]
Still, as a model‑of‑the‑week story, Grok 4.20’s trading performance is a clear signal: xAI wants to be known not just for edgy chat responses, but for hard‑nosed, high‑stakes agentic performance in finance and beyond.
3. Google DeepMind: Gemini 3 Deep Think and Nano Banana Pro Under the Microscope
Google DeepMind spent this week consolidating its Gemini 3 launch and grappling with a very public bias controversy in its new image model.
3.1 Gemini 3: “learn anything, build anything, plan anything”
Gemini 3 actually debuted in November, but Google pushed several developer and product updates in the Dec 1–7 window:
- Google’s official blog describes Gemini 3 Pro as its most intelligent AI model, outperforming earlier generations on reasoning, multimodality and coding benchmarks, and designed for three core use cases: learning, planning and building. [24]
- On 5 December, Google published “15 examples of what Gemini 3 can do”, showcasing use cases like building simple apps from natural‑language specs, planning trips, analyzing videos (e.g., sports coaching from a phone clip), and generating structured study plans. [25]
Independent tests this week broadly support Google’s claims on multimodal strength:
- TechRadar ran Gemini 3 Pro against ChatGPT 5.1 and Claude Opus 4.5 on complex image‑understanding tasks (Times Square at night, a Renaissance fresco, a cluttered room). Gemini 3 Pro came out as the most grounded and precise in visual reasoning, with fewer hallucinations and better spatial understanding. [26]
3.2 Gemini 3 Deep Think: advanced reasoning for power users
The most headline‑worthy model update this week is Gemini 3 Deep Think:
- On 7 December, AndroidCentral reported that Gemini 3 Deep Think — previously limited to internal safety testers — is now available to Google AI Ultra subscribers inside the Gemini app. [27]
- Google labels Deep Think as its “most advanced reasoning feature”, using iterative rounds of thought to explore multiple solution paths before answering, similar in spirit to OpenAI’s “o‑series” deliberate‑reasoning models. [28]
For now, that means:
- Only high‑paying users (AI Ultra is around $250/month) can access Deep Think
- Developers and everyday users still interact mainly with Gemini 3 Pro (preview) and older 2.5 variants
Still, by shipping Deep Think broadly — even in a premium tier — Google signals that heavyweight, chain‑of‑thought‑style reasoning is becoming a mainstream product feature, not just a research demo.
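The "iterative rounds of thought" pattern that Deep Think and the o‑series allude to can be sketched generically as best-of-n sampling with refinement: generate several candidate reasoning paths per round, keep the best-scoring one, and let later rounds build on it. This is a generic sketch of that pattern, not Google's actual internals; the function signatures are assumptions.

```python
# Generic sketch of "deliberate reasoning" inference: sample multiple
# candidate reasoning paths per round, score each, and let subsequent
# rounds refine the current best. Illustrative only.

from typing import Callable, Optional

def deep_think(
    question: str,
    generate: Callable[[str, Optional[str]], str],  # (question, prior best) -> new path
    score: Callable[[str, str], float],             # (question, path) -> quality
    n_paths: int = 4,
    rounds: int = 2,
) -> Optional[str]:
    best_path: Optional[str] = None
    best_score = float("-inf")
    for _ in range(rounds):
        for _ in range(n_paths):
            candidate = generate(question, best_path)  # refine the current best
            s = score(question, candidate)
            if s > best_score:
                best_path, best_score = candidate, s
    return best_path
```

The trade-off is direct: more rounds and paths mean better answers at the cost of latency and compute, which is presumably why Google gates the feature behind its priciest tier.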
3.3 Nano Banana Pro: powerful image model, serious bias issues
Google DeepMind’s other major model story this week is Nano Banana Pro, a Gemini‑based image generator and editor introduced in November but heavily covered again in early December:
- Google’s own announcement frames Nano Banana Pro as a state‑of‑the‑art image generation and editing model, built on Gemini 3 Pro, with accurate multilingual text rendering, 4K output and strong support for diagrams, infographics and branded content. [29]
- InfoQ and other developer outlets highlighted its grounded, multimodal synthesis: it’s designed to take structured inputs (notes, data, sketches) and turn them into detailed visuals with better factual alignment than typical image models. [30]
But on 4 December, The Guardian published an investigation showing that Nano Banana Pro is prone to racial and contextual bias:
- Prompts like “volunteer helps children in Africa” reportedly produced near‑uniform images of white women surrounded by Black children, often in stereotypically impoverished settings. [31]
- Some images even included logos of major NGOs such as Save the Children and the Red Cross, despite users not asking for branded content — raising legal and ethical questions around intellectual‑property misuse. [32]
Google acknowledged that certain prompts can still slip past safeguards and said it is working to refine its filters and training data. [33]
The episode underscores a central tension of the week: even as DeepMind races to show off powerful new models, the industry is still struggling to keep those models from amplifying harmful stereotypes.
3.4 Gemini UX updates and AGI ambitions
Two more DeepMind‑related storylines round out the week:
- Gemini web redesign and “My Stuff” (Dec 3)
- Gemini’s web interface received a visual refresh plus a new “My Stuff” section, making it easier to manage AI‑generated content like images, videos and Canvas outputs. AndroidCentral notes that the redesign coincides with the rollout of Gemini 3 Pro and Nano Banana Pro across Search and the Gemini app. [34]
- Demis Hassabis on AGI and world models (Dec 5)
- At Axios’ AI+ SF summit, DeepMind CEO Demis Hassabis reiterated that artificial general intelligence could arrive as early as 2030, pointing to “world models” — systems that simulate and reason about complex environments — as a key research frontier for the coming year. [35]
In other words, while this week is about products like Gemini 3 Deep Think and Nano Banana Pro, DeepMind’s eyes are very much on a longer AGI horizon.
4. How the Week’s AI Model News Fits Together
Put side by side, the moves from OpenAI, xAI and Google DeepMind between 1–7 December 2025 tell a pretty coherent story about where AI is going right now.
4.1 The reasoning race is officially front and center
- OpenAI is rushing GPT‑5.2 out the door and quietly cultivating Garlic, betting that better reasoning and reliability will keep ChatGPT ahead. [36]
- Google DeepMind is productizing Gemini 3 Deep Think, making heavyweight multi‑step reasoning a paid feature for power users. [37]
- xAI is showcasing Grok 4.20 not with benchmarks, but with live‑money trading performance, positioning reasoning and planning as capabilities you can literally bank on. [38]
The common thread: the era of “just chat” is over. The companies want their models to plan, optimize, and act — not only talk.
4.2 Domain‑specific models and agents are emerging
Across the three labs, you can see a clear pivot from single “god models” to contextual, domain‑tuned systems:
- OpenAI’s rumored Garlic is aimed squarely at domains like biomedicine and healthcare. [7]
- xAI’s Grok 4.20 is a finance‑tuned trading agent, optimized around markets rather than general chat. [39]
- DeepMind’s Nano Banana Pro is specifically an image model for grounded visual synthesis, deeply integrated into Google’s ads, productivity and creative tools. [40]
Expect more of this: foundation models at the core, specialist variants and agents at the edges.
4.3 The open‑source and “everyone else” pressure cooker
Even though this piece focuses on OpenAI, xAI and DeepMind, they’re not alone this week:
- Chinese startup DeepSeek released DeepSeek‑V3.2 and V3.2‑Speciale, open‑sourcing models that TechRadar says rival or surpass GPT‑5 and Gemini 3 Pro on reasoning, coding and math — under a permissive MIT license. [41]
That kind of release intensifies pressure on proprietary labs to justify their pricing and closed models with clear, meaningful advantages — like Gemini 3 Deep Think, GPT‑5.2 or Grok 4.20’s trading record.
4.4 Safety, bias and governance are not keeping up
This week also made it painfully obvious that capability gains are outpacing safeguards:
- OpenAI is being pressed by courts to hand over 20 million anonymized ChatGPT logs in its copyright fight with major publishers, raising serious privacy and governance questions.
- A new AI Safety Index — summarized in Reuters coverage referenced in this week’s OpenAI roundups — says major labs, including OpenAI and xAI, remain “far short of emerging global standards” on safety practices.
- Google is facing real‑world consequences for Nano Banana Pro’s biased “white saviour” imagery, reinforcing how easily training data can reproduce harmful tropes at scale. [42]
Meanwhile, all three are pushing models deeper into finance, healthcare, public information and education — areas where failures are especially costly.
5. What to Watch Next Week
Based on what happened between Dec 1–7, here’s what’s worth watching in the coming days:
- GPT‑5.2 launch details: Does OpenAI hit the rumored 9 December date, and how big is the gap vs Gemini 3 in independent tests? [43]
- Public documentation of Garlic and Grok 4.20: Will OpenAI and xAI publish technical reports, safety evaluations or red‑team results for their specialist models, or will they remain mostly marketing stories?
- Gemini 3 Deep Think expansion: If Deep Think proves stable and valuable for Ultra customers, expect Google to broaden access — and expect rivals to respond with their own “deep reasoning” modes.
- Regulatory and market reactions: Between AI‑driven trading bots, biased image generators and huge multi‑country data‑center deals, regulators in the US, EU and Asia are unlikely to stay quiet.
For now, though, the scoreboard for “AI Models of the Week” looks something like this:
- OpenAI – in “code red”, racing GPT‑5.2 and Garlic into position while shoring up training and safety tooling.
- xAI – using Grok 4.20 to claim real‑world dominance in at least one high‑stakes domain: stock trading.
- Google DeepMind – doubling down on Gemini 3 and Deep Think while learning, the hard way, that powerful image models like Nano Banana Pro need far stronger guardrails.
Whichever lab you’re rooting for, December 2025 is making one thing clear: the AI model race is now about depth of reasoning, domain expertise and trust — not just who can generate the longest context or the flashiest demo.
References
1. apnews.com
2. cloudwars.com
3. community.openai.com
4. www.theverge.com
5. timesofindia.indiatimes.com
6. ppc.land
7. www.theinformation.com
8. www.eweek.com
9. www.reuters.com
10. www.reuters.com
11. openai.com
12. www.analyticsvidhya.com
13. apnews.com
14. acquinox.capital
15. www.nextbigfuture.com
16. www.nextbigfuture.com
17. www.geeky-gadgets.com
18. www.nextbigfuture.com
19. www.geeky-gadgets.com
20. www.geeky-gadgets.com
21. www.geeky-gadgets.com
22. www.geeky-gadgets.com
23. www.theverge.com
24. blog.google
25. blog.google
26. www.techradar.com
27. www.androidcentral.com
28. www.androidcentral.com
29. blog.google
30. www.infoq.com
31. www.theguardian.com
32. www.theguardian.com
33. www.theguardian.com
34. www.androidcentral.com
35. www.axios.com
36. www.theverge.com
37. www.androidcentral.com
38. www.geeky-gadgets.com
39. www.geeky-gadgets.com
40. blog.google
41. www.techradar.com
42. www.theguardian.com
43. www.theverge.com

