UPDATED: SAN FRANCISCO, May 6, 2026, 05:05 (PDT)
Google’s claim that a Bard-era AI system learned Bengali on its own is facing scrutiny after a former company researcher pointed to Google training data showing Bengali was already in the mix. The dispute cuts to a basic question in the AI race: whether companies are describing real technical surprises or dressing up known training effects as mystery.
It matters now because Bard no longer sits on the edge of Google’s product line. It has been folded into Gemini, which Google is pushing into cars, business software and developer tools, giving old claims about what its models “learn” fresh commercial weight, Reuters has reported.
The issue also lands in a harder market. Google is fighting OpenAI, Microsoft and other rivals for users and enterprise customers while trying to prove that its heavy AI spending can turn into durable revenue. Reuters reported in April that Google was putting AI agents at the heart of an enterprise push under the Gemini Enterprise name.
The Bengali claim traces back to a 2023 “60 Minutes” segment on Google’s AI work. James Manyika, a senior Google executive, said the company found that after limited prompting in Bengali, the system could “translate all of Bengali,” while Chief Executive Sundar Pichai described parts of AI behavior as a “black box” — industry shorthand for systems whose internal decision-making is hard to trace.
Margaret Mitchell, a former Google AI ethics researcher, challenged that framing. Google’s PaLM research paper listed Bengali in its non-code training data: 0.194 billion Bengali tokens, or 0.026%, and 0.042 billion tokens of Latin-script Bengali, or 0.006%. A token is a chunk of text used to train a model. That table does not prove the exact Bard system shown on television used the same data, but it weakens any simple claim that the model had no prior exposure to Bengali.
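The quoted figures also imply the scale of the dataset they came from. A back-of-the-envelope check, using only the rounded numbers above (so the result is approximate, not a figure from the paper), recovers the implied total size of the non-code training corpus:

```python
# Rough check using only the figures quoted above; rounding in the
# published percentages makes the result approximate.
bengali_tokens = 0.194e9   # 0.194 billion Bengali tokens
bengali_share = 0.00026    # 0.026% of the non-code training data

# Implied total size of the non-code training corpus.
implied_total = bengali_tokens / bengali_share
print(f"{implied_total / 1e9:.0f} billion tokens")  # prints "746 billion tokens"
```

Even at a fraction of a percent, that share corresponds to hundreds of millions of Bengali tokens in a corpus of hundreds of billions — small relative weight, but far from zero exposure.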
Emily M. Bender, a University of Washington linguistics professor, wrote in a Medium post after the broadcast that opacity about training data makes model performance look more surprising than it is. She said the rhetorical move of treating Bard like a person was misleading: “IT. IS. NOT.”
Google’s own early Bard blog used more restrained language. The company described a large language model, or LLM, as a “prediction engine” that chooses likely next words, and warned that such systems can produce inaccurate or false information while sounding confident.
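The “prediction engine” framing can be illustrated with a deliberately tiny sketch. This is not Google’s system — modern LLMs use neural networks over vast corpora — just a toy next-word predictor built from bigram counts, with an invented corpus and function name, to show the basic idea of choosing a likely next word:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = "the model predicts the next word the model predicts likely words".split()

# Count which word follows which (a minimal bigram "prediction engine").
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # prints "model" ("model" follows "the" twice, "next" once)
```

The sketch also shows why such systems can sound confident while being wrong: the predictor always returns its statistically likeliest continuation, with no notion of whether that continuation is true.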
Bard’s model lineage also changed quickly. Google first described Bard as powered by a lightweight version of LaMDA; in May 2023, it said PaLM 2’s multilingual capabilities were helping expand Bard to new languages. In February 2024, Google renamed Bard as Gemini and launched a mobile app and paid Gemini Advanced plan.
That rebrand was part of a broader contest with Microsoft and OpenAI. Jack Krawczyk, then a Google product lead, told Reuters that a $20-a-month AI product needed more than raw model access: “access to a model alone is not really enough.”
But the criticism has a limit. A training table cannot reconstruct every prompt, fine-tuning step or product layer behind Bard, and large models can still show unexpected behavior after sparse examples. The risk for Google is different: loose claims about self-learning may invite tougher questions from customers, regulators and researchers about training data, evaluation methods and what companies know before they ship.
For Bengali speakers and other users of lower-resource languages, the practical question is more concrete. Independent research on ChatGPT translation across Bengali and five other languages found gender defaults and stereotype-linked errors, underscoring that language coverage in AI systems still needs direct testing, not just broad claims of fluency.
Google has kept moving. In February, the company said in a blog post that Gemini 3.1 Pro was rolling out across consumer, developer and enterprise products, including the Gemini API, Vertex AI, the Gemini app and NotebookLM. That makes the old Bard Bengali row more than a footnote: as Gemini spreads, Google has less room for fuzzy language about what its models know, where that knowledge came from and how much of it was learned “on its own.”