Google’s Bard ‘Learnt Bengali’ Claim Has a Data Problem, Former Researcher Says
6 July 2024

UPDATED: SAN FRANCISCO, May 6, 2026, 05:05 (PDT)

Google’s claim that a Bard-era AI system learned Bengali on its own is facing scrutiny after a former company researcher pointed to Google’s own published training data, which shows Bengali was already in the mix. The dispute cuts at a basic question in the AI race: whether companies are describing real technical surprises, or dressing up known training effects as mystery.

It matters now because Bard no longer sits on the edge of Google’s product line. It has been folded into Gemini, which Google is pushing into cars, business software and developer tools, giving old claims about what its models “learn” fresh commercial weight. Reuters

The issue also lands in a harder market. Google is fighting OpenAI, Microsoft and other rivals for users and enterprise customers while trying to prove that its heavy AI spending can turn into durable revenue. Reuters reported in April that Google was putting AI agents at the heart of an enterprise push under the Gemini Enterprise name.

The Bengali claim traces back to a 2023 “60 Minutes” segment on Google’s AI work. James Manyika, a senior Google executive, said the company found that after limited prompting in Bengali, the system could “translate all of Bengali,” while Chief Executive Sundar Pichai described parts of AI behavior as a “black box” — industry shorthand for systems whose internal decision-making is hard to trace. CBS News

Margaret Mitchell, a former Google AI ethics researcher, challenged that framing. Google’s PaLM research paper listed Bengali in its non-code training data: 0.194 billion Bengali tokens, or 0.026%, and 0.042 billion tokens of Latin-script Bengali, or 0.006%. A token is a chunk of text, such as a word or word fragment, that a model reads during training. That table does not prove the exact Bard system shown on television used the same data, but it weakens any simple claim that the model had no prior exposure to Bengali.
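The scale behind those percentages can be checked with simple arithmetic. The sketch below is only an illustration built on the four figures quoted above; the variable and function names are hypothetical, and it assumes each percentage is that row's share of the same non-code corpus.

```python
# Figures as quoted in the article: token counts in billions and
# shares in percent of PaLM's non-code training data.
bengali = {"tokens_b": 0.194, "share_pct": 0.026}
bengali_latin = {"tokens_b": 0.042, "share_pct": 0.006}


def implied_total_b(row):
    """Corpus size (billions of tokens) implied by one row's count and share."""
    return row["tokens_b"] / (row["share_pct"] / 100)


# Both Bengali rows taken together.
combined_tokens_b = bengali["tokens_b"] + bengali_latin["tokens_b"]
combined_share_pct = bengali["share_pct"] + bengali_latin["share_pct"]

if __name__ == "__main__":
    print(f"Implied corpus (Bengali row): {implied_total_b(bengali):.0f}B tokens")
    print(f"Implied corpus (Latin-script row): {implied_total_b(bengali_latin):.0f}B tokens")
    print(f"Combined Bengali data: {combined_tokens_b:.3f}B tokens, {combined_share_pct:.3f}%")
```

The two rows imply a corpus of roughly 746 billion and 700 billion tokens, a gap that is consistent with rounding in the quoted percentages. Taken together, the Bengali rows come to about 236 million tokens: a small share of the whole, but far from zero exposure.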

Emily M. Bender, a University of Washington linguistics professor, wrote after the broadcast that creating ignorance about training data makes model performance look more surprising than it is. She said the rhetorical move of treating Bard like a person was misleading: “IT. IS. NOT.” Medium

Google’s own early Bard blog used more restrained language. The company described a large language model, or LLM, as a “prediction engine” that chooses likely next words, and warned that such systems can produce inaccurate or false information while sounding confident. blog.google
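The “prediction engine” idea can be shown at toy scale. The sketch below is not Google’s method and bears no resemblance to a production LLM in size or design; it is a minimal word-pair counter, with hypothetical function names and a made-up training sentence, that picks the word most often seen after a given word.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, used only for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)

if __name__ == "__main__":
    print(predict_next(model, "the"))   # most common follower of "the"
    print(predict_next(model, "dog"))   # a word absent from the training text
```

Even this toy makes the point at issue: it can only continue words that appeared in its training text, and it returns nothing for a word it never saw.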

Bard’s model lineage also changed quickly. Google first described Bard as powered by a lightweight version of LaMDA; in May 2023, it said PaLM 2’s multilingual capabilities were helping expand Bard to new languages. In February 2024, Google renamed Bard as Gemini and launched a mobile app and a paid Gemini Advanced plan.

That rebrand was part of a broader contest with Microsoft and OpenAI. Jack Krawczyk, then a Google product lead, told Reuters that a $20-a-month AI product needed more than raw model access: “access to a model alone is not really enough.” Reuters

But the criticism has a limit. A training table cannot reconstruct every prompt, fine-tuning step or product layer behind Bard, and large models can still show unexpected behavior after sparse examples. The risk for Google is different: loose claims about self-learning may invite tougher questions from customers, regulators and researchers about training data, evaluation methods and what companies know before they ship.

For Bengali speakers and other lower-resource language users, the practical issue is less mystical. Independent research on ChatGPT translation across Bengali and five other languages found gender defaults and stereotype-linked errors, underscoring that language coverage in AI systems still needs direct testing, not just broad claims of fluency.

Google has kept moving. In February, it said Gemini 3.1 Pro was rolling out across consumer, developer and enterprise products, including the Gemini API, Vertex AI, the Gemini app and NotebookLM. That makes the old Bard Bengali row more than a footnote: as Gemini spreads, Google has less room for fuzzy language about what its models know, where that knowledge came from and how much of it was learned “on its own.” blog.google
