Google’s Bard ‘Learnt Bengali’ Claim Has a Data Problem, Former Researcher Says
6 July 2024

UPDATED: SAN FRANCISCO, May 6, 2026, 05:05 (PDT)

Google’s claim that a Bard-era AI system learned Bengali on its own is facing scrutiny after a former company researcher pointed to Google’s own training data, which shows Bengali was already in the mix. The dispute cuts to a basic question in the AI race: whether companies are describing real technical surprises or dressing up known training effects as mystery.

It matters now because Bard no longer sits on the edge of Google’s product line. It has been folded into Gemini, which Google is pushing into cars, business software and developer tools, giving old claims about what its models “learn” fresh commercial weight. Reuters

The issue also lands in a harder market. Google is fighting OpenAI, Microsoft and other rivals for users and enterprise customers while trying to prove that its heavy AI spending can turn into durable revenue. Reuters reported in April that Google was putting AI agents at the heart of an enterprise push under the Gemini Enterprise name.

The Bengali claim traces back to a 2023 “60 Minutes” segment on Google’s AI work. James Manyika, a senior Google executive, said the company found that after limited prompting in Bengali, the system could “translate all of Bengali,” while Chief Executive Sundar Pichai described parts of AI behavior as a “black box” — industry shorthand for systems whose internal decision-making is hard to trace. CBS News

Margaret Mitchell, a former Google AI ethics researcher, challenged that framing. Google’s PaLM research paper listed Bengali in its non-code training data: 0.194 billion Bengali tokens, or 0.026%, and 0.042 billion tokens of Latin-script Bengali, or 0.006%. A token is a chunk of text used to train a model. That table does not prove the exact Bard system shown on television used the same data, but it weakens any simple claim that the model had no prior exposure to Bengali.
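
For readers who want to check the arithmetic, the two rows quoted above can be combined and cross-checked in a few lines. This is a rough sketch, not a reconstruction of Google's accounting: it uses only the four figures cited in this article, and the variable and function names are illustrative. Because the published percentages are rounded, the corpus size it backs out is an estimate only.

```python
# Figures as quoted in this article from the PaLM training-data table:
# token counts in billions, shares as a percentage of non-code training data.
bengali = {"tokens_b": 0.194, "share_pct": 0.026}
bengali_latin = {"tokens_b": 0.042, "share_pct": 0.006}


def implied_total_b(tokens_b, share_pct):
    """Back out the corpus size (billions of tokens) implied by a single row."""
    return tokens_b / (share_pct / 100.0)


# Combined exposure to Bengali across both scripts.
combined_tokens_b = bengali["tokens_b"] + bengali_latin["tokens_b"]
combined_share_pct = bengali["share_pct"] + bengali_latin["share_pct"]

print(f"Combined Bengali tokens: {combined_tokens_b * 1000:.0f} million")
print(f"Combined share of non-code data: {combined_share_pct:.3f}%")

# Each row implies a slightly different total because the shares are rounded.
for name, row in (("Bengali", bengali), ("Latin-script Bengali", bengali_latin)):
    print(f"Total implied by {name} row: ~{implied_total_b(**row):.0f}B tokens")
```

On these figures, the combined Bengali exposure is about 236 million tokens, a small share of the corpus but far from zero. The two rows imply totals of roughly 746 billion and 700 billion tokens; the gap is consistent with rounding in the published percentages.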

Emily M. Bender, a University of Washington linguistics professor, wrote after the broadcast that creating ignorance about training data makes model performance look more surprising than it is. She said the rhetorical move of treating Bard like a person was misleading: “IT. IS. NOT.” Medium

Google’s own early Bard blog used more restrained language. The company described a large language model, or LLM, as a “prediction engine” that chooses likely next words, and warned that such systems can produce inaccurate or false information while sounding confident. blog.google

Bard’s model lineage also changed quickly. Google first described Bard as powered by a lightweight version of LaMDA; in May 2023, it said PaLM 2’s multilingual capabilities were helping expand Bard to new languages. In February 2024, Google renamed Bard as Gemini and launched a mobile app and a paid Gemini Advanced plan.

That rebrand was part of a broader contest with Microsoft and OpenAI. Jack Krawczyk, then a Google product lead, told Reuters that a $20-a-month AI product needed more than raw model access: “access to a model alone is not really enough.” Reuters

But the criticism has a limit. A training table cannot reconstruct every prompt, fine-tuning step or product layer behind Bard, and large models can still show unexpected behavior after sparse examples. The risk for Google is different: loose claims about self-learning may invite tougher questions from customers, regulators and researchers about training data, evaluation methods and what companies know before they ship.

For Bengali speakers and other lower-resource language users, the practical issue is less mystical. Independent research on ChatGPT translation across Bengali and five other languages found gender defaults and stereotype-linked errors, underscoring that language coverage in AI systems still needs direct testing, not just broad claims of fluency.

Google has kept moving. In February, it said Gemini 3.1 Pro was rolling out across consumer, developer and enterprise products, including the Gemini API, Vertex AI, the Gemini app and NotebookLM. That makes the old Bard Bengali row more than a footnote: as Gemini spreads, Google has less room for fuzzy language about what its models know, where that knowledge came from and how much of it was learned “on its own.” blog.google
