Updated: Rome, May 6, 2026, 14:00 CEST
- Italy’s competition authority has made disclosure of AI “hallucination” risk a consumer-protection issue, not just a software bug, after closing probes into DeepSeek, Mistral and NOVA AI. AGCM
- The timing is sharp: generative AI reached 53% population adoption within three years, while organisational adoption hit 88%, Stanford HAI said in its 2026 AI Index.
- EU rules are tightening, with obligations for general-purpose AI model providers already in force and Commission enforcement powers due from Aug. 2, 2026.
Italy’s competition authority last week moved the core defect in generative AI into the consumer-law arena, closing investigations after DeepSeek, Mistral and NOVA AI agreed to warn users more clearly that their chatbots can produce inaccurate or misleading material. In the language of the industry, those false but fluent outputs are “hallucinations.” AGCM
The question in the TS2 article title, “Why Is AI Not Perfect?”, now has a market answer as well as a technical one: companies are selling systems that can draft, search, summarise and advise, but regulators and buyers still cannot treat them as ordinary deterministic software. Deterministic software gives the same output when the input and rules are unchanged; large language models, or LLMs, generate likely text from patterns in data.
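For readers who want that distinction made concrete, here is a minimal Python sketch. The function names, sample words and probabilities are invented for illustration and do not come from any vendor's model.

```python
import random

def deterministic_total(prices):
    # Ordinary software: the same input and the same rules
    # give the same output, every time.
    return sum(prices)

def sample_next_word(distribution, rng):
    # LLM-style step: draw the next word from learned
    # probabilities, so repeated runs can disagree.
    words = list(distribution)
    weights = [distribution[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Invented probabilities for continuations of a prompt such as
# "The capital of Australia is ..." -- illustrative numbers only.
dist = {"Canberra": 0.6, "Sydney": 0.3, "Melbourne": 0.1}

print(deterministic_total([2, 3, 5]))                    # always 10
rng = random.Random()
print([sample_next_word(dist, rng) for _ in range(5)])   # varies run to run
```

The second function can return a wrong word with perfect fluency, which is the mechanical core of a hallucination.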
Why it matters now is scale. Stanford’s 2026 AI Index said AI capability is accelerating and reaching more people than ever, with industry producing more than 90% of notable frontier models in 2025 and organisations rapidly adding AI to workflows. The same report said responsible-AI reporting is not keeping pace with deployment.
OpenAI researchers Adam Kalai, Santosh Vempala, Ofir Nachum and Edwin Zhang wrote in September that standard training and evaluation tend to “reward guessing over acknowledging uncertainty.” OpenAI said hallucinations remain “stubbornly hard” and that ChatGPT still hallucinates, even though newer models have lower rates. OpenAI
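The incentive they describe can be seen with simple arithmetic. The sketch below assumes an invented grading scheme, not any real benchmark: one point for a correct answer, zero for a wrong answer and zero for abstaining.

```python
# Toy grading scheme: 1 point for a correct answer, 0 for a wrong
# answer, 0 for abstaining ("I don't know"). The numbers are
# invented for illustration, not taken from any real benchmark.
p_correct = 0.2  # assumed chance that a guess happens to be right

expected_if_guessing = p_correct * 1 + (1 - p_correct) * 0
expected_if_abstaining = 0.0

print(expected_if_guessing)    # 0.2
print(expected_if_abstaining)  # 0.0
# Guessing never scores worse under this grading, so a model tuned
# against such benchmarks learns to answer confidently rather than
# admit uncertainty.
```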
The failures are not limited to lab tests. A study coordinated by the European Broadcasting Union and led by the BBC examined more than 3,000 responses from AI assistants and found that 45% had at least one significant issue, 31% had serious sourcing problems and 20% had major accuracy problems. Jean Philip De Tender, EBU media director and deputy director general, said the failures were “not isolated incidents” and warned that users who do not know what to trust may “end up trusting nothing at all.” ebu.ch
Stanford HAI reported hallucination rates across 26 top models ranging from 22% to 94% in a new accuracy benchmark. It said models often handle a false statement when presented as someone else’s belief, but performance drops when the same false statement is framed as the user’s belief.
The commercial push is still intense. Thomson Reuters CEO Steve Hasker said professionals in law, tax, audit and compliance are choosing AI products that can be verified and audited, and told Reuters that “the consequences of error and hallucination are too much to bear.” The company said first-quarter revenue rose 10% to $2.09 billion, beating estimates. Reuters
That is the business problem. A chatbot that is “about right” may be acceptable for a marketing draft. In a court filing, a tax return or medical triage, the same error can create sanctions, fines or patient harm.
Regulators are moving faster in Europe than in some other markets. The EU AI Act became law in 2024, and the European Commission says rules for general-purpose AI models — broad models that can perform many types of task — became effective in August 2025. Its enforcement powers for those providers are due to start on Aug. 2, 2026, including fines.
The tougher case is what changes the model’s behaviour, not just what warning appears under the chat box. The 2026 International AI Safety Report, led by Turing Award winner Yoshua Bengio and more than 100 experts, said current systems can fabricate information, produce flawed code and give misleading advice, and that reliability techniques reduce failure rates but not enough for many high-stakes settings.
There are workarounds. Retrieval-augmented generation, or RAG, pulls information from outside databases before a model answers, and human review can catch some errors. But those controls add cost and delay; they also weaken the pitch that AI will replace whole workflows cleanly.
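In outline, a retrieval-augmented pipeline can be sketched as below. The retrieve and answer_with_rag functions, the word-overlap ranking and the prompt wording are all assumptions made for illustration; real deployments vary widely and typically use vector search rather than overlap counts.

```python
def retrieve(query, documents, top_k=2):
    # Placeholder retriever: rank documents by naive word overlap
    # with the query. Production systems use vector search instead.
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:top_k]

def answer_with_rag(query, documents, generate):
    # Ground the model: put retrieved text into the prompt and tell
    # the model to answer from that text or admit it does not know.
    context = "\n".join(retrieve(query, documents))
    prompt = (
        "Answer using ONLY the context below. If the context is "
        "insufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)  # `generate` stands in for any model call
```

Every retrieval call, and any human check on the output, is exactly the added cost and delay described above.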
The risk cuts both ways. More disclosure could protect consumers but leave vendors arguing that users were warned. Stricter accuracy rules could improve trust but raise costs for smaller firms, while looser rules could let cheap tools win share in tasks where mistakes are hard to see until late.
For now, the practical answer is blunt. AI is not perfect because it is built to predict and generate, not to know in the human or legal sense of the word. That does not make it useless. It makes it supervised software, even when it sounds certain.