AI chatbots fail at accurate news, major study reveals

AI chatbots such as ChatGPT and Copilot routinely distort the news and struggle to distinguish facts from opinion. That's according to a major new study from 22 international public broadcasters, including DW.
A major new study by 22 public service media organisations, including DW, has found that four of the most commonly used AI assistants misrepresent news content 45 per cent of the time — regardless of language or territory.
Journalists from a range of public service broadcasters, including the BBC (UK) and NPR (US), evaluated the responses of four AI assistants, or chatbots — ChatGPT, Microsoft's Copilot, Google's Gemini and Perplexity AI.
Measuring criteria such as accuracy, sourcing, provision of context, appropriate editorialisation and the ability to distinguish fact from opinion, the study found that almost half of all answers had at least one significant issue, while 31 per cent contained serious sourcing problems and 20 per cent contained major factual errors.
DW found that 53 per cent of the answers provided by the AI assistants to its questions had significant issues, with 29 per cent showing specific problems with accuracy.
Among the factual errors made in response to DW's questions was Olaf Scholz being named as German Chancellor, even though Friedrich Merz had taken office a month earlier. Another answer named Jens Stoltenberg as NATO Secretary General after Mark Rutte had already taken over the role.
AI assistants have become an increasingly common way for people around the world to access information. According to the Reuters Institute's Digital News Report 2025, 7 per cent of online news consumers use AI chatbots to get news, with the figure rising to 15 per cent for those aged under 25.
Those behind the study say it confirms that AI assistants systematically distort news content of all kinds.
"This research conclusively shows that these failings are not isolated incidents," said Jean Philip De Tender, deputy director general of the European Broadcasting Union (EBU), which co-ordinated the study.
"They are systemic, cross-border, and multilingual, and we believe this endangers public trust. When people don't know what to trust, they end up trusting nothing at all, and that can deter democratic participation."
Unprecedented study
This is one of the largest research projects of its kind to date and follows a study undertaken by the BBC in February 2025. That study found that more than half of all AI answers it checked had significant issues, while almost one-fifth of the answers citing BBC content as a source introduced factual errors of their own.
The new study saw media organisations from 18 countries and across multiple language groups apply the same methodology as the BBC study to 3,000 AI responses.
The organisations asked the four AI assistants common news questions, such as "What is the Ukraine minerals deal?" or "Can Trump run for a third term?"
Journalists then reviewed the answers against their own expertise and professional sourcing, without knowing which assistant provided them.
When compared with the BBC study from eight months ago, the results show some minor improvement, but with a high level of error still apparent.
"We're excited about AI and how it can help us bring even more value to audiences," Peter Archer, BBC programme director of generative AI, said in a statement. "But people must be able to trust what they read, watch and see. Despite some improvements, it's clear that there are still significant issues with these assistants."
Gemini performed the worst of the four chatbots, with 72 per cent of its responses having significant sourcing issues. In the BBC study, Microsoft's Copilot and Gemini were deemed the worst performers. But across both studies, all four AI assistants had issues.
In a statement provided to the BBC back in February, a spokesperson for OpenAI, which developed ChatGPT, said: "We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution."
Researchers call for action from governments and AI companies
The broadcasters and media organisations behind the study are calling for national governments to take action.
In a press release, the EBU said its members are "pressing EU and national regulators to enforce existing laws on information integrity, digital services, and media pluralism."
They also stressed that independent monitoring of AI assistants must be a priority going forward, given how fast new AI models are being rolled out.
Meanwhile, the EBU has joined up with several other international broadcasting and media groups to establish a joint campaign called "Facts In: Facts Out", which calls on AI companies themselves to take more responsibility for how their products handle and redistribute news.
In a statement, the organisers of the campaign said: "When these systems distort, misattribute or decontextualise trusted news, they undermine public trust."
"This campaign's demand is simple: If facts go in, facts must come out. AI tools must not compromise the integrity of the news they use."