A high-profile academic paper that once presented ChatGPT as a lifesaver for student learning has been retracted, nearly a year after it helped shape early narratives about AI in education. Springer Nature retracted it last month, citing “discrepancies” in the meta-analysis that undermined confidence in the results. The publisher also said that “the authors had not responded to correspondence regarding the retraction.”
By the time of the retraction, the paper had already spread widely. Published in May 2025 in Humanities & Social Sciences Communications, it attempted to measure the impact of ChatGPT by combining results from 51 separate studies. The authors compared results between students who used the chatbot and those who didn’t. Ultimately, they reported findings such as “a large positive impact on improving academic performance” and “encouraging higher-order thinking.”
These claims were not limited to academic circles. The paper garnered hundreds of citations – 262 in Springer Nature journals alone and more than 500 overall – and attracted nearly half a million readers. It also ranked among the most viewed journal articles, thanks to its steady circulation on social media platforms.
“The authors of the paper made some very interesting claims about the benefits of ChatGPT on learning outcomes,” said Ben Williamson, senior lecturer at the Centre for Research in Digital Education and the Edinburgh Futures Institute at the University of Edinburgh. “It was seen by many on social media as one of the first hard, golden pieces of evidence that ChatGPT, and AI more generally, benefits students.”
But as the work spread, so did doubts about how it reached these conclusions. Williamson pointed out some problems. “In some cases it appears to have used other studies of very poor quality or mixed findings from studies that simply cannot be accurately compared because of the very different methods, populations and samples,” he told Ars Technica. “It really seemed like a paper that shouldn't have been published in the first place.”
There were also fundamental questions about timing. ChatGPT only became publicly available in late 2022, leaving a narrow window for the production of dozens of rigorous, peer-reviewed studies suitable for meta-analysis. “It is not feasible that dozens of high-quality studies on ChatGPT and learning performance have been conducted, evaluated, and published in this time frame,” Williamson said.
Others had raised similar issues earlier. Ilkka Tuomi, chief scientist at Meaning Processing Ltd., wrote on LinkedIn that studies like this one run the risk of combining results that are not directly comparable, leading to conclusions based on unclear or inconsistent results. He also argued that such analyses can give a misleading sense of scientific rigor, as statistical tools can produce results that appear reliable even when the underlying data are weak.
Williamson said that as the study spread on social media, much of its nuance was lost, leaving only the headlines to be widely circulated. He noted that these oversimplified conclusions were reinforced by online users, helping to attract significant attention despite the fact that the underlying research did not fully support these conclusions.
This momentum may outlast the retraction itself. Researchers who cited or shared the study may not see the update, leaving the core message circulating online – that ChatGPT improves learning outcomes.
The episode comes at a time when schools and universities are still trying to figure out how to respond to generative AI. Some educators are trying to curb misuse, particularly AI-assisted cheating, while tech companies continue to develop features designed to position chatbots as study tools.
For Williamson, the frustration is less about a single piece of work and more about what's behind it. He said the situation is infuriating for researchers trying to understand the true role of AI in education, noting that hype has dominated the discussion in recent years.
By contrast, there is a lack of serious evidence showing how these tools actually affect teaching and learning.

