Between plagiarism and character assassination: How good are AI detectors?

Ahmed Riaz

3 hours ago

According to a study, nine percent of all newspaper articles in the USA already contain AI-generated content. AI detectors like Pangram or GPTZero promise to reliably expose such texts. But there is a gap between advertising promises and reality that could become a problem for editors and readers alike. A commentary and analysis.

How do AI detectors work?

Researchers at the University of Maryland examined over 180,000 articles from 1,500 newspapers in 2025 and identified an AI share of around nine percent. The conclusion: Artificial intelligence is now an integral part of media and publishing houses. A number of cases have also recently caused a stir in Germany. The Tagesspiegel, for example, ended its collaboration with its former editor-in-chief and publisher Stephan-Andreas Casdorff after he was found to have written his texts with AI without disclosing it. Axel Springer boss Mathias Döpfner, in turn, published a text written entirely by AI, but labeled it as such. The criticism: Döpfner is outsourcing or giving up thinking.
AI detectors promise to recognize artificially generated texts as such. The best-known tools include Pangram, GPTZero, Copyleaks, Originality AI and Scribbr. As the US magazine The Atlantic reports, these detectors are now effective enough for widespread use, but not reliable enough to be trusted completely. There was even a real Pangram problem in the USA. In addition to false suspicions, there is an arms race between AI chatbots and AI detectors, neither of which is useless, but each of which is merely a mirror image of the other. Pangram was also used to analyze texts by Digital Minister Karsten Wildberger and the Thuringian Prime Minister Mario Voigt.
AI text checking tools examine large volumes of content for specific patterns and then compare texts against these patterns. According to its own information, Pangram has a hit rate of 99.98 percent. Other providers promise a comparably high level of accuracy. However, Pangram promises a lower error rate through regular training. According to a University of Chicago test last year, Pangram incorrectly identified almost none of the 3,000 tested texts as AI-generated. In Pangram's own tests, only one in ten thousand texts was a false positive.

Why high hit rates are misleading

The search for an AI fingerprint in texts is becoming more and more difficult. And that is no coincidence, because language models are trained on billions of human texts. Although they do not copy fixed formulations, they do copy statistical patterns of style, sentence structure and speech rhythm. This sometimes leads to bizarre suspicions. Authors, for example, are suddenly being targeted because they often use dashes or write unusually even sentences, although they did not use AI. This borders on character assassination!

AI texts can now also be revised afterwards or rewritten using specially developed tools so that they sound less like AI. The detection rates of more than 90 percent that many AI detectors advertise also invite misunderstanding. A test that almost always raises the alarm will catch almost all AI texts, but it will also flag texts that are not AI-generated.

What is important is not only how often a detector is right, but also how often it is wrong and how many AI texts it misses. However, providers do not fold these values into the detection rate they advertise. Pangram, for example, does publish such figures, but only in the fine print. Once false negatives are included, the rate is only 85 percent. In other words, roughly one in seven AI texts goes undetected.
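The difference between these rates can be sketched in a few lines of Python. This is a minimal illustration, not any provider's method: the `Counts` class and the function names are hypothetical, and the sample of 2,000 texts is invented so that it matches the nine percent AI share and the 85 percent detection rate quoted in this article. The "always flag" detector is a made-up extreme case.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # AI texts correctly flagged as AI
    fn: int  # AI texts the detector missed
    fp: int  # human texts wrongly flagged as AI
    tn: int  # human texts correctly passed


def detection_rate(c: Counts) -> float:
    """Share of AI texts that the detector catches."""
    return c.tp / (c.tp + c.fn)


def miss_rate(c: Counts) -> float:
    """Share of AI texts that slip through (false negatives)."""
    return c.fn / (c.tp + c.fn)


def false_positive_rate(c: Counts) -> float:
    """Share of human texts that are wrongly accused."""
    return c.fp / (c.fp + c.tn)


# Hypothetical sample: 2,000 texts, 180 of them AI-written (9 percent).
cautious = Counts(tp=153, fn=27, fp=0, tn=1820)
# Hypothetical extreme: a detector that flags every single text.
always_flag = Counts(tp=180, fn=0, fp=1820, tn=0)

for name, c in (("cautious", cautious), ("always flag", always_flag)):
    print(
        f"{name}: detects {detection_rate(c):.0%}, "
        f"misses {miss_rate(c):.0%}, "
        f"wrongly accuses {false_positive_rate(c):.0%}"
    )
```

In this invented sample the cautious detector misses 15 percent of AI texts without accusing any human author, while the detector that flags everything reaches a perfect detection rate only by accusing every human author as well. A single headline percentage hides that trade-off.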

Other tools like GPTZero expose a little more artificial content, but pay for this advantage with more false suspicions. The irony is that the same detectors are now also being used to specifically rewrite texts so that they pass future tests. In other words, the arms race has long been under way on both sides.

Voices

Bradley Emi and Max Spero, the founders of Pangram, on the topic of ethics and responsible use of their detector in a 2024 technical report: “All AI detection tools have a non-zero false positive rate and should be used in conjunction with other evidence to confirm or refute plagiarism. AI detection is neither a replacement nor a reliable tool for proving the factuality or accuracy of textual information such as news and media content.”
Mika Beuster, Federal Chairman of the German Journalists Association, in a statement: “The current discussion about artificial intelligence in journalistic texts shows that the credibility of journalism is at stake. It’s about transparency and not about demonizing a new technology, because AI can be a helpful tool for media professionals when doing research, for example.”
Danica Bensmail, Federal Managing Director of the German Journalists’ Union (dju), takes a similar approach: “Publishers and media companies have a responsibility in terms of press ethics, even if content is created with the help of artificial intelligence. Anyone who acts without rules risks their own livelihood. AI may only be used to the extent that it has been bindingly agreed between editorial teams and the publisher. Publishing management that uses AI without rules or ignores existing user agreements devalues journalistic work.”

Mandatory labeling instead of AI detectors

Bizarre: The providers themselves sometimes limit expectations of their AI detectors. Despite big advertising promises, many of them leave themselves a small back door in the fine print, partly for legal reasons. Nevertheless: AI detectors can provide indications of artificially generated content, but they cannot replace journalistic or scientific proof. Anyone who derives a verdict from percentage values confuses probability with certainty.
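How far a flag is from certainty depends on how common AI texts are and how often the detector wrongly accuses human authors. The sketch below applies Bayes' rule to estimate what share of flagged texts are really AI-generated. It rests on assumptions: it combines the nine percent AI share from the newspaper study with the 85 percent detection rate mentioned above, which were not measured together, and the false positive rates of 1 and 5 percent are hypothetical values for comparison. The function name is invented for this example.

```python
def share_of_flags_that_are_correct(
    base_rate: float, detection_rate: float, false_positive_rate: float
) -> float:
    """Probability that a flagged text really is AI-generated (Bayes' rule)."""
    true_flags = base_rate * detection_rate
    false_flags = (1.0 - base_rate) * false_positive_rate
    return true_flags / (true_flags + false_flags)


BASE_RATE = 0.09       # AI share reported in the newspaper study
DETECTION_RATE = 0.85  # detection rate quoted in the article

# 0.0001 is the "one in ten thousand" figure; 0.01 and 0.05 are hypothetical.
for fpr in (0.0001, 0.01, 0.05):
    correct = share_of_flags_that_are_correct(BASE_RATE, DETECTION_RATE, fpr)
    print(f"false positive rate {fpr:.2%}: {correct:.1%} of flags are correct")
```

Under these assumptions, even a detector that wrongly flags only one human text in a hundred would be mistaken in a noticeable share of its accusations, and the share grows quickly as the false positive rate rises. A flag is a probability statement about a text, not a verdict on its author.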

For the media, the debate is now shifting away from pure technical detection towards greater transparency. Mandatory labels for the use of AI in journalistic articles are long overdue, for example via the press code. After all, this is about credibility and the threat of a loss of trust, which could have significant consequences for some media outlets.

But there is also a need for education and guidelines regarding the use of AI detectors. Detective work cannot be left to algorithms and probabilities, as they do not provide any certainty. Rather, some could end up harming themselves through mistaken suspicions. In other words: Both AI texts and the AI detectors that are supposed to recognize them can give false answers, and these should not result in a witch hunt.

At the EU level, there is already increasing pressure to mark AI texts with proposed symbols, unless an editorial review by a responsible party has taken place. Ultimately, however, no law or detector will enable users or editorial teams to recognize AI texts beyond doubt. However, media that rely on labels, editorial guidelines and transparency could be among the beneficiaries.
