AI detectors have raised eyebrows recently by claiming that the US Constitution was
written by artificial intelligence in 1787. While this assertion is clearly false, it highlights
a concerning issue: the reliability of AI detectors in accurately identifying
AI-generated content leaves much to be desired.
The heart of the problem lies in the methods employed by these detectors. They use
large language models like ChatGPT, trained on vast amounts of human-written and AI-
generated text, to determine the likelihood of a piece of writing being human- or AI-
authored. Two key metrics used are “perplexity” and “burstiness.”
Formal language = AI content?
Perplexity measures how predictable a piece of text is to the model: the more closely the
wording matches what the model learned during training, the lower the score. Low perplexity
is treated as a sign of AI authorship, since language models tend to output the most
probable next word.
That works reasonably well for AI-generated text, but it becomes problematic with formal,
well-known language such as the US Constitution, which appears verbatim in training data
and therefore scores as highly predictable.
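To make the metric concrete, here is a minimal sketch of perplexity scoring, assuming the Hugging Face transformers library and the public GPT-2 model as the scoring language model. Real detectors use their own proprietary models; this only illustrates the idea.

```python
# Minimal perplexity sketch (assumes: transformers, torch, public GPT-2 weights).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return exp(mean negative log-likelihood) of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean per-token loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Famous formal prose tends to score low because the model has effectively
# memorized it -- which is exactly how the Constitution gets flagged as "AI."
print(perplexity("We the People of the United States, in Order to form a more perfect Union..."))
```

A lower number means the text looks more predictable to the model, which a detector thresholding on this score would read as evidence of AI authorship.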
Burstiness, on the other hand, evaluates the variability in sentence length and structure.
AI-generated content often displays more uniformity, a departure from human writing,
which tends to mix short and long sentences.
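The article doesn't give an exact formula, but a simple stand-in for burstiness is the spread of sentence lengths relative to their average. The sketch below assumes that approximation and a naive punctuation-based sentence split.

```python
# Burstiness sketch: coefficient of variation of sentence lengths.
# (An illustrative approximation, not any vendor's actual formula.)
import re
import statistics

def burstiness(text: str) -> float:
    """Higher values mean more variation in sentence length ("burstier")."""
    # Naive split on ., !, ? -- good enough for illustration.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

human = "I woke up late. Then, after a frantic search for my keys, I sprinted to the bus. Made it."
uniform = "The report covers the first quarter. The report covers the second quarter. The report covers the third quarter."
print(burstiness(human), burstiness(uniform))  # the varied text scores higher
```

Text with evenly sized sentences scores near zero, which is the pattern detectors associate with machine output.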
However, both metrics have their limitations. Skilled human writers can produce
content with low perplexity, mimicking an AI-like style, and AI models are
becoming more human-like in their writing, rendering burstiness an unreliable
discriminator.
False positives too high
Studies have shown that AI writing detectors are far from foolproof and perform only
marginally better than random classifiers. They frequently return false positives, leading
to potential misjudgments and unfair accusations against students and writers.
Moreover, these detectors can be easily bypassed through paraphrasing attacks, further
compromising their accuracy.
Amid the concerns, some educators are embracing AI tools like ChatGPT to support
learning, acknowledging that existing detectors are inadequate for detecting AI-
generated content accurately.
Turning detection on its head
In response, one AI detector creator plans to shift their focus away from AI detection
and instead highlight the human touch in content creation. Their aim is to assist
teachers and students in navigating the evolving landscape of AI’s role in education.
The AI writing detection challenge is also complicated by potential biases against non-
native English speakers, leading to higher false-positive rates in their work.
As AI continues to advance, the need for robust safeguards against misinformation and
the appropriate recognition of AI’s involvement in content creation becomes
increasingly evident.
The existing AI detectors’ shortcomings underscore the urgency of developing more
accurate and reliable detection systems. Until such systems are in place, it is crucial to
approach AI-generated content detection with caution, considering the personal cost of
false accusations.