How AI detectors work: perplexity, burstiness, and classifier models explained

Quick take

AI detectors use three main techniques: perplexity scoring, burstiness analysis, and trained classifier models. None of them is perfect. Understanding how they work helps you write text that reads like a human wrote it, because you'll know what tends to trigger a flag.

Perplexity: how surprised is the model?

Perplexity measures how predictable a piece of text is. When a language model reads a sentence and can easily guess the next word, perplexity is low. When the next word is unexpected, perplexity is high.

AI-generated text tends to have low perplexity because language models favor statistically likely next tokens. Human writing is messier. We use odd word choices, interrupt our own thoughts, and throw in references that a model wouldn't predict.

GPTZero built its early detection largely around perplexity scoring. The tool analyzes text at the sentence level and flags documents where perplexity stays uniformly low from start to finish. A human writer almost always has spikes, moments where their word choice gets weird or specific.
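To make the definition above concrete, here is a minimal sketch of the usual perplexity formula: the exponential of the average negative log-probability per token. The per-token probabilities below are invented for illustration and do not come from a real model or from any detector.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability) over a text's tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_logprob)

# Hypothetical per-token probabilities (illustrative numbers only).
# "predictable": the model found every token easy to guess.
predictable = [math.log(p) for p in (0.9, 0.8, 0.85, 0.9)]
# "surprising": two tokens were very unlikely under the model.
surprising = [math.log(p) for p in (0.9, 0.05, 0.6, 0.02)]

print("predictable text:", round(perplexity(predictable), 2))
print("surprising text: ", round(perplexity(surprising), 2))
```

The second list gets a much higher score because two low-probability tokens pull the average down. Those are the "spikes" a detector expects from a human writer.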

Burstiness: how varied is the rhythm?

Burstiness measures sentence-level variation. Human writers produce "bursty" text. We write a 6-word sentence, then a 35-word sentence, then something in between. AI tends to write in a narrow band, often around 15 to 20 words per sentence, with remarkably even paragraph structures.

Detectors calculate burstiness by looking at the standard deviation of sentence lengths across a document. Low standard deviation means uniform rhythm, which detectors treat as a sign of AI.

This metric is why adding a few short sentences to AI text can sometimes reduce detection scores. You're artificially increasing the burstiness, making the rhythm look more human.
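The standard-deviation calculation described above can be sketched in a few lines. This is a simplified illustration, not any detector's actual code: the sentence splitter is a naive regex, and the two sample texts are made up.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Population standard deviation of sentence lengths, in words."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for a single sentence
    return statistics.pstdev(lengths)

# Made-up samples: one with even rhythm, one with short and long sentences.
uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat on the branch."
varied = ("Stop. The dog, who had been lying on the rug all afternoon, "
          "finally got up and left. Quiet again.")

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

The uniform sample has three 6-word sentences, so its score is zero. The varied sample mixes a 1-word, a 16-word, and a 2-word sentence and scores far higher.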

Classifier models: pattern recognition at scale

The third approach trains a machine learning model on millions of examples of human and AI text. The classifier learns patterns that go beyond perplexity and burstiness: word frequency distributions, syntactic structures, paragraph organization, even punctuation habits.

Turnitin uses a classifier trained specifically on academic writing. It scores each sentence individually, then aggregates a document-level score. Originality.ai runs a similar approach but targets marketing and web content. Copyleaks uses an ensemble of classifiers combined with its plagiarism detection engine.
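The "score each sentence, then aggregate" step can be pictured roughly as follows. None of these vendors publishes its aggregation rule, so the mean, the 0.5 threshold, and the function name `document_score` are assumptions made for illustration. The per-sentence probabilities are invented stand-ins for what a trained classifier would output.

```python
from statistics import mean

def document_score(sentence_scores, flag_threshold=0.5):
    """Combine per-sentence AI probabilities into two document-level numbers.

    Returns (mean probability, fraction of sentences at or above threshold).
    Both the mean and the threshold are assumed choices, not a vendor's rule.
    """
    if not sentence_scores:
        raise ValueError("need at least one sentence score")
    flagged = [s for s in sentence_scores if s >= flag_threshold]
    return mean(sentence_scores), len(flagged) / len(sentence_scores)

# Hypothetical classifier outputs for a four-sentence document.
scores = [0.9, 0.8, 0.2, 0.7]
avg, flagged_share = document_score(scores)
print(f"mean AI probability: {avg:.2f}")
print(f"share of sentences flagged: {flagged_share:.0%}")
```

Reporting both numbers shows why one heavily edited sentence rarely clears a whole document: it lowers the mean a little but leaves most sentences flagged.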

The problem with classifiers is they're only as good as their training data. When a new model like GPT-4o or Claude comes out, classifiers trained on older output may miss it entirely until they're retrained.

Where all three methods fail

A Stanford HAI study tested seven major detectors on TOEFL essays written by non-native English speakers. The result: on average across the detectors, 61.22% of those human-written essays were flagged as AI-generated. Non-native speakers tend to write with simpler vocabulary and more uniform sentence structures, which mimics the exact patterns detectors look for.

GPTZero claims 99.3% accuracy with a 0.24% false positive rate on its own benchmarks. But vendor benchmarks test on clean samples where the human text is clearly human and the AI text is raw output. Real-world text is messier. People edit AI drafts, AI assists human writers, and the line blurs.
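The two figures above measure different things, and a small worked example shows how. False positive rate looks only at human-written texts, while accuracy mixes both classes. The counts below are hypothetical and are not taken from the Stanford study or from GPTZero's benchmark.

```python
def false_positive_rate(false_positives, true_negatives):
    """Share of human-written texts that were wrongly flagged as AI."""
    total_human = false_positives + true_negatives
    if total_human == 0:
        raise ValueError("no human-written samples")
    return false_positives / total_human

def accuracy(true_positives, true_negatives, false_positives, false_negatives):
    """Share of all texts, human and AI, that were labeled correctly."""
    total = true_positives + true_negatives + false_positives + false_negatives
    if total == 0:
        raise ValueError("no samples")
    return (true_positives + true_negatives) / total

# Hypothetical benchmark: 200 human texts and 200 AI texts.
tp, fn = 190, 10   # AI texts: caught vs. missed
tn, fp = 192, 8    # human texts: cleared vs. wrongly flagged

print(f"accuracy: {accuracy(tp, tn, fp, fn):.1%}")
print(f"false positive rate: {false_positive_rate(fp, tn):.1%}")
```

Both numbers depend entirely on which human texts go into the test set, which is why a vendor benchmark and a sample of non-native essays can produce such different false positive rates.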

Paraphrasing tools and AI humanizers specifically target these detection methods. They increase perplexity by swapping predictable words, boost burstiness by varying sentence length, and break the patterns classifiers look for. For a breakdown of which humanizer tools actually work, see our 2026 humanizer comparison.

Watermarking: the emerging fourth method

Some AI providers are experimenting with statistical watermarks embedded in their output. These watermarks subtly bias token selection in a way that's invisible to readers but detectable by the provider's own tools.
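As a rough illustration of how a token-selection bias can be detected, here is a toy version of a "green list" scheme, one approach described in public research. It is an assumption that this resembles any provider's production system; the article does not say which scheme OpenAI or SynthID uses. The hash rule, the `GAMMA` value, the tiny vocabulary, and all function names are made up for the sketch.

```python
import hashlib
import math

GAMMA = 0.5  # assumed share of the vocabulary marked "green" at each step

def is_green(prev_token, token):
    """Pseudo-randomly mark `token` green, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] / 256 < GAMMA

def watermark_z_score(tokens):
    """How far the green-token count sits above the unwatermarked expectation."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    green = sum(is_green(prev, tok) for prev, tok in pairs)
    n = len(pairs)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

def generate_watermarked(vocab, start, length):
    """Toy generator that always picks a green token when one exists.

    A real scheme would only nudge probabilities; this hard rule keeps
    the example short.
    """
    tokens = [start]
    for _ in range(length):
        greens = [w for w in vocab if is_green(tokens[-1], w)]
        tokens.append(greens[0] if greens else vocab[0])
    return tokens

# Hypothetical 20-word vocabulary.
vocab = ("the a cat dog sat ran on under mat rug big small red blue "
         "quickly slowly and then it was").split()
marked = generate_watermarked(vocab, "the", 50)
print("z-score of watermarked tokens:", round(watermark_z_score(marked), 2))
```

A large positive z-score means far more green tokens than chance would produce. Rewriting the text replaces tokens and breaks the pairing with the previous token, so the count drifts back toward chance.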

OpenAI confirmed it has the capability to watermark GPT output but delayed rollout, citing concerns about the impact on non-native English speakers. Google DeepMind's SynthID is already in production for some Gemini outputs. These watermarks survive light editing but break down when text is substantially rewritten.

Watermarking is harder to defeat than statistical analysis, but it only works if the provider implements it. Open-source models and most API access don't include watermarks.

What this means for your writing

If you're using AI as a writing tool, the detection methods above tell you what to fix. Raise your text's perplexity by using specific, unexpected words instead of generic ones. Boost its burstiness by mixing sentence lengths dramatically. Break classifier patterns by adding personal voice, opinions, and structural variety.

Or use a tool that does it for you. Run your text through an AI detector first to see your score, then through a humanizer to fix what gets flagged. Training the AI on your own writing voice from the start reduces the need for post-processing.

FAQ

Do all AI detectors use the same method?

No. GPTZero emphasizes perplexity and burstiness. Turnitin relies heavily on its classifier model. Originality.ai combines classifier scoring with its own proprietary signals. Each detector weights these methods differently, which is why the same text can score differently across tools.

Can AI detectors identify which model wrote the text?

Most detectors only give a probability score for AI vs. human. A few, like Originality.ai, attempt to identify the specific model, but this identification is unreliable and becomes outdated as new models are released.

Does editing AI text change the detection score?

Yes. Even light editing (swapping a few words, adding a personal anecdote, varying sentence lengths) can drop detection scores significantly. Heavy editing or running text through a humanizer tool can reduce scores to near zero. See how to make ChatGPT text undetectable for specific techniques.
