
61% of ESL Essays Get Flagged as AI — Here's Why Multilingual Writers Are Hit Hardest
In 2023, researchers at Stanford ran a simple experiment. They fed essays written by non-native English speakers into popular AI detectors — no AI involved, just real human writing. The result was alarming: GPTZero flagged 61.3% of those essays as AI-generated. For native English speakers writing on the same topics? The false positive rate dropped to around 17%.
That gap isn't a glitch. It reflects something structural about how AI detection works — and why multilingual writers carry a burden that most English-first students never have to think about.
Why Do AI Detectors Flag Non-Native Writers More?
AI detectors flag text based on two main signals: perplexity (how unpredictable the word choices are) and burstiness (how much sentence length varies). Text that scores low on both looks, statistically, like AI output. At a technical level, the core issue is this: the training data behind these detectors skews heavily toward fluent, native-style English prose as the baseline for "human."
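To make "burstiness" concrete, here is a toy sketch that scores it as the standard deviation of sentence lengths. This is an illustration of the idea only, not any vendor's actual formula — real detectors combine many more signals.

```python
import statistics

def burstiness(text: str) -> float:
    """Toy burstiness score: variation in sentence length (in words).

    Illustrative only -- not any detector's real algorithm.
    Higher values mean more varied sentence lengths.
    """
    # Naive sentence split on terminal punctuation
    cleaned = text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in cleaned.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)

uniform = "I like school. I read books. I write essays. I study hard."
varied = ("I like school. When the library opens early on cold mornings, "
          "I read for hours. Essays follow. I study.")
print(burstiness(uniform))                      # 0.0 -- every sentence is 3 words
print(burstiness(varied) > burstiness(uniform)) # True -- varied prose scores higher
```

Four identical three-word sentences score zero; mixing short and long sentences raises the score. This is exactly the pattern difference that can separate cautious, consistent ESL prose from native-style writing in a detector's eyes.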
Non-native writers tend to use simpler, safer vocabulary — words they're confident about rather than risky synonyms. Their sentence structures are often more consistent, more grammatically conservative. They avoid idiomatic leaps. All of this looks, to a pattern-matching algorithm, suspiciously like a language model playing it safe.
In other words: the very strategies that help ESL writers avoid grammar mistakes are the same strategies that get them flagged as bots.
Which Languages and Writers Are Most Affected?
The Stanford study (Liang et al., 2023) specifically looked at TOEFL essays — standardized writing samples from multilingual test-takers. These are not rough drafts. They are polished, high-stakes essays written by people who have spent years learning English. And still, 6 in 10 were misclassified.
Languages that structure sentences differently from English — Chinese, Arabic, Korean, Spanish — often produce translated-style prose even when the writer composes directly in English. The syntactic habits of a first language bleed through. Short declarative sentences. Subject-verb-object clarity. Minimal hedging language. Precise but limited vocabulary range.
A 2023 meta-analysis by Weber-Wulff and colleagues tested 14 AI detection tools and found that none achieved consistent accuracy above 80% on non-native writing samples, with several performing worse than random chance on certain language backgrounds. This matters because most academic institutions use these tools as if they were reliable.
The Turnitin Problem Is Especially Acute
Turnitin's AI detection rolled out in 2023 and rapidly became the standard in universities across the UK, Australia, and North America. But Turnitin itself acknowledges in its documentation that the tool should not be used as the sole basis for an academic integrity decision. That caveat rarely reaches students.
For multilingual writers, the stakes compound. AI detection false positives are painful for any student, but for international students writing in a second or third language, the accusation carries an extra layer of injustice. They're being penalized not for cheating, but for writing differently — for being who they are linguistically.
If you've been accused based on a detector flag, knowing what to do when accused of using AI is essential. Document your drafts, keep your notes, and push back with evidence.
What Can Multilingual Writers Actually Do?
The frustrating reality is that improving your English writing — becoming more fluent, more varied, more idiomatic — naturally moves you away from the patterns detectors flag. But that takes time most students don't have before a deadline.
Practically speaking, here's what helps:
- Vary your sentence length deliberately. Follow a short punchy sentence with a longer, more complex one. This raises your burstiness score, one of the key signals detectors use.
- Use a wider vocabulary range. Tools like a thesaurus or even ChatGPT (used for suggestions, not writing) can help you find synonyms you'd feel confident using.
- Read your draft aloud. If every sentence falls into the same rhythm, restructure a few.
- Run your text through a free AI detector before submitting. Know your score. If it's flagging high, you have time to revise.
- Use a humanizer tool designed for this problem. WriteMask was built to adjust exactly the perplexity and burstiness signals that detectors measure, and it achieves a 93% pass rate across major detectors including Turnitin and GPTZero.
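As a quick self-check before submitting, you can print each sentence's word count so stretches of same-length sentences stand out. This is a rough diagnostic sketch under the same naive sentence-splitting assumption as above, not a substitute for running a real detector.

```python
def sentence_lengths(text: str) -> list[tuple[int, str]]:
    """Return (word_count, sentence) pairs for a draft.

    Rough diagnostic only: splits naively on terminal punctuation.
    """
    cleaned = text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in cleaned.split(".") if s.strip()]
    return [(len(s.split()), s) for s in sentences]

draft = ("AI detectors measure predictability. They also measure rhythm. "
         "Non-native writers often keep sentences short. "
         "That pattern can trigger a flag.")
for words, sentence in sentence_lengths(draft):
    print(f"{words:3d}  {sentence}")
```

If the counts cluster tightly, as they do in this example draft, that is a sign to merge some sentences and break up others before you submit.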
Is This Getting Better or Worse?
Detection tools are improving, but they're improving on the wrong axis — getting better at catching AI output that mimics fluent English, not getting better at distinguishing non-native human writing from AI. The economic incentive is to reduce false negatives (catching more AI), not false positives (protecting more humans).
Until that changes, multilingual writers are stuck navigating a system that wasn't designed with them in mind. Understanding the bias is the first step. Preparing for it — checking your work, knowing your rights, using the right tools — is the practical response.
The 61% statistic isn't just a data point. It's a lot of real students, writing in their second or third language, getting an accusation they don't deserve.