Education · May 13, 2026

61% of ESL Essays Get Flagged as AI — Here's Why Multilingual Writers Are Hit Hardest

In 2023, researchers at Stanford ran a simple experiment. They fed essays written by non-native English speakers into popular AI detectors — no AI involved, just real human writing. The result was alarming: GPTZero flagged 61.3% of those essays as AI-generated. For native English speakers writing on the same topics? The false positive rate dropped to around 17%.

That gap isn't a glitch. It reflects something structural about how AI detection works — and why multilingual writers carry a burden that most English-first students never have to think about.

Why Do AI Detectors Flag Non-Native Writers More?

AI detectors flag text based on two main signals: perplexity (how unpredictable the word choices are) and burstiness (how much sentence length varies). Text that scores low on both looks, statistically, like AI output. At a technical level, the core issue is this: the models these detectors rely on are calibrated on fluent, native-style English prose as the baseline for "human."
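The two signals can be sketched in a few lines of Python. The unigram model below is a stand-in assumption for illustration only; real detectors score perplexity with a large language model, so the absolute numbers here mean nothing beyond "repetitive word choice scores lower":

```python
import math
import re
from collections import Counter

def sentence_lengths(text):
    """Split on sentence-ending punctuation and count words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Population standard deviation of sentence lengths.
    Higher means more variation between short and long sentences."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))

def unigram_perplexity(text):
    """Toy perplexity under a unigram model fit on the text itself.
    Purely illustrative: it only shows that a narrow, repetitive
    vocabulary drives the score down."""
    words = text.lower().split()
    counts = Counter(words)
    n = len(words)
    log_prob = sum(math.log(counts[w] / n) for w in words)
    return math.exp(-log_prob / n)
```

Two consistent-length sentences give a burstiness of zero, while one short and one long sentence score well above it; likewise, a text that reuses the same few words yields lower toy perplexity than one where every word is distinct.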

Non-native writers tend to use simpler, safer vocabulary — words they're confident about rather than risky synonyms. Their sentence structures are often more consistent, more grammatically conservative. They avoid idiomatic leaps. All of this looks, to a pattern-matching algorithm, suspiciously like a language model playing it safe.

In other words: the very strategies that help ESL writers avoid grammar mistakes are the same strategies that get them flagged as bots.

Which Languages and Writers Are Most Affected?

The Stanford study (Liang et al., 2023) specifically looked at TOEFL essays — standardized writing samples from multilingual test-takers. These are not rough drafts. They are polished, high-stakes essays written by people who have spent years learning English. And still, 6 in 10 were misclassified.

Languages that structure sentences differently from English — Chinese, Arabic, Korean, Spanish — often produce translated-style prose even when the writer composes directly in English. The syntactic habits of a first language bleed through. Short declarative sentences. Subject-verb-object clarity. Minimal hedging language. Precise but limited vocabulary range.

A 2023 meta-analysis by Weber-Wulff and colleagues tested 14 AI detection tools and found that none achieved consistent accuracy above 80% on non-native writing samples, with several performing worse than random chance on certain language backgrounds. This matters because most academic institutions use these tools as if they were reliable.

The Turnitin Problem Is Especially Acute

Turnitin's AI detection rolled out in 2023 and rapidly became the standard in universities across the UK, Australia, and North America. But Turnitin itself acknowledges in its documentation that the tool should not be used as the sole basis for an academic integrity decision. That caveat rarely reaches students.

For multilingual writers, the stakes compound. AI detection false positives are painful for any student, but for international students writing in a second or third language, the accusation carries an extra layer of injustice. They're being penalized not for cheating, but for writing differently — for being who they are linguistically.

If you've been accused based on a detector flag, knowing what to do if accused of using AI is essential. Document your drafts, keep your notes and revision history, and push back with evidence.

What Can Multilingual Writers Actually Do?

The frustrating reality is that improving your English writing — becoming more fluent, more varied, more idiomatic — naturally moves you away from the patterns detectors flag. But that takes time most students don't have before a deadline.

Practically speaking, here's what helps:

  • Vary your sentence length deliberately. Follow a short punchy sentence with a longer, more complex one. This raises your burstiness score, one of the key signals detectors use.
  • Use a wider vocabulary range. Tools like a thesaurus or even ChatGPT (used for suggestions, not writing) can help you find synonyms you'd feel confident using.
  • Read your draft aloud. If every sentence falls into the same rhythm, restructure a few.
  • Run your text through a free AI detector before submitting. Know your score. If it's flagging high, you have time to revise.
  • Use a humanizer tool designed for this problem. WriteMask was built to adjust exactly the perplexity and burstiness signals that detectors measure, and it achieves a 93% pass rate across major detectors including Turnitin and GPTZero.
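As a rough self-check before submitting, the sketch below prints each sentence's word count from a draft and flags uniform rhythm. The 0.3 coefficient-of-variation threshold is an arbitrary assumption for illustration, not a value any detector publishes:

```python
import re
import statistics

def rhythm_report(draft):
    """Print each sentence's word count and warn when lengths are too
    uniform. Returns the coefficient of variation of sentence lengths
    (stdev / mean), a crude proxy for burstiness."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    for sentence, n in zip(sentences, lengths):
        print(f"{n:3d} words | {sentence[:50]}")
    if len(lengths) < 2:
        return 0.0
    cv = statistics.stdev(lengths) / statistics.mean(lengths)
    if cv < 0.3:  # arbitrary illustrative threshold, not a detector constant
        print("Sentence lengths are very uniform -- consider varying a few.")
    return cv
```

Running it on a draft where every sentence is the same length returns a low score and prints the warning; mixing short and long sentences raises the score.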

Is This Getting Better or Worse?

Detection tools are improving, but they're improving on the wrong axis — getting better at catching AI output that mimics fluent English, not getting better at distinguishing non-native human writing from AI. The economic incentive is to reduce false negatives (catching more AI), not false positives (protecting more humans).

Until that changes, multilingual writers are stuck navigating a system that wasn't designed with them in mind. Understanding the bias is the first step. Preparing for it — checking your work, knowing your rights, using the right tools — is the practical response.

The 61% statistic isn't just a data point. It's a lot of real students, writing in their second or third language, getting an accusation they don't deserve.

Frequently Asked Questions

Do AI detectors discriminate against non-native English speakers?

Yes, research shows AI detectors flag non-native English writing at significantly higher rates. A 2023 Stanford study found GPTZero misidentified 61.3% of ESL essays as AI-generated, compared to roughly 17% for native English speakers, because non-native writing patterns — limited vocabulary range, consistent sentence structure — statistically resemble AI output.

Why does Turnitin flag ESL writing as AI?

Turnitin's AI detector measures perplexity (word unpredictability) and burstiness (sentence length variation). ESL writers often use safer, simpler vocabulary and more consistent sentence structures to avoid grammar errors — which unintentionally matches the low-perplexity, low-burstiness profile of AI-generated text, triggering false positives.

What can multilingual writers do to avoid AI detection flags?

Multilingual writers can reduce false flags by deliberately varying sentence length, expanding vocabulary range, and using an AI detector to check their score before submitting. Tools like WriteMask can also help adjust the specific linguistic signals detectors measure, achieving a 93% pass rate on major platforms including Turnitin.

Which AI detectors are worst for non-native English speakers?

Research by Weber-Wulff et al. (2023) found that most major AI detectors perform poorly on non-native writing, with none achieving consistent accuracy above 80% across different language backgrounds. GPTZero, Turnitin, and Copyleaks all showed elevated false positive rates for ESL writers in independent testing.

Try WriteMask free

500 words/day. No credit card required. Paste AI text and see the difference.