
She Wrote Her Thesis Herself. The AI Detector Disagreed. Here's the Language Bias Nobody Talks About
Sofia had been working on her master's thesis for 18 months. Born in São Paulo, she'd moved to Toronto to study environmental policy. Her English was excellent — she'd been writing academic papers in it for years. But when she submitted Chapter 3 to her supervisor in October 2024, she got an email that made her stomach drop: the department had run it through an AI detection tool, and it had come back 74% AI-generated.
She hadn't used AI. Not even once.
What followed was a two-week investigation that revealed something most students and professors don't know: AI detection accuracy varies dramatically depending on what language you're working in — and even what language you think in.
Why Does AI Detection Accuracy Vary by Language?
AI detection accuracy varies by language primarily because detection models are trained on vastly unequal amounts of text. Most major detectors — including those used by Turnitin and GPTZero — were built and optimized on English-language data. Their statistical baselines, their understanding of what 'sounds human,' and their ability to spot AI patterns all reflect English writing norms. Apply the same model to Spanish, Arabic, or Portuguese text, and the accuracy drops off significantly.
This isn't exactly a bug. It's a training data problem. To understand how AI detectors work, you have to know they're essentially asking: 'Does this text match the statistical patterns of AI-generated content we've seen before?' If they've seen very little non-English AI content, or very little non-English human content, their statistical baselines stop meaning much and their calibration falls apart. They start flagging things they shouldn't, or missing things they should catch.
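If you're curious what 'statistical patterns' means in practice, here's a rough sketch of the kind of check a simple detector might run. It's illustrative only: real tools like Turnitin and GPTZero rely on proprietary models and many more signals, and the GPT-2 model and cutoff below are stand-ins chosen just to show why an English-trained baseline misjudges non-English text.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Toy illustration of perplexity-based detection, NOT how any commercial
# detector actually works. GPT-2 stands in for an English-heavy language
# model; the threshold below is arbitrary.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Lower perplexity = more predictable to the model. Naive detectors
    treat very low perplexity as a sign the text was machine-generated."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

def looks_ai_generated(text: str, threshold: float = 40.0) -> bool:
    # Arbitrary cutoff, tuned (if at all) on English examples.
    return perplexity(text) < threshold

# The gap in one picture: a Portuguese sentence, whether a human or an AI
# wrote it, scores as "surprising" to an English-trained model, so this toy
# detector calls it human either way. That mirrors how AI-written Portuguese
# slipped past the tools Sofia tested.
print(perplexity("The committee reviewed the proposal and approved the new policy."))
print(perplexity("O comitê revisou a proposta e aprovou a nova política."))
```

The mismatch also runs the other way: highly consistent, formal English prose, exactly the kind many multilingual academic writers produce, can read as 'too predictable' against an English-tuned cutoff and get flagged.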
What Sofia's Investigation Actually Found
Sofia started by running her own tests. She took a paragraph she'd written herself — in her natural academic style, shaped by years of reading Portuguese-language journals — and put it into three different AI detectors. All three flagged it as AI. Then she took an actual ChatGPT-generated paragraph in Portuguese and ran it through the same tools. Two out of three said it was human-written.
The system was broken in both directions.
Her hypothesis: her English writing carries structural patterns from Portuguese academic writing — longer subordinate clauses, a preference for nominal phrases, specific transition patterns. To a detector trained almost exclusively on American academic English, those patterns looked artificial. Too consistent. Too 'patterned.'
Meanwhile, AI text in Portuguese was slipping through because the detectors had fewer examples of what AI Portuguese actually looks like.
She wasn't the only one dealing with AI detection false positives. But her case illustrated a specific, underreported version of the problem: the language bias that hits multilingual writers hardest.
The Numbers Behind the Language Gap
Research published in early 2025 found that popular AI detectors achieve roughly 84–91% accuracy on English text. For Spanish, that drops to around 67–73%. For Arabic and Chinese, some tools perform barely better than chance — accuracy sitting in the 52–58% range.
Several factors drive this gap:
- Training data imbalance — English dominates academic publishing and the open web, so models have far more human English examples to learn from.
- AI output patterns differ by language — ChatGPT writing in French doesn't exhibit the same perplexity patterns as ChatGPT writing in English. The 'fingerprint' shifts.
- Cultural writing conventions get misread — Japanese formal writing has highly consistent structural patterns that can trigger false positives in systems that treat 'consistency' as a red flag.
- Update lag for non-English models — Detectors refresh their models constantly for English-language AI output. Updates for non-English variants lag behind by months, sometimes longer.
How Sofia Resolved It
She did three things. First, she documented her writing process: saved her research notes, draft history, and browser tabs from the days she wrote that chapter. If she needed to prove the chapter was human-written, she wanted evidence ready.
Second, she ran her chapter through WriteMask. Not to deceive anyone — but because she recognized her prose had structural patterns that were genuinely tripping up English-first detectors. WriteMask adjusted the sentence rhythms and phrasing without flattening her voice or changing her argument. The chapter came back clean on the first pass. WriteMask carries a 93% pass rate on major detection tools, and in Sofia's case, it worked exactly as intended: legitimate editing that made her English writing read as the human work it actually was.
Third, she brought her findings to her supervisor — not just her own case, but the published accuracy data. Her department has since added a mandatory human review step before any AI detection result can factor into a disciplinary process.
What to Do If You're a Multilingual Writer Getting Flagged
If you write in English but process ideas in another language, or if you regularly move between languages as you think and draft, here's what matters:
- Run your text through a free AI detector before submitting — not to cheat, but to catch false positives before your professor does.
- Keep a writing trail: notes, drafts, timestamps. This matters more than most students realize.
- Understand that your writing style is not wrong. The detector is making bad assumptions shaped by English-only training data.
- Tools like WriteMask can help your prose read more naturally in English without stripping your ideas — that's editing, not deception, and it's exactly what native English writers do without thinking twice.
Sofia's story isn't rare. It plays out every semester at universities across Australia, the UK, Canada, and the US. The bias isn't intentional. But the harm it causes to international students — already navigating enormous pressure and institutional skepticism — is real.
The language gap in AI detection is one of the most underreported problems in academic technology right now. And until the tools catch up, the burden falls on the writers who shouldn't have to prove themselves twice.