Key Takeaways
- Your vocabulary score combines three things: diversity, sophistication, and repetition
- Diversity means using varied words, not the same ones over and over
- Sophistication means using precise words, not necessarily long ones
- Readability (FK grade) tells you whether your sentences are too simple or too tangled
- These are rough guides. They point you in the right direction, not to an exact destination.
Why Measure Vocabulary at All?
From marking thousands of student essays, I've noticed that vocabulary is one of the hardest things for students to self-assess. You know when your grammar is off because someone underlines the mistake. But vocabulary problems are subtler.
A student might write a perfectly grammatical essay that still feels flat, repetitive, or vague. The markers notice. They just don't always explain what went wrong.
That's what the writing metrics in EssayHero try to address. When you submit an essay, you'll see progress bars for vocabulary, readability, and sentence length. Each one has a "target zone" showing where you want to be.
Vocabulary Diversity (MTLD)
The first component is something called MTLD, which stands for Measure of Textual Lexical Diversity. In plain terms: how many different words do you use relative to how many words you write?
If you write 300 words but keep cycling through the same 40, your MTLD will be low. If you use a wider range of words naturally, it goes up.
Why MTLD Instead of Just Counting Words?
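We use MTLD rather than simpler measures (like the share of unique words in your essay) because it handles different essay lengths fairly. The longer you write, the more you inevitably reuse words, so simple ratios drop as an essay grows. With MTLD, a 600-word essay isn't penalised for having more total words than a 300-word one.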
The algorithm, developed by McCarthy and Jarvis in 2010, reads through your text and measures how many words you get through, on average, before your vocabulary starts repeating heavily. Longer stretches mean more diverse vocabulary. It's been validated across thousands of texts in academic research.
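If you're curious what that looks like in code, here is a minimal Python sketch of the general MTLD idea. It is not EssayHero's actual implementation: the tokeniser is a simple regular expression, the function names are my own, and the fallback for text with no repetition at all is a choice made for this sketch. The 0.72 threshold is the value proposed in the original paper.

```python
import re

TTR_THRESHOLD = 0.72  # threshold proposed by McCarthy and Jarvis (2010)


def _one_pass(tokens, threshold=TTR_THRESHOLD):
    """Average stretch length, reading the tokens in the order given."""
    factors = 0.0  # number of completed stretches ("factors")
    seen = set()   # distinct words in the current stretch
    length = 0     # total words in the current stretch
    for token in tokens:
        length += 1
        seen.add(token)
        if len(seen) / length <= threshold:
            # Vocabulary has started repeating: close this stretch.
            factors += 1
            seen, length = set(), 0
    if length > 0:
        # Partial credit for the unfinished stretch at the end.
        factors += (1 - len(seen) / length) / (1 - threshold)
    if factors == 0:
        # No repetition at all: fall back to the text length
        # (an assumption of this sketch).
        return float(len(tokens))
    return len(tokens) / factors


def mtld(text):
    """Mean of a forward and a backward pass over the words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return (_one_pass(tokens) + _one_pass(tokens[::-1])) / 2
```

Calling `mtld(essay_text)` on a repetitive paragraph gives a much lower number than on a varied one, which is the behaviour the progress bar is built on.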
What the Numbers Mean
| MTLD Score | Interpretation |
|---|---|
| Below 60 | Your word choices are quite repetitive. This is the most common issue I see. |
| 60-79 | Developing. You're using some range but could push further. |
| 80-100 | Good diversity. This is the target zone. |
| Above 100 | Excellent. You're using a genuinely wide range of vocabulary. |
Sophistication: Precise, Not Fancy
The second component is vocabulary sophistication. This is often misunderstood.
What Sophistication Really Means
Sophistication doesn't mean using the longest words you can find. It means using words that are precise and appropriate rather than vague and generic.
We measure this by checking your word choices against frequency lists. Words that most English speakers use every day (basic vocabulary) score lower. Words that appear in academic texts and professional writing (but aren't obscure) score higher.
An Example of Precision vs Decoration
Consider these two sentences:
"The government should do something about the problem of pollution."
"The municipality should implement stricter emissions regulations."
The second sentence isn't trying to impress anyone with fancy language. It's just more precise:
- "Municipality" is more specific than "government"
- "Implement stricter emissions regulations" tells you exactly what action is proposed, rather than the vague "do something about the problem"
That's what sophistication measures. Not decoration — precision.
How We Calculate It
The scoring uses word frequency bands based on the New General Service List and the New Academic Word List:
- Basic words (the most common 1,000 in English) get the lowest weighting
- Intermediate and advanced words get progressively higher weightings
- Weightings are tuned per exam — IELTS, for instance, places more emphasis on sophistication because the Band 7 descriptors explicitly mention "less common lexical items"
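To make the banding idea concrete, here is a rough Python sketch. Everything specific in it is an assumption for illustration: the two word sets are tiny stand-ins (the real lists contain thousands of entries), the weights are made up, and the function names are hypothetical. It only shows the general shape of a frequency-band score.

```python
import re

# Tiny stand-in word lists, for illustration only.
BASIC = {"the", "government", "should", "do", "something",
         "about", "problem", "of"}
ACADEMIC = {"implement", "emissions", "regulations"}

# Hypothetical weights: basic words lowest, then progressively more.
WEIGHTS = {"basic": 0.0, "intermediate": 1.0, "advanced": 2.0}


def band(word):
    """Place a word in a frequency band using the stand-in lists."""
    if word in BASIC:
        return "basic"
    if word in ACADEMIC:
        return "advanced"
    return "intermediate"


def sophistication(text):
    """Average band weight per word (0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(WEIGHTS[band(w)] for w in words) / len(words)
```

With these stand-in lists, the "municipality" sentence from earlier scores higher than the "government should do something" sentence, because more of its words fall outside the basic band.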
Repetition
The third component checks whether you're overusing specific content words.
We ignore common function words — "the", "is", "and" — because everyone repeats those and it's fine. Instead, we look at content words (nouns, verbs, adjectives, adverbs) that appear three or more times.
If you write "technology" eleven times in a 400-word essay about technology, the repetition score drops. It's a signal that you should find synonyms or rephrase: "digital tools", "innovations", "these developments".
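A simplified version of that check can be sketched in a few lines of Python. This is an approximation under stated assumptions: instead of identifying nouns, verbs, adjectives, and adverbs properly, it just skips a short, hand-picked list of function words, and the function name is my own.

```python
import re
from collections import Counter

# Short hand-picked list; a real checker would use a fuller one
# or a part-of-speech tagger.
FUNCTION_WORDS = {
    "the", "is", "and", "a", "an", "of", "to", "in", "it", "that",
    "for", "on", "with", "as", "are", "was", "be", "this", "by",
    "or", "we", "our",
}


def overused_words(text, min_count=3):
    """Return words (outside the function-word list) used min_count+ times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in FUNCTION_WORDS)
    return {word: n for word, n in counts.items() if n >= min_count}
```

For example, `overused_words("Technology is everywhere. Technology shapes the way we learn, and technology changes the way we work.")` flags "technology" but not "way", which appears only twice.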
Readability (Flesch-Kincaid)
Separate from the vocabulary score, you'll see a readability metric. This uses the Flesch-Kincaid grade level formula, which looks at your average sentence length and average number of syllables per word.
The target zone for HKDSE and IELTS is FK 10-14, which runs from roughly Form 4 reading level up to early university level.
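For reference, the standard Flesch-Kincaid grade formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The Python sketch below applies it with a deliberately crude syllable counter (one syllable per run of vowels), so its results will differ a little from any tool that counts syllables more carefully. The function names are my own.

```python
import re


def count_syllables(word):
    """Rough heuristic: each run of vowels counts as one syllable.

    It miscounts silent "e" and some other patterns.
    """
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_from_counts(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text):
    """Estimate the FK grade of a text (0.0 if there is nothing to score)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return fk_from_counts(len(words), len(sentences), syllables)
```

Both halves of the formula matter: longer sentences push the grade up, and so do longer words.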
What the FK Score Tells You
- Below FK 8 suggests your sentences are too simple — short, choppy, using only basic words
- Above FK 16 usually means your sentences are getting tangled: too long, too many subordinate clauses, hard to follow
The Target Zones on the Progress Bars
| FK Grade Level | Interpretation |
|---|---|
| Below 8 | Too basic for exam writing |
| 8-10 | Developing. Your sentences need more complexity. |
| 10-14 | Appropriate for HKDSE and IELTS |
| 14-16 | Advanced. Fine if your writing is still clear. |
| Above 16 | Check for clarity. Long sentences aren't always better. |
Sentence Length (MLT)
You'll also see a metric for mean sentence length, measured in words per sentence.
Research by Hunt (1970) and Lu (2011) found that skilled secondary students typically write sentences of 14-18 words on average. Averages well outside that range are worth a second look:
- Below 11 words suggests choppy, underdeveloped writing
- Above 22 words is a warning sign for run-on sentences
The Average Matters More Than Any Single Sentence
This doesn't mean every sentence should be 16 words. The best essays mix short punchy sentences with longer, more complex ones. It's the average that matters.
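Here is a small Python sketch of that point. It assumes sentences end at a full stop, question mark, or exclamation mark, which is a simplification, and the function names are my own.

```python
import re


def sentence_lengths(text):
    """Word count of each sentence, splitting on . ! ? (a simplification)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def mean_sentence_length(text):
    """Average words per sentence (0.0 for empty text)."""
    lengths = sentence_lengths(text)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

A one-word sentence followed by a fifteen-word sentence averages out to eight words, even though neither sentence is eight words long. That is why the metric looks at the mean rather than judging each sentence separately.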
What These Numbers Can't Tell You
I want to be honest about the limitations.
Metrics Are Guides, Not Guarantees
These metrics are statistical approximations, not judgements. A high vocabulary score doesn't guarantee good writing. You could use varied, sophisticated words and still write an incoherent essay.
Similarly, a lower readability score doesn't always mean your writing is too simple — it might mean you're being admirably concise.
Known Limitations
MTLD needs sufficient text. It needs about 50 words before it's reliable, and it works best on texts of 200 words or more. On very short texts, take the number with a larger grain of salt.
Sophistication doesn't understand context. It can't tell whether your word choice is appropriate for the text type. A formal report and a personal letter need different registers, and the metric treats them the same.
Think of These as a Compass, Not a GPS
They'll point you in the right direction:
- "You're repeating yourself"
- "Your sentences are all the same length"
- "Try more precise vocabulary"
But the real improvement happens when you act on those signals through deliberate practice.
What to Do Next
If your vocabulary score is below the target zone, try the Vocabulary Enhancement tool. It takes a paragraph from your essay and suggests more precise alternatives, showing you exactly where your word choices could be stronger.
Don't Fake Sophistication
Don't fall into the trap of stuffing your essays with impressive-sounding words you barely understand. Markers can tell. The goal is always appropriate precision: saying exactly what you mean, with the right word, in the right place.
EssayHero is free and built by a Hong Kong teacher for students. Questions? Email hello@essayhero.app.