Key Takeaways
- EssayHero now supports university-level law essay assessment with four discipline-specific criteria scored 0-25 each (total 0-100)
- The assessment is built around legal reasoning conventions (IRAC, case law deployment, statutory interpretation) rather than generic "good writing" metrics
- The AI is explicitly instructed not to verify whether cited cases exist, because it can't access legal databases
- This is a formative feedback tool, not a substitute for lecturer marking
You're Probably Sceptical. Good.
If you're a law lecturer reading this, you've likely had the same reaction most academics have when someone says "AI essay feedback": a mix of wariness and mild irritation. You've spent years developing the judgement to assess legal reasoning. You know the difference between a student who understands Donoghue v Stevenson and one who can merely name it.
The idea that software could do what you do is, at best, implausible. I'm not going to argue with that. EssayHero can't do what you do.
But it might be able to do something useful alongside what you do. And if you're going to consider recommending it to your students, you deserve to know exactly how it works, what it looks for, and where it falls short.
Who Built This and Why
I'm Joseph Lin. I've been marking essays for over twenty years, from primary school through to PhD dissertations.
I built EssayHero originally for HKDSE students in Hong Kong who weren't getting enough feedback between assignments. It's free, it has no commercial aims, and it's expanded to university level because lecturers asked for it.
A law professor friend asked me two pointed questions: "How is this different from your exam-board configurations?" and "What are your parameters for good writing?" This post answers both.
How University Assessment Differs from Exam-Board Marking
Standardised Exams: Fixed Criteria and Public Rubrics
EssayHero's original configurations were built for standardised exams: HKDSE, IELTS, Cambridge IGCSE. These exams have published marking criteria, official band descriptors, and examiner-graded exemplar essays.
The AI's job is to apply those criteria consistently. The criteria are fixed, the mark schemes are public, and calibration is straightforward because you can validate against official scores.
University Law: A Different Problem
University law essays are a different problem. There is no single published rubric that every law school uses. Expectations vary by institution, by module, and by lecturer.
A first-year contract law problem question and a third-year jurisprudence essay require fundamentally different skills. The "right answer" isn't a band on a scale but a demonstration of legal thinking.
Building Law-Specific Criteria
So building a university law configuration meant starting from different assumptions. Instead of replicating an exam board's mark scheme, we built assessment criteria around what experienced law lecturers consistently look for across institutions:
- Legal reasoning quality — How the student identifies issues and applies rules
- Use of authority — Case law and statutory citations in context
- Argumentation structure — Thesis development and logical flow
- Academic writing precision — Legal terminology and clarity
These aren't generic. They're informed by how law is actually taught and assessed in common law jurisdictions.
Percentage-Based Scoring
The scoring scale is different too. Instead of HKDSE's 1-7 per criterion or IELTS's 0-9 bands, law essays are scored 0-25 on each of four criteria, totalling 0-100. This mirrors the percentage-based marking that most university law faculties use.
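To make the arithmetic concrete, here is a minimal TypeScript sketch of the four-criterion scheme. It illustrates the scoring structure described above; it is not EssayHero's actual code, and the type and function names are mine.

```typescript
// Illustrative sketch only, not EssayHero's actual code:
// four criteria, each scored 0-25, summed to a 0-100 total.
interface CriterionScore {
  criterion:
    | "Legal Reasoning and Analysis"
    | "Research and Authority"
    | "Structure and Argumentation"
    | "Academic Writing";
  score: number; // expected to fall within 0-25
}

function totalScore(scores: CriterionScore[]): number {
  if (scores.length !== 4) {
    throw new Error("Expected exactly four criterion scores");
  }
  for (const s of scores) {
    if (s.score < 0 || s.score > 25) {
      throw new Error(`${s.criterion} must be scored 0-25`);
    }
  }
  return scores.reduce((sum, s) => sum + s.score, 0); // 0-100
}
```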
What the AI Actually Looks For
The four criteria are Legal Reasoning and Analysis, Research and Authority, Structure and Argumentation, and Academic Writing. Each has five bands with detailed descriptors.
1. Legal Reasoning and Analysis
This assesses whether the student can identify legal issues, state relevant rules accurately, and apply those rules to the specific facts or question at hand.
The AI is tuned to recognise the IRAC method and its variants, but it doesn't penalise students who use a different analytical framework, provided the substance is there. It distinguishes between:
- Problem questions — Expects issue-spotting and systematic analysis
- Discursive essays — Expects a thesis and sustained argumentation
At the top of the scale (21-25), it looks for students who can identify tensions in the law, critique judicial reasoning, or engage with competing analytical approaches.
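For a concrete picture of how the five bands fit together, here is a minimal TypeScript sketch for this criterion. The 21-25 descriptor paraphrases the point above, and the 11-15 and 21-25 boundaries match bands quoted in this post; everything else, including the descriptor wording and the lower boundaries, is an illustrative assumption rather than the published configuration.

```typescript
// Illustrative sketch only. Band boundaries below 11 and all descriptor
// wording are assumptions; the published prompt configuration (see the
// Full Transparency section) is the authoritative version.
interface Band {
  range: [number, number]; // inclusive, within 0-25
  descriptor: string;      // summary of what the band rewards
}

const legalReasoningBands: Band[] = [
  { range: [0, 5],   descriptor: "Issues missed or misstated; little application to the question" },
  { range: [6, 10],  descriptor: "Some issues spotted; rules stated but barely applied" },
  { range: [11, 15], descriptor: "Main issues identified; application present but uneven" },
  { range: [16, 20], descriptor: "Systematic issue-spotting with accurate rule application" },
  { range: [21, 25], descriptor: "Identifies tensions in the law and critiques judicial reasoning" },
];
```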
2. Research and Authority
This evaluates how the student uses legal sources. And this is where I need to be direct about something.
Critical Limitation: We Cannot Verify Citations
The AI cannot verify whether cited cases are real. It has no access to Westlaw, LexisNexis, or any legal database. We explicitly instruct it not to evaluate whether individual references exist, because doing so would produce unreliable results and give students false confidence (or false alarms).
Instead, it assesses how sources are used:
- 21-25 level — Students distinguish between ratio decidendi and obiter dicta, deploy case law to support specific analytical points rather than as general background
- 11-15 level — Reliance on the recommended textbook with little evidence of independent research
The AI can tell the difference between a citation that's analytically integrated and one that's merely decorative.
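To show what that kind of constraint might look like in practice, here is a hypothetical rendering of the instruction. The wording below is mine, written for illustration; the actual instruction text is published in the source code (see the Full Transparency section below).

```typescript
// Hypothetical wording for illustration; the real instruction text lives
// in the open-source prompt configuration.
const citationPolicy = `
You cannot access Westlaw, LexisNexis, or any legal database, so never
comment on whether a cited case, statute, or article actually exists.
Treat every citation as given and assess only how it is used: whether
authority supports a specific analytical point or sits as decoration,
and whether the writer distinguishes ratio decidendi from obiter dicta.
`;
```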
3. Structure and Argumentation
This looks at the logical progression of the essay:
- Introduction — Does it identify the question's scope and state a thesis?
- Body paragraphs — Does each paragraph advance the argument purposefully?
- Counterarguments — Are they engaged with substantively rather than merely acknowledged?
- Conclusion — Does it synthesise the analysis rather than introduce new material?
For problem questions, it assesses whether issues are addressed in a logical order with clear separation. For discursive essays, it looks for a systematic argument that builds rather than a sequence of loosely connected descriptions.
4. Academic Writing
This covers the mechanics:
- Precision of legal terminology
- Appropriate register
- Sentence variety
- Grammar and clarity
It rewards concise expression of complex ideas and penalises unnecessary verbosity and jargon.
Citation Formatting
The AI accepts both British and American English conventions and does not comment on citation formatting. Whether a student uses OSCOLA or Bluebook is irrelevant to the substantive assessment.
The Feedback Tone
One deliberate choice worth mentioning: the AI provides feedback in the voice of a collegial peer reviewer, not an authoritative examiner.
Examples:
- "Consider strengthening this section" rather than "You should have included"
- "The argument could be extended by" rather than "You failed to"
Students respond better to constructive suggestion than to top-down correction, and the feedback is more useful when it points toward specific improvements rather than merely cataloguing faults.
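Here is a hypothetical sketch of how such a tone constraint might be phrased, built from the example pairs above; the published prompt configuration is the authoritative wording.

```typescript
// Hypothetical phrasing for illustration; see the open-source repo for
// the real text.
const feedbackTone = `
Write as a collegial peer reviewer, not an authoritative examiner.
Prefer "Consider strengthening this section" to "You should have included".
Prefer "The argument could be extended by" to "You failed to".
Point toward specific improvements rather than cataloguing faults.
`;
```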
What We Can't Do
This is the section that matters most, so I'll be direct.
We Can't Verify Legal Citations
The AI doesn't know whether Smith v Jones [2020] is a real case or one the student invented. It can assess whether the case is used effectively in the argument, but not whether it exists.
If citation accuracy matters (and in law, it always does), that's something a human needs to check. We instruct the AI to stay silent on this rather than guess, because a wrong guess in either direction is worse than no comment at all.
We Can't Assess Legal Knowledge Depth
The AI can evaluate how a legal argument is constructed and presented, but it doesn't have the domain expertise to know whether a student's understanding of, say, the Caparo test is actually correct in its nuance.
It follows structured assessment criteria well. It doesn't "understand" law in the way you do after years of study and practice.
We Can't Replace Summative Marking
EssayHero's scores are indicative, not definitive. If you give an essay 58 and EssayHero gives it 72, you are right.
The AI is applying generalised criteria without knowing your module's specific expectations, your institution's marking conventions, or the particular learning outcomes you've set. The scores are useful as a rough benchmark for students working between drafts, not as a predictor of the mark they'll receive.
For Lecturers
If there's a discrepancy between your mark and EssayHero's score, your mark is right. The AI doesn't know your module's specific expectations or your institution's conventions.
We Can't Check Jurisdiction-Specific Accuracy
The AI evaluates legal reasoning within whatever jurisdictional framework the student presents, but it doesn't have the expertise to flag when a student misapplies a statute from the wrong jurisdiction or confuses English and Australian common law on a particular point.
This Is Formative, Not Summative
The tool is designed for the gap between drafts, not for final assessment. It's useful in the same way that a study group is useful:
- Gives you another perspective
- Catches structural weaknesses
- Forces you to articulate your argument clearly
But it's not a marker. It's a practice partner.
What It Is Good For
After that list of limitations, you might reasonably ask: so what's the point?
Faster Iteration Cycles
The point is faster iteration. A student working on a problem question at 11pm can submit a draft, get paragraph-by-paragraph feedback on their IRAC structure, identify a weak application section, revise it, and bring a better draft to your office hours.
That revision cycle is where the learning happens, and most students don't get enough of it because feedback is scarce and slow.
Consistent Criteria Application
The criteria don't change based on workload or mood. If a student submits the same essay on a Monday and a Friday, the feedback will be consistent.
That's useful for building a student's understanding of what the criteria actually mean, even if the scores themselves are rough estimates. The feedback is also good at identifying specific weaknesses a student can fix before submission:
- Vague thesis statements
- Superficial counterargument engagement
- Decorative rather than analytical case law usage
Calibrated Strictness Modes
The strictness modes let students calibrate their expectations:
- Lenient — Benefit of the doubt, focuses on strengths
- Baseline — Standard marking criteria
- Harsh — Rigorous standards where scores of 21-25 are reserved for genuinely excellent work
A student who scores well on harsh mode has reason to feel confident. One who struggles on lenient mode knows there's significant work to do.
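As a sketch of how the three modes might be expressed in configuration, assuming a simple per-mode guidance string (the names and structure are illustrative, not the shipped config):

```typescript
// Illustrative sketch of the three strictness modes; not the shipped
// configuration.
type Strictness = "lenient" | "baseline" | "harsh";

const strictnessGuidance: Record<Strictness, string> = {
  lenient:
    "Give the benefit of the doubt on ambiguous passages and lead with strengths.",
  baseline:
    "Apply the standard band descriptors as written.",
  harsh:
    "Apply rigorous standards; reserve 21-25 on any criterion for genuinely excellent work.",
};
```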
For Students
None of this replaces your lecturer's feedback. But it might mean that when you do come to office hours, you've already caught the structural and argumentative weaknesses you could have found yourself.
Full Transparency
Open Source Assessment Criteria
The complete assessment criteria, the detailed band descriptors, and the full instructions that the AI receives are published in the source code. EssayHero is open source under AGPL-3.0.
You can read every line of the prompt configuration and decide for yourself whether the standards align with what you'd expect.
Privacy First
Essays are processed and discarded. They are:
- Not stored
- Not used for model training
- Not accessible to anyone after the feedback is generated
If a student is logged in, they can choose to save their analysis to their own account, but that's opt-in. Privacy matters, and in an academic context it matters more than usual.
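The processing flow this implies can be sketched as follows. The helper functions are hypothetical, declared only so the example type-checks; this is an illustration of the opt-in design, not EssayHero's real API.

```typescript
// Hypothetical sketch of the data flow described above. The two helpers
// are declared only so the sketch compiles; they are not EssayHero's
// real API.
declare function runAssessment(essayText: string): Promise<string>;
declare function saveAnalysis(userId: string, feedback: string): Promise<void>;

async function analyseEssay(
  essayText: string,
  opts: { saveToAccount: boolean; userId?: string }
): Promise<string> {
  const feedback = await runAssessment(essayText);
  // Persist only on an explicit, logged-in opt-in; otherwise nothing is stored.
  if (opts.saveToAccount && opts.userId !== undefined) {
    await saveAnalysis(opts.userId, feedback);
  }
  return feedback; // the essay text goes out of scope after this request
}
```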
Try It Yourself
EssayHero is free. No account required.
If you want to see what the feedback looks like on a law essay, go to essayhero.app/?exam=uni-law, paste a sample essay, and read the output. Then decide whether it's something worth sharing with your students.
Your Feedback Matters
If you think it could help your students iterate faster between drafts, share it. If you think the criteria don't align with your expectations, or the feedback isn't useful, I'd genuinely like to hear why.
Email hello@essayhero.app.
I built this to help students write better. If it can do that for your students, I'm glad. If not, I understand.
EssayHero is free, has no commercial aims, and is built by a Hong Kong teacher for students worldwide. Questions? Email hello@essayhero.app.
Related Articles
- How EssayHero Marks Business Essays (And What It Can't Do): A transparent look at how EssayHero assesses university business and management essays, what the criteria actually measure, and where AI falls short.
- How EssayHero Marks Psychology Essays (And What It Can't Do): A transparent look at how EssayHero assesses university psychology essays, what the criteria actually measure, and where AI falls short.
- How EssayHero Marks HKDSE Paper 2 Essays (And Why You Should Know): A transparent explanation for teachers and tutors of how EssayHero assesses HKDSE English Paper 2 writing, how scoring works, and where AI falls short.