Comparative Judgement: You Be The Judge



In 2010, Ross Morrison McGill founded @TeacherToolkit from a simple Twitter account through which he rapidly became the 'most followed teacher on social media in the UK'. In 2015, he was nominated as one of the '500 Most Influential People in Britain' by The Sunday...

What is comparative judgement?

Can you hear it? Recently, there seems to be a lot of noise coming from the movers and shakers, talking heads and education influencers about comparative judgement (CJ). This is interesting as CJ has been around for donkey’s years, yet somehow is starting to feel new, ‘novel’, fresh and even revolutionary. Why do we still use an 18th Century approach to assessing exam answers when we have a 21st Century alternative?


Way back in 2005, I was involved in comparative judgement research (Project e-Scape) with Professor Richard Kimbell of Goldsmiths College and Alastair Pollitt of Cambridge University. The initial fruits of our work have since been adapted by the QCA (Qualifications and Curriculum Authority) and are now embedded more deeply in various assessments than you might think, including those of examination bodies such as OCR, Edexcel and AQA.

One could argue that we were 90 years too late. Psychologist Louis Thurstone published a paper on the law of comparative judgment in 1927! His method of comparative judgement exploits the power of adaptivity, in scoring rather than testing. The judge [or teacher] is asked only to make a valid decision about quality, and the method therefore “offers a radical alternative to the pursuit of reliability through detailed marking schemes”.

So, firstly, I am delighted that this approach is being championed and receiving more noise – it can significantly reduce workload and enables teachers to moderate well beyond the walls of their own school building. Think of the professional development opportunities too! But, is it the future of moderation? Well, it’s certainly been a past master and it is making itself heard again in the here and now as an “innovation that really works” and an assessment system that “goes places traditional marking cannot reach.”

Comparative judgement has largely moved back into the learning conversations of teachers because of the profession’s Moriarty-like nemesis: workload.

In a review of the evidence on written marking, the Education Endowment Foundation report A marked improvement? noted: “there is an urgent need for more studies so that teachers have better information about the most effective marking approaches.”


Many teachers spend an inordinately large amount of time marking, with limited evidence surrounding the impact of both the type of feedback given and the time spent on producing the feedback. Marking tests and exams against convoluted marking schemes that drown you in detail is a time-stealer extraordinaire, and defective too. If only we could make assessments in 15 seconds with a reliability of 0.91! There has to be a quicker, sharper and more reliable approach to student assessment than ‘traditional’ marking. And there is…

Say hello to comparative judgement, described by some as ‘an elegant solution’ to teacher decision-making, an assessment approach that promises radical changes to the way we evaluate students’ work. At the heart of CJ is trust, and trusting teachers’ professional judgements – hallelujah!

Like formative assessment, CJ is neither widespread nor deeply embedded, and it is under-used. But ‘abolish marksism’ is gaining traction and symbolises a ‘radical departure’ in assessment that could “free up teaching in quite a profound way.”

Compare and Contrast

Whilst CJ might not be cutting-edge, its re-popularisation and high profile are very welcome in an era of unprecedented teacher pressures where assessment effortlessly distorts the curriculum. It has to be what Dr Chris Wheadon calls a “fine balance between efficiency and reliability.”

So, what is it?

If we strip CJ back to the bare bones, teachers are presented with pairs of pieces of student work and asked to choose which of the two is better.

You can do this a couple of ways:

  • The unsophisticated low-tech approach is to spread the work over a table and move the pieces around like a sliding puzzle until you have them in a rank order you are happy with. We typically see departments doing this during moderation of coursework, when work is put out on display (e.g. in the art department).
  • Or you could use a hi-tech algorithm which combines the judgements of several teachers to arrive at a rank order and provides a score for each student. As Dr Steve Draper explains, “Software assembles these pairwise judgements into a quantitative interval scale (based on Thurstone’s ‘law of comparative judgement’). Finally, if used for assessment rather than only for ranking, grade boundaries are superimposed on the rank order.”
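To make the hi-tech route a little more concrete, here is a minimal sketch of how software might turn "this one is better" decisions into a scale and then lay grade boundaries over it. It is not the algorithm used by any particular CJ product: it assumes the Bradley-Terry model (a close relative of Thurstone's approach), and the function names, the smoothing step and the grade cut-offs are all invented for illustration.

```python
import math
from collections import defaultdict


def fit_scores(judgements, iterations=500):
    """Turn (winner, loser) pairs into one score per script (Bradley-Terry model)."""
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(float)
    meetings = defaultdict(float)  # meetings[(a, b)]: times a was compared with b
    for winner, loser in judgements:
        wins[winner] += 1
        meetings[(winner, loser)] += 1
        meetings[(loser, winner)] += 1

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        updated = {}
        for s in scripts:
            # Assumption: one imaginary comparison against an "average" script
            # (strength 1.0), counted as half a win. It keeps the scores finite
            # when a script wins or loses every real comparison.
            expected = 1.0 / (strength[s] + 1.0)
            for other in scripts:
                if other != s and meetings[(s, other)]:
                    expected += meetings[(s, other)] / (strength[s] + strength[other])
            updated[s] = (wins[s] + 0.5) / expected
        strength = updated

    # Report log-strengths centred on zero, so scores sit on an interval scale.
    logits = {s: math.log(v) for s, v in strength.items()}
    centre = sum(logits.values()) / len(logits)
    return {s: value - centre for s, value in logits.items()}


def assign_grades(scores, boundaries, lowest):
    """Superimpose grade boundaries: boundaries is a list of (minimum score, grade)."""
    ordered = sorted(boundaries, reverse=True)
    return {
        script: next((grade for cut, grade in ordered if score >= cut), lowest)
        for script, score in scores.items()
    }


# Hypothetical judgements: each tuple is (preferred script, other script).
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"),
]
scores = fit_scores(judgements)
grades = assign_grades(scores, [(0.5, "high"), (-0.5, "middle")], lowest="low")
for script in sorted(scores, key=scores.get, reverse=True):
    print(script, round(scores[script], 2), grades[script])
```

The cut-offs of 0.5 and -0.5 are made up. In practice the boundaries would be set by people looking at the work on either side of each line, not by the software.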

Research Examples

You can see my department and classes taking part in the research in 2005 BSM (Before Social Media), blogged on Teacher Toolkit 3 years ago in 2014. Ark Academies have also been instrumental in upgrading the status of CJ, and their video is a useful way to see it in action.

The Eye Of The Beholder

CJ, also referred to as Adaptive Comparative Judgement (ACJ) or Assessment by Pairwise Ranking (APR), forms part of the ‘no more marking’ movement. That is also the name of the software specialists No More Marking, where Daisy Christodoulou is head of CJ. No More Marking say that we should stop marking our assessments and start judging them: “the underlying principle of CJ is simple, that we are better able to make comparisons between objects than we are at making holistic judgements.”

In a No More Marking study, over 1,600 teachers from 199 schools judged the writing portfolios of over 8,500 Year 6 pupils and showed a high degree of consistency in their judgements: the reliability was in excess of 0.84 out of 1.0.

The inter-rater reliability of the No More Marking method is a remarkable 0.9.

The high reliability of comparative judgements relative to absolute judgements can be explained partly by the number of judges involved. In traditional marking scenarios one or two markers are used, whereas CJ pools the decisions of more than two.

Christodoulou notes, “Instead of asking teachers to make absolute judgements against unhelpful rubrics, CJ requires teachers to, well, make comparative judgements instead… One of the advantages of CJ is that it allows teachers to make judgements that ‘work with the grain of the mind, not against it’.” Take the Colours Test to find out more about why CJ works and marking does not.

But a note of caution: Tine van Daal et al (2017) say,

Differences between judges in discriminating ability should be taken into account in the setup of CJ assessments and in the development of algorithms to distribute pairs of representations.

Despite this, new technology has the power to transform the way we think about assessment and as Pinna Tarricone and Paul Newhouse (2016) say, “comparative judgement delivered by online technologies is a viable, valid and highly reliable alternative to traditional analytical marking.”

Beyond Comparison

The first application of CJ to the direct assessment of students was a 2005 project called e-scape, led by Prof. Richard Kimbell at the Technology Education Research Unit of Goldsmiths College, University of London.

Pollitt (2012b) refers to his work with Prof. Kimbell on this study of the assessment of e-portfolios as an example of the high reliability coefficients that can be achieved with digital assessment and the CJ method. Pollitt suggested that in Kimbell’s study the high reliability coefficient of 0.96 generated by 28 judges (I was one of them) assessing 352 e-portfolios with 3067 judgements was higher than any analytical marking system could achieve.
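A quick back-of-the-envelope calculation puts those figures in perspective. The numbers below are the ones quoted above (28 judges, 352 e-portfolios, 3067 judgements); the variable names are mine, and the averages assume the judgements were spread evenly, which the study does not claim.

```python
from math import comb

portfolios = 352
judgements = 3067
judges = 28

# Every possible pairing of two different portfolios.
all_pairs = comb(portfolios, 2)

# Share of all possible pairings that were actually judged.
share_judged = judgements / all_pairs

# Each judgement involves two portfolios, so count appearances twice.
appearances_per_portfolio = 2 * judgements / portfolios

# Average workload if judgements were shared equally between judges.
judgements_per_judge = judgements / judges

print(f"Possible pairs: {all_pairs}")
print(f"Share of pairs judged: {share_judged:.1%}")
print(f"Average appearances per portfolio: {appearances_per_portfolio:.1f}")
print(f"Average judgements per judge: {judgements_per_judge:.1f}")
```

There are 61,776 possible pairs, so the judges looked at roughly one pairing in twenty, with each portfolio appearing in about 17 comparisons on average. That is the efficiency argument in a nutshell: nobody has to compare everything with everything.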

The Goldsmiths College researchers were concerned that the current methods of assessment in Design & Technology (DT) rewarded a narrow set of approaches, and wanted to explore the ways in which e-portfolios could be used to capture students’ work, and combine this with a fairer, more consensual method of assessment.

They devised an approach that enabled students to draw up their initial design ideas on a personal digital assistant (PDA), record the progress of their design and then take photographs of their finished work, before uploading their projects to a central website where they could be assessed by moderators. In phase one of the project, funded by the DfES (as it then was) and the Qualifications and Curriculum Authority (QCA), Goldsmiths developed a proof of concept.

Ross McGill was a teacher involved in the research when Head of DT and ICT at Alexandra Park School in North London. He noted that the most innovative part of the e-scape project was the method Goldsmiths had devised for assessing the portfolios, which was “changing the face of how teachers mark”.

Ahead of their time …

Each assessor saw two example portfolios on their screen and made a judgement about which one was better: “So one will be selected online, and then the software will randomly select another project, and eventually you’ll have ranked the projects from top to bottom, and another person somewhere else in the country will look at the same sample, again completely randomly.”
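The random pairing described in that quote can be sketched in a few lines. This is only an illustration under simple assumptions: the function name and portfolio labels are invented, pairs are drawn purely at random in rounds so that every portfolio appears once per round, and nothing here reflects how the e-scape software actually chose its pairs (adaptive systems pick pairs using the results so far).

```python
import random
from collections import Counter


def random_rounds(scripts, rounds, seed=None):
    """Return a list of random pairs in which each script appears once per round."""
    if len(scripts) % 2:
        raise ValueError("this simple sketch needs an even number of scripts")
    rng = random.Random(seed)
    schedule = []
    for _ in range(rounds):
        order = list(scripts)
        rng.shuffle(order)
        # Pair neighbours in the shuffled order: (1st, 2nd), (3rd, 4th), ...
        schedule.extend(zip(order[0::2], order[1::2]))
    return schedule


# Hypothetical example: 10 portfolios, each to be seen 17 times.
portfolio_ids = [f"P{i}" for i in range(10)]
schedule = random_rounds(portfolio_ids, rounds=17, seed=1)

appearances = Counter(script for pair in schedule for script in pair)
print(len(schedule), "pairs to judge")
print(appearances)
```

The pairs in `schedule` could then be dealt out to judges in any order, so two assessors in different parts of the country may well end up looking at the same portfolios.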

Once all the assessors had ranked the projects, an overall ranking of the projects emerged. In the pilot, each e-portfolio was judged at least 17 times by seven different judges, which produced a highly reliable set of results. The Goldsmiths team were ‘ahead of their time’ in predicting that CJ could be extended to other subjects, and comparative assessment has since spread far and wide. As Dan Sandhu notes,

“ACJ has demonstrated huge potential in use by awarding bodies and institutions worldwide. So far, pilots delivering significantly improved reliability have taken place in Australia, Sweden, Singapore and the US. The process means GCSE and A-Level appeal figures could be dramatically reduced, after all if your grade is generated via a collective consensus of expert assessors through ACJ, what is there to appeal against?”

CJ has been introduced to assess competences such as mathematical understanding (Jones et al., 2015), geography (Whitehouse and Pollitt, 2012), design and technology (Seery et al., 2012), and writing (Pollitt, 2012a; van Daal et al., 2016).

And finally….

It’s clear that we are going to be hearing much more about CJ in the next couple of years, if you haven’t already realised. As Tarricone and Newhouse conclude,

The time is now to embrace digital technologies for high-stakes assessment and consider comparative judgement for scoring as a viable and reliable alternative to analytical marking, particularly for creative performances that rely on highly subjective judgements.

So, more noise? Yes please …


The majority of this article was written and researched by John Dabell and has been edited in parts by @TeacherToolkit.
