Thoughts on Comparative Judgement


One of the most interesting topics of discussion at our recent annual national AAIA conference, following a presentation by Dr Chris Wheadon of the ‘No More Marking’ organisation, was the subject of Comparative Judgement (CJ for short). Opinions of AAIA members vary considerably on this approach to assessment.


I’d like to use this blog to share my thoughts and continue the debate that we started in Liverpool. AAIA members can comment on this post to express their own views (or indeed start their own post).


My opinion, which I explain below, is that there are some assessment purposes for which CJ could be used and some for which it definitely can’t – and we need to be clear about which is which.


Firstly, what is CJ?


It is a system based on the principle that, when faced with just two examples of work (typically writing), most people (whoever they may be – this is not limited to qualified teachers) can say which of the two pieces is better – Piece A or Piece B. The system presents its users with a series of pairs of scripts and requires them, in each case, to indicate which script is better, so a large number of judgements can be made very quickly. A computer algorithm can then assimilate all this data, generated from all the users of the system, and produce a percentile ranking of each of the pieces.
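For readers who like to see the mechanics, here is a rough sketch of how a pile of "A is better than B" judgements might be turned into a ranking. It assumes a Bradley-Terry style model, which is a common choice for paired comparisons; I am not claiming it is the exact algorithm that No More Marking runs. The function names, the script labels and the sample judgements are all invented for illustration.

```python
from collections import defaultdict


def bradley_terry(judgements, iterations=200):
    """Estimate a 'strength' for each script from (winner, loser) pairs.

    Assumption: a Bradley-Terry model fitted with simple iterative
    updates. This is an illustrative sketch, not any vendor's algorithm.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)          # how often each script was preferred
    pair_counts = defaultdict(int)   # how often each pair was compared
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        updated = {}
        for s in scripts:
            denom = 0.0
            for other in scripts:
                if other == s:
                    continue
                n = pair_counts.get(frozenset((s, other)), 0)
                if n:
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom if denom else strength[s]
        # Rescale so the strengths keep a fixed overall size.
        total = sum(updated.values())
        strength = {s: v * len(scripts) / total for s, v in updated.items()}
    return strength


def percentile_ranks(strength):
    """Percentage of scripts ranked strictly below each script (ties ignored)."""
    ordered = sorted(strength, key=strength.get)
    n = len(ordered)
    return {s: 100.0 * i / n for i, s in enumerate(ordered)}


# Hypothetical judgements: each tuple is (preferred script, other script).
judgements = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
strength = bradley_terry(judgements)
ranks = percentile_ranks(strength)
print(ranks)
```

Note that the output is purely relative: the percentile ranks depend only on the order of the scripts, not on how good any of them actually is.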


It has been argued by some that this system could be the way forward for national assessment of writing and that it would be both more reliable and less time-consuming than the current process for teacher assessment of writing at the ends of Key Stages 1 and 2. For example, see this piece in the TES.


I have some concerns about its being used in this way, which I will come on to shortly.


But I will start with a separate concern, which is the suggestion that CJ can be used to support formative assessment. You can hear this suggestion being made by the STA’s policy adviser, 33 minutes and 30 seconds into this KS1 webinar video, where he says that CJ “has a lot to offer in terms of schools’ formative assessment”.


Let’s be clear – CJ does not support formative assessment.


Formative assessment depends upon teachers and pupils understanding where they are being most successful (in their writing, or any other area of learning), where they need to improve, and how best to make those next improvements. This has absolutely nothing to do with knowing what your percentile rank is. Formative assessment in writing requires that the writers (and the teachers) are clear about what is expected of them. When thinking about a particular aspect of writing, e.g. a specific text type, we need to understand the key ingredients that should be included. Phrases like ‘What A Good One Looks Like’ (WAGOLL) or ‘What Excellence Looks Like’ (WELL) are often used as a classroom device for helping students to identify the key features of exemplary models and, from those, developing an understanding of the success criteria. If we want to help our pupils make excellent progress, we (teachers) need to have the necessary subject knowledge and we need to share this with the pupils. We need to understand ‘what a good one looks like’ and we need to understand the requirements and expected standards of the National Curriculum.  Advocates of CJ point out that anyone can use it – you don’t need any ‘special knowledge’ to be able to make the judgements.  That may sound appealing, but is precisely why it cannot support formative assessment.  Knowing a child’s percentile rank cannot help me to know what they need to do to improve.


Another supposed selling point of the system is that it reduces teacher workload because it is so quick to administer. Even the very name of the organisation behind CJ, “No More Marking”, jumps on the bandwagon of trying to reduce teacher workload. Of course we all want to find ways of reducing teacher workload wherever possible, but let’s not kid ourselves that a simple algorithmic approach to ranking children’s writing has anything to do with helping teachers understand how to help the children improve.


What CJ could do for your school is provide an accountability mechanism of sorts. The research evidence suggests that CJ has a high level of internal reliability, in terms of accurately ranking the children’s work, due to the clever behind-the-scenes algorithm that processes all the judgements. If what you want, as a school leader, is a means to determine how your pupils’ standards compare with others’ across the country, this does that. And, I agree, it does it with minimal teacher workload.


However, let’s not forget that any norm-referencing system, such as a percentile rank, must by definition have some children above the ‘average’ and others below. However good everyone’s writing is, roughly half of the children will always be below the median.


And this is why I would be extremely concerned if CJ were ever to be adopted as the national mechanism for school accountability of writing.


Call me an idealist, but for me one of the key features of a school accountability system is that in theory everyone should be able to succeed. If you have a test with a particular pass-mark, it is hypothetically possible that every child could pass that test. If you have a system of ‘pupil can’ statements for teacher assessment, it is hypothetically possible that every child could reach that standard. It is not the same with a percentile rank-based system, such as CJ. There will always be winners and losers.


This is the key difference between a criterion-referenced system and a norm-referenced system.  Personally, for national accountability, I would always favour criterion-referencing, for the reasons given above. (Whether or not you feel the ITAFs that were in place over the last two years, or the new TAFs for 2018, were/are the right criterion-referenced system is a separate point.)


The difference between a norm-referenced system and a criterion-referenced one was well illustrated when No More Marking produced this publication, which claimed to demonstrate the threshold standard for Greater Depth writing at Key Stage 2 – despite the fact that the example published clearly did not meet the Greater Depth criteria. (This blog set out NMM’s justification for making this claim –  but in terms of causing confusion among teachers trying to finalise their statutory teacher assessments, the misinformation was, to say the least, unhelpful.)


My opinion, for what it’s worth, is that when judging the quality of writing, before we start trying to say whether Piece A is better or worse than Piece B, we should at least agree what it is we are looking for.  Are we concerned mostly with a writer’s ability to express meaning clearly, or structure a piece logically, or use a wide vocabulary, or spell with great accuracy, or use beautiful cursive handwriting, or… ?  The list goes on.


In other words, we need some agreed criteria for the required standards. And once those criteria are agreed, what helps us to make more reliable judgements and improve our subject knowledge is professional dialogue (i.e. moderation). Technology may also have a part to play in the process, but I don’t yet feel ready to hand over professional judgement about children’s learning to an algorithm.

1 Comment

  1. Interesting points Ben, and I agree entirely with the view that CJ was never supposed to be a tool for formative assessment purposes. If the profession gets muddled about that it will be doomed to failure anyway. I just hope the real purpose of this form of judgement making is remembered and writ large by all in positions of influence. However, as you know, I am interested in this way forward as a replacement for current moderation purposes in coming to summative judgements. The current system is flawed, not very reliable, hugely expensive and distorts the curriculum.
    I wanted to add a few thoughts of my own.

    If you accept the definition of the purpose of summative assessment to be “an attempt to establish a shared understanding” of the quality of that which is judged, then I think CJ has a relevant and valid place in the national accountability agenda for teaching and learning in writing, alongside the tests for other subjects. I agree with Michael Tidd about that.

    CJ’s purpose is to rank schools according to the average ranking of all their scripts. How is that ultimately different from what the tests in reading and maths do for accountability and national comparative purposes at the end of Key Stages, other than the actual method used? Tests take the outcomes from individuals, put them all together as cohorts in schools and rank them, surely?

    The CJ method is different but it employs the PROFESSIONAL opinions of practising teachers to make the judgement about which script is better. It calls upon their common understanding of quality of writing based on their collective experiences and degree of understanding of what should be taught and learned by Y6. It occurs to me that we DO rely upon and accept the professional judgements of others in their domains in other contexts and are happy to do so. For example: paramedics arriving at the scene of a disaster conduct a process of triage in which, based on their professional knowledge and experience to date of similar injuries, they quickly compare the plight of victims and prioritise the order in which people are treated. Why can we not accept that teachers, however long they have been in the profession, can exercise professional, holistic judgement in this way, without recourse to lengthy and overcomplicated rubrics that promote more disagreement than agreement? The nature of their job demands they have this basic understanding.

    You say you are unhappy with the computer applying an algorithm to your judgements. Does it really do that? Surely the computer is programmed simply to put each of the multitude of repeated judgements into an order based on what the masses judge collectively. It just does what it is asked to do, rather like a calculator does what is asked when presented with two numbers to add together or whatever, however complicated the function is.

    I agree wholeheartedly with the principle of not allowing computer algorithms to dominate our assessments, but sadly (and wrongly) many schools across the country are quite happy to use a range of very expensive commercial systems for recording and tracking progress in all subjects that do exactly that! They input hundreds of formative assessments, made daily by hard-working teachers, into an all-singing, all-dancing computer package that applies an algorithm and comes out with a summative judgement, often very finely graded along with a #tag, which is neither valid, reliable nor useful! This is then used to articulate and exemplify progress!

    Let’s listen to those across the country who are giving CJ a go and learn from them. I think it’s worth a try.
