One of the most interesting topics of discussion at our recent annual national AAIA conference, following a presentation by Dr Chris Wheadon of the ‘No More Marking’ organisation, was the subject of Comparative Judgement (CJ for short). AAIA members’ opinions on this approach to assessment vary considerably.
I’d like to use this blog to share my thoughts and continue the debate that we started in Liverpool. AAIA members can comment on this post to express their own views (or indeed start their own post).
My opinion, which I explain below, is that there are some assessment purposes for which CJ could be used and some for which it definitely can’t – and we need to be clear about which is which.
Firstly, what is CJ?
It is a system based on the principle that, when faced with just two examples of work (typically writing), most people (whoever they may be – this is not limited to qualified teachers) can say which of the two pieces is better – Piece A or Piece B. By presenting users of the system with a series of pairs of scripts and requiring them in each case to indicate which script is better, a large number of judgements can be gathered very quickly. A computer algorithm then aggregates all of these judgements, from all users of the system, and produces a percentile rank for each piece.
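To make the mechanics concrete, here is a minimal sketch of this kind of pairwise-judgement algorithm. CJ systems are typically built on a Bradley–Terry-style statistical model of paired comparison; everything below – the function names, the simple iterative fitting routine and the toy judgements – is my own illustration of the general idea, not No More Marking’s actual implementation.

```python
from collections import defaultdict

def fit_bradley_terry(judgements, n_iters=100):
    """Estimate a relative 'quality' score per script from paired judgements.

    judgements -- a list of (winner, loser) script IDs, one per judgement.
    Uses the classic iterative (MM) update for the Bradley-Terry model.
    """
    wins = defaultdict(float)        # judgements won, per script
    pair_counts = defaultdict(int)   # judgements made, per unordered pair
    scripts = set()
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        scripts.update((winner, loser))

    # Half a 'phantom win' per script is a crude regularisation that stops
    # never-winning scripts collapsing to a quality of exactly zero.
    for s in scripts:
        wins[s] += 0.5

    quality = {s: 1.0 for s in scripts}
    for _ in range(n_iters):
        new_quality = {}
        for i in scripts:
            denom = sum(
                count / (quality[i] + quality[j])
                for pair, count in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            new_quality[i] = wins[i] / denom
        mean = sum(new_quality.values()) / len(new_quality)
        quality = {s: q / mean for s, q in new_quality.items()}  # normalise
    return quality

def percentile_ranks(quality):
    """Turn fitted qualities into 0-100 percentile ranks."""
    ordered = sorted(quality, key=quality.get)  # worst to best
    n = len(ordered)
    return {s: round(100 * r / max(n - 1, 1)) for r, s in enumerate(ordered)}

# Five toy judgements over three scripts; script A wins most often.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
print(percentile_ranks(fit_bradley_terry(judgements)))
# {'C': 0, 'B': 50, 'A': 100}
```

Note what comes out of the other end: a rank order for each script and nothing more – which, as I argue below, is exactly its limitation.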
It has been argued by some that this system could be the way forward for national assessment of writing and that it would be both more reliable and less time-consuming than the current process for teacher assessment of writing at the ends of Key Stages 1 and 2. For example, see this piece in the TES.
I have some concerns about its being used in this way, which I will come to shortly.
But I will start with a separate concern, which is the suggestion that CJ can be used to support formative assessment. You can hear this suggestion being made by the STA’s policy adviser, 33 minutes and 30 seconds into this KS1 webinar video, where he says that CJ “has a lot to offer in terms of schools’ formative assessment”.
Let’s be clear – CJ does not support formative assessment.
Formative assessment depends upon teachers and pupils understanding where they are being most successful (in their writing, or any other area of learning), where they need to improve, and how best to make those next improvements. This has absolutely nothing to do with knowing what your percentile rank is.

Formative assessment in writing requires that the writers (and the teachers) are clear about what is expected of them. When thinking about a particular aspect of writing, e.g. a specific text type, we need to understand the key ingredients that should be included. Phrases like ‘What A Good One Looks Like’ (WAGOLL) or ‘What Excellence Looks Like’ (WELL) are often used as classroom devices for helping students to identify the key features of exemplary models and, from those, to develop an understanding of the success criteria. If we want to help our pupils make excellent progress, we (teachers) need to have the necessary subject knowledge and we need to share it with the pupils. We need to understand ‘what a good one looks like’ and we need to understand the requirements and expected standards of the National Curriculum.

Advocates of CJ point out that anyone can use it – you don’t need any ‘special knowledge’ to be able to make the judgements. That may sound appealing, but it is precisely why CJ cannot support formative assessment. Knowing a child’s percentile rank cannot help me to know what they need to do to improve.
Another supposed selling point of the system is that it reduces teacher workload because it is so quick to administer. Even the name of the organisation behind CJ, “No More Marking”, trades on the drive to reduce teacher workload. Of course we all want to find ways of reducing teacher workload wherever possible, but let’s not kid ourselves that a simple algorithmic approach to ranking children’s writing has anything to do with helping teachers understand how to help the children improve.
What CJ could do for your school is provide an accountability mechanism of sorts. The research evidence suggests that CJ has a high level of internal reliability – that is, the behind-the-scenes algorithm that processes all the judgements produces a consistent rank order of the children’s work. If what you want, as a school leader, is a means of seeing how your pupils’ standards compare with others’ across the country, CJ provides exactly that. And, I agree, it does so with minimal teacher workload.
However, let’s not forget that any norm-referenced system, such as one based on percentile ranks, must by definition place some children above the ‘average’ and others below. However good everyone’s writing is, roughly half the children will always be below the median.
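A toy example (with entirely made-up scores) shows why. Even if every child in a cohort improves by the same amount from one year to the next, the proportion sitting below the median is completely unchanged:

```python
from statistics import median

# Hypothetical writing scores for one cohort, before and after a year
# in which every child improves by the same amount.
before = [52, 58, 61, 66, 70, 74, 81, 85, 90]
after = [score + 15 for score in before]  # everyone is now a better writer

def share_below_median(scores):
    """Return the proportion of scores strictly below the median."""
    m = median(scores)
    return sum(s < m for s in scores) / len(scores)

print(share_below_median(before))  # 0.444...
print(share_below_median(after))   # 0.444... -- identical: the ranks haven't moved
```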
And this is why I would be extremely concerned if CJ were ever to be adopted as the national mechanism for school accountability of writing.
Call me an idealist, but for me one of the key features of a school accountability system is that in theory everyone should be able to succeed. If you have a test with a particular pass-mark, it is hypothetically possible that every child could pass that test. If you have a system of ‘pupil can’ statements for teacher assessment, it is hypothetically possible that every child could reach that standard. It is not the same with a percentile rank-based system, such as CJ. There will always be winners and losers.
This is the key difference between a criterion-referenced system and a norm-referenced system. Personally, for national accountability, I would always favour criterion-referencing, for the reasons given above. (Whether or not you feel the interim teacher assessment frameworks (ITAFs) that were in place over the last two years, or the new teacher assessment frameworks (TAFs) for 2018, were or are the right criterion-referenced system is a separate point.)
The difference between a norm-referenced system and a criterion-referenced one was well illustrated when No More Marking produced this publication, which claimed to demonstrate the threshold standard for Greater Depth writing at Key Stage 2 – despite the fact that the example published clearly did not meet the Greater Depth criteria. (This blog set out NMM’s justification for making the claim – but for teachers trying to finalise their statutory teacher assessments at the time, the resulting confusion was, to say the least, unhelpful.)
My opinion, for what it’s worth, is that when judging the quality of writing, before we start trying to say whether Piece A is better or worse than Piece B, we should at least agree what it is we are looking for. Are we concerned mostly with a writer’s ability to express meaning clearly, or structure a piece logically, or use a wide vocabulary, or spell with great accuracy, or use beautiful cursive handwriting, or… ? The list goes on.
In other words, we need some agreed criteria for the required standards. And once those criteria are agreed, what helps us to make more reliable judgements and improve our subject knowledge is professional dialogue (i.e. moderation). Technology may also have a part to play in the process, but I don’t yet feel ready to hand over professional judgement about children’s learning to an algorithm.