Dr Hugh Morrison: Grades cannot be meaningfully ascribed in the absence of a test

In the absence of the usual public examinations, efforts were made to “calculate” the grade each pupil would have received had he or she taken the relevant examinations.

A troubling outcome of the ensuing “grades debacle” was the confidence invested in algorithms as devices capable of accurately answering counterfactual questions in respect of examination grades.

Senior staff at CCEA, SQA, Ofqual, WJEC and the IBO all appeared to endorse the pivotal role that algorithms could play in moderating teachers’ grade predictions.

Moreover, at the time of writing, the Republic of Ireland seems set on pressing on with “calculated grades” despite the front pages of just about every newspaper in the UK deriding this approach.

I will argue in the next few paragraphs that no algorithm can estimate a student’s grade because a grade only has meaning with respect to the test which generated it. However, there is no 2020 test for the algorithm to focus on. Indeed, the whole grades debate has raged, it would seem, without any mention of a test.

The case will be made below that without a GCSE geography test, for example, it is meaningless (for a teacher or an algorithm) to ascribe a GCSE grade to a student.

Let me try to illustrate this rather startling conclusion using Malcolm Gladwell’s account of the “Flynn effect,” published in The New Yorker of December 12, 2007. James Flynn discovered that I.Q. varies greatly according to which version of the Wechsler intelligence test (WISC) one takes.

This has dramatic consequences in that statements such as “Alice has an I.Q. of 120” lose their meaning because no mention is made of the test Alice took.

Flynn demonstrated that I.Q. will change markedly if the I.Q. test is changed.

To quote Gladwell (p. 95): “The notion that anyone ‘has’ an I.Q. of a certain number, then, is meaningless unless you know which WISC he took.”

I.Q. is no longer seen as an intrinsic property of the individual; rather, it is a property of the individual’s interaction with a test. Intelligence is manifest in the interaction between individual and test.

Jerome Kagan is one of the world’s foremost authorities on the nature of personality. Kagan’s (1998, p. 16) reasoning parallels Gladwell’s: “Most investigators who study ‘anxiety’ or ‘fear’ use answers on a standard questionnaire, or responses to an interviewer, to decide which of their subjects are anxious or fearful. A smaller number of scientists asks close friends or relatives of each subject to evaluate how anxious that person is. A still smaller group measures the heart rate, blood pressure, galvanic skin response, or salivary cortisol level of their subjects.

“Unfortunately, these three sources of information rarely agree.” Kagan is making the point that in order to communicate unambiguously (the hallmark of science) about measured fear/anxiety one must make mention of the measuring instrument.

The predicates ‘anxiety’ and ‘fear’ only have definite properties relative to a specified measuring tool. Once again, these predicates are properties of interactions and not intrinsic properties of the individual to whom they are ascribed. One cannot attribute a definite value to a psychological attribute construed as a property of a person; this attribute only has definite properties relative to a particular instrument.

Kagan (1998, p. 77) cautions: “Modern physicists appreciate that light can behave as a wave or a particle depending on the method of measurement … psychologists write as if that maxim did not apply to consciousness, intelligence or fear.”

Dr Mike Cresswell, former Director General of the AQA Awarding Body, has put forward similar arguments in respect of grades. Teachers’ and the Awarding Bodies’ use of grades betrays fundamental misconceptions which are entirely at odds with the principles set out above. These misconceptions probably contributed to the “grades debacle”.

Schools, for example, when summarising the attainment of their students, typically report the percentage of pupils achieving each grade on the relevant grade scale. In doing this, they (unwittingly) treat the grade as an intrinsic property of the student, adding together all students who secured a grade C, for example, irrespective of the particular examination paper taken by the student.

The schools simply add the C grades in physics to the C grades in art, and so on, through the full range of subjects. The particular examinations (physics, art, …) are ignored. This use of addition is only justified if the grade is considered to be an intrinsic attribute of the student.

Cresswell (2000, p. 72) writes: “To illustrate the difficulty … take just a few examples: we require a Grade C in mathematics to represent comparable attainment to a grade C in Physics, a grade C in English, a Grade C in French and a Grade C in Art. This requirement implies some way of making direct quantitative comparisons of candidates’ attainments across disparate subjects.

“This is impossible … .” Cresswell (1997, p. 75) notes: “[T]he examining boards have continued to claim, if only by implication, that the same grade represents the same standard of attainment in any subject.”

In short, there is a tendency to ignore the qualitative differences between subjects and compute the “C grade total” by adding together all the students who have achieved that grade, irrespective of the examination subject, i.e. to arrive at a quantity by treating the grade as an attribute of the student.

If the grade is (correctly) construed as a property of the student’s interaction with an examination paper (physics, art, and so on), the temptation to add goes away.

In summary, then, if CCEA, IBO, Ofqual, SQA and WJEC had treated educational predicates as entities ascribable to interactions rather than individual students, a fundamental problem with their strategy would have been clear at once.

Without a test, there can, of course, be no interactions; grades (whether they be the predictions of teachers or the output of an algorithm) cannot be meaningfully ascribed in the absence of a test. In the words of Asher Peres, “Unperformed experiments have no results.”

Dr Hugh Morrison, formerly of Queen’s University, Belfast, is a renowned physicist, educationalist and examinations expert.