In Search Of Research: Mathematical Intimidation



Ross Morrison McGill founded @TeacherToolkit in 2010, and today, he is one of the 'most followed educators' on social media in the world. In 2015, he was nominated as one of the '500 Most Influential People in Britain' by The Sunday Times as a result of...

Does performance appraisal lead to teacher improvement?

For purposes of accountability, teacher effects are grossly exaggerated. Because it is believed that teachers matter so much, it makes sense to hold them accountable for everything that is wrong in education. (Van Der Wateren and Amrein-Beardsley, Flip The System)

If we all want to be part of a dialogue in which we can move away from measuring teachers and more towards improving them, we need to tackle this concept of ‘mathematical intimidation’, a term coined by John Ewing (2011) in his critical analysis of VAMs (Value Added Measures).

“If teachers are to be held accountable for the progress of their students, it must be decided how much progress is enough for the given time interval. Since the targets set by policymakers are hardly backed by research evidence, they come down to little more than pure guesswork (Koretz, 2008a).”

The Appraisal Fallacy

Firstly, a couple of disclaimers. Performance appraisal can lead to some specific improvement. For example:

  • headteachers can save money; failure to ‘sustain’ targets can retract threshold salaries
  • an organisation can ensure the entire workforce has the same goal
  • teachers will jump through hoops.

I cannot think of one period in my teaching career in which performance targets improved me as a classroom teacher. However, I am being slightly unfair to all those people who have line managed me over the years. Appraisal has taught me how to focus on particular areas of my practice – particularly in leadership – but not so much in terms of my teaching repertoire.

As a result, we have ‘hoop-jumping teachers’ en masse.

Quicker, cheaper and better is the default mode in our education system today. How familiar are you with the following line-manager statements:

  • We need [these students] to reach level / grade by [this date].
  • Once you determine the details, this is followed with …
  • However, you are aware of external pressures; your budget will be less next year.
  • Soon followed by …
  • Therefore, based on the national average point scores, your APS target is [enter number] for this class/group.

Is this motivating you to improve or merely mathematical intimidation?

For the past 10 years I’ve led whole-school appraisal, and although I have enjoyed setting up various systems, I know that, at its heart, the process can be unreliable. I have observed pay-related decisions resting on anecdotal performance management evidence, and I know this will be the case in some other schools.

In Search of Research

On Google Scholar – shoot me down now – I used the following term, “teacher performance appraisal England”. The result was 97,800 entries. When I then added “leading to teacher improvement”, the results dropped to 76,900 entries. I got a little excited. Understanding a bit about how SEO and search algorithms work, I then proceeded to refine my search for research.

I then added in ‘quotations’ to make the search more targeted.

  • Performance ‘teacher appraisal leading to teacher improvement’ England = 79,000 results.
  • When adding in the date stamp 2010 to 2017 (Gove policy) = just 18,600 results.

This got me thinking more about what it is I actually wanted to ‘test’.

Measuring What Doesn’t Count

Every Government wants an education budget that is well spent. This is entirely the same for individual schools.

“Value-Added Measures (VAMs) offer schools some form of accountability; whether they are fair is another issue. At present we have a system where we expect teachers to achieve good test scores and complete report cards and, if they underperform, bad teachers are fired and bad schools are closed. Yet it has been known for a long time that these raw numbers do not give a reliable picture of the quality of the school or teacher; there are other factors influencing student achievement, for example, socio-economic status.”

The illusion of control does more damage than good, yet few teachers really ask how reliable these measures are. It starts with the word ‘standard’.

“There is nothing special about standardisation. We use it all the time in our daily lives … The problem is not so much how students are tested – that is usually well-founded … The question is what is being tested, or rather, should be tested, and for what purpose … Assessment has flooded every aspect of school life to a point where teachers can also be selectively assessed e.g. salary rise, bonus, job termination, or school closure, based on students’ outcomes.”

We know children’s learning is not a linear process, yet we continue to promote this nonsensical concept that we can judge the quality of a teacher by the outcomes a pupil secures.

Value-Added Measures

Schools, Multi-Academy Trusts and local authorities have little idea whether their value-added scores have been calculated correctly … accepting that the termination of their jobs or the closure of the schools is a fair decision based on solid analyses.

“Abundant research evidence (ASA, 2014) indicates only about 1 to 14% of educational outcome can be attributed to schools.”

This includes the teacher effect, but many other factors, such as class sizes, resources and school budgets, can also influence a teacher’s impact. “The remaining 86 to 99% are out-of-school factors outside the control of teachers and schools.” (Coleman et al, 1966)

“Every teacher knows that it is far easier to teach homogeneous classes with students from wealthy families … than to teach classes from many cultural backgrounds … It is reasonable to assume that the teachers in both cases are, on average, equally effective – although I secretly believe that any teacher who survives many years under the most unfavourable conditions probably is a better teacher than those who can do the job in almost leisurely circumstances.”

For years, we have been declaring that the quality of teaching is the greatest influence on student outcomes. This may well be true, but the context in which you work clearly has a huge impact on your capability and your success.

Continuous Improvement

Gary Rubinstein (2012) “tested the assumption of reliability and consistency using VAM … starting with three assumptions:

  1. A teacher’s quality does not change by a huge amount in one year
  2. Teachers generally improve each year
  3. A teacher in their second year is way better than ‘that teacher’ was in the first year.”

Of 707 teachers sampled in 2008-2009 who were in the first year of teaching, many teachers scored high in their first year and lower in the second year – with many who scored low, improving. Rubinstein concluded:

“… just 52% of teachers were better in their second year … Therefore, despite all the claims by our politicians, VAMs are unfit for formative assessment of teachers and the quality of teaching.”
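To see why a figure close to 52% is a warning sign, here is a minimal sketch in Python. It is not Rubinstein's analysis or his data: the function name `share_improved`, the size of the yearly gain and the noise level are all assumptions chosen purely for illustration. It simulates teachers who genuinely improve a little each year, then 'measures' them with a noisy score.

```python
import random


def share_improved(n_teachers, true_gain, noise_sd, seed=0):
    """Fraction of simulated teachers whose measured score rises in year 2.

    All inputs are hypothetical: true_gain is the real improvement every
    teacher makes, noise_sd is the spread of the measurement error.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    improved = 0
    for _ in range(n_teachers):
        quality = rng.gauss(0.0, 1.0)  # assumed 'true' quality in year 1
        year1 = quality + rng.gauss(0.0, noise_sd)
        year2 = quality + true_gain + rng.gauss(0.0, noise_sd)
        if year2 > year1:
            improved += 1
    return improved / n_teachers


# Every teacher really improves by the same small amount in both runs.
print(share_improved(707, true_gain=0.1, noise_sd=0.0))  # no measurement noise
print(share_improved(707, true_gain=0.1, noise_sd=1.0))  # noise as large as the quality spread
```

With no noise, every simulated teacher shows up as improved. Once the noise is as large as the spread in teacher quality, the share drifts towards a coin flip, even though each teacher really did get better. That is the pattern you would expect from a measure dominated by error rather than by teaching.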

“In summary, VAMs are:

  1. Unreliable – a teacher classified as adding value in one year has a 25 to 50% chance of being classified as subtracting value in their next year
  2. Invalid – there is limited evidence that teachers with high value-added scores are effective
  3. Biased – teachers of students who are not randomly assigned have more difficulties demonstrating growth
  4. Unfair – only teachers with pre- and post-test data at certain levels are being held accountable
  5. Fraught with measurement errors – variables that cannot be controlled
  6. Inappropriate for formative use – teachers do not understand the models used to evaluate them
  7. Used inappropriately to make consequential decisions – capability, pay
  8. Unintended consequences are going unrecognised – teachers choosing not to teach students who are most likely to hinder growth.”

This week, the Department for Education published ‘Evaluation of Teachers’ Pay Reform Research’ (October 2017). The conclusion stated:

“The introduction of pay reforms appears to have gone smoothly, although many teachers report that the process of gathering and reviewing evidence has added to their workload.”

If we want to keep appraisal, we should initiate a move away from measuring to developing. For example, a teacher could investigate: ‘Why do Year 12 Bangladeshi students drop out of A-Level politics after Year 12?’ (You can find this research in our action research booklet.)

Instead of searching for the answers from places like Singapore, South Korea and Finland, we only need to look north of the border to our colleagues in Scotland. Their comprehensive curriculum – with strong and fair evaluation procedures – provides actual autonomy for schools and teachers.

It was only last weekend I heard Lord Jim Knight say these exact words at Education Forward.

“There’s a whole bunch of things that politicians interfere with and they don’t have the expertise, nor should we allow them to be political footballs. But, we should have politicians who fund schools properly, who equip them properly and staff them properly. And if politicians just worried about those three things and left everything else to the profession – and trusted the profession – we’d have a better education system.”

Better Together

If only we could believe that you and I could come together to take back control of the profession. Together we could use the evidence cited here to re-engineer what we believe is right for our children: what the curriculum and assessment foundations of our schools should look like and what should be tested – including accountability of schools and of teachers.

Value Added Measures give a false perception of teacher and student performance. If we stopped relying on them, policy makers may feel that they have lost control, but “what would actually happen is the burden and cost would be reduced if moved to a local level. Instead, unreliable methods of accountability are driving away our most precious commodity – our teachers.” If [we] schools could stop this binary methodology for measuring teachers’ performance and redesign curriculum, assessment, teaching and learning and appraisal from the ground up, we wouldn’t have teachers voting with their feet and leaving the profession in their thousands.

To mark the beginning of an end to this nonsense, let’s stop this mathematical intimidation and go in search of the research. We can’t do it alone – we need our politicians – but we also need you.


7 thoughts on “In Search Of Research: Mathematical Intimidation”

  1. Pingback: Dear Santa
  2. I need some help…

    This article places the following in quotation format:

    “Abundant research evidence (ASA, 2014) indicates only about 1 to 14% of educational outcome can be attributed to schools.”

    But when I google the same in quotations, all I get is this URL. I’m looking for some more info on the “1–14%” claim, as the reference here is very vague. Could you please point me in the right direction?

    Further, a question: Am I understanding the claim correctly, that it means the difference between having a good teacher and having a bad teacher will only have a 1–14% impact on the student’s performance?
