Text Mining: An Evaluation of 17,000 Ofsted Reports



Ross Morrison McGill founded @TeacherToolkit in 2010, and today, he is one of the 'most followed educators' on social media in the world. In 2015, he was nominated as one of the '500 Most Influential People in Britain' by The Sunday Times as a result of...

What if the general public could understand school inspection trends?

As my doctoral research progresses, I’m keen to unpick how teacher voice may or may not influence education policy. The ability to scrape large amounts of data enables us to understand how this may happen…

The tone of inspection reports…

In a paper published in September 2020, Demonstrating the potential of text mining for analyzing school inspection reports: a sentiment analysis of 17,000 Ofsted documents, Christian Bokhove and Sam Sims unpick how inspection judgements produced by inspectors “play a part in the way that schools are held to account and constitute an important source of data in their own right”.

Specifically, the authors report the results of sentiment analysis, looking through inspection reports from 2000 up to the present day, “comparing the tone of inspection reports across the different grades awarded in each inspection and across different Chief Inspectors.”

I too have been looking at text mining, conducting several social media network analyses to determine how people are using Twitter.

In September 2021, I took a closer look at Amanda Spielman’s network when she said that “in a lot of schools, it felt as though their attention went very rapidly to [helping] the most disadvantaged children, making food parcels, going out visiting… but in some cases [it was prioritised over teaching].” Take a look at my example of text mining as a result of this conversation…

Definition and Aims

The authors describe text mining as an approach in which “the automated processing and analysis of unstructured strings of characters provides a potentially valuable alternative approach.” The paper has two aims:

  1. To analyse a very large corpus of school inspection reports using text mining.
  2. To understand the changing nature of Ofsted inspections over time and the changing nature of school inspection in England, particularly the long-standing debate surrounding consistency.
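To make the method concrete: the paper scores the tone of each report with sentiment analysis, in which words are matched against a lexicon of positive and negative values and summed. Below is a minimal sketch of that idea in Python; the tiny lexicon and the report snippet are invented for illustration and are not the authors' actual lexicon or data.

```python
# Illustrative lexicon: word -> sentiment value (positive or negative).
# Invented for this sketch, not the lexicon used in the paper.
LEXICON = {
    "outstanding": 3, "good": 2, "effective": 2, "progress": 2,
    "improve": -1, "weak": -2, "disadvantaged": -2, "inadequate": -3,
}

def sentiment_score(text: str) -> int:
    """Sum the lexicon values of every word in the text (unknown words score 0)."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(LEXICON.get(w, 0) for w in words)

# A made-up inspection-report sentence:
snippet = "Teaching is good and pupils make strong progress, but writing is weak."
print(sentiment_score(snippet))  # good(+2) + progress(+2) + weak(-2) = 2
```

Run over thousands of reports, scores like this can then be compared across grades and Chief Inspectors, which is essentially what the paper does at scale.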

The table below shows the number of documents grouped by Chief Inspector. It should be noted that, when addressing objective two above, the researchers do not analyse the periods during which Ofsted was being run by an acting Chief Inspector. One interesting comment in the paper is something I had in my subconscious; it was useful to be reminded that “a school inspectorate of some sort [has existed] since 1839.”

[Table: number of inspection reports, grouped by Chief Inspector]

Positive inspection grades vs. sentiment language used…

The figure below shows boxplots of the distribution of the sentiment score by inspection grade, with a positive association between inspection grade and sentiment. Interestingly, “Satisfactory (the old name for Requires Improvement) is slightly higher than Good”, which highlights why Michael Wilshaw scrapped this label: it “conveyed too positive a message about the performance of schools in the third lowest (of four) inspection grades (Ofsted, 2012).”

[Figure: boxplots of sentiment score by inspection grade]

Disadvantaged vs. Inspection reports

The research paper highlights something of great importance that will resonate with many school leaders: “Schools in disadvantaged areas might be judged more harshly by Ofsted; the term ‘disadvantaged’ makes a larger negative contribution (as a proportion of overall sentiment) to the average sentiment of Requires Improvement inspection reports than it does to Good inspection reports.”
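The phrase “larger negative contribution (as a proportion of overall sentiment)” can be sketched as follows: for one report, take the share of total (absolute) sentiment that a single term accounts for. This is a hypothetical illustration of that calculation, assuming a simple word-count lexicon approach; the lexicon and the word counts are invented, not drawn from the paper.

```python
from collections import Counter

# Invented lexicon for illustration only.
LEXICON = {"good": 2, "effective": 2, "weak": -2, "disadvantaged": -2, "inadequate": -3}

def contribution(word_counts: Counter, term: str) -> float:
    """Fraction of a report's total absolute sentiment contributed by `term`."""
    total = sum(abs(LEXICON.get(w, 0)) * n for w, n in word_counts.items())
    part = abs(LEXICON.get(term, 0)) * word_counts.get(term, 0)
    return part / total if total else 0.0

# Made-up word counts for two hypothetical reports:
ri_report = Counter({"weak": 4, "disadvantaged": 3, "good": 1})
good_report = Counter({"good": 6, "effective": 3, "disadvantaged": 1})

print(contribution(ri_report, "disadvantaged"))    # 6/16 = 0.375
print(contribution(good_report, "disadvantaged"))  # 2/20 = 0.1
```

In this toy example, 'disadvantaged' accounts for a larger share of the Requires Improvement report's sentiment than of the Good report's, mirroring the pattern the paper reports.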

The word “Progress” featured throughout all reports.


Between 2007 and 2011, the sentiment of the language used became more positive. The authors write that “the decline in the influence of ‘progress’ under Spielman coincides with Ofsted’s subsequent decision to place less emphasis on progress data.”


The researchers set out “to investigate whether there was a relationship between the sentiment (or tone) of each report and the corresponding inspection grade awarded.” Limitations of this type of research, and of sentiment analysis itself, are discussed (page 11). For example, the word ‘disadvantaged’ usually “conveys negative sentiment–albeit often accompanied by a feeling of sympathy or a desire to help.”

In the context of Ofsted, “these methods could be employed to further study the influence of inspection grade, inspector, or other relevant variables on the content of inspection reports.” If Ofsted uses machine learning to predict inspections, it’s fabulous to see that we are now in a position to use similar technology to evaluate Ofsted’s reliability.

I believe this type of research is the first of its kind, and it is a brilliant and welcome contribution to the school inspection and accountability space. Bravo!

Download the paper.
