A Systematic Review of (Classroom) Retrieval Practice



Ross Morrison McGill founded @TeacherToolkit in 2010, and today, he is one of the 'most followed educators' on social media in the world. In 2015, he was nominated as one of the '500 Most Influential People in Britain' by The Sunday Times as a result of...

Does retrieval practice improve student learning in school and classroom settings?

There is an abundance of evidence to suggest that retrieval practice produces large effect sizes for learning across a variety of education settings. However, what impact does it have in the classroom…

Over the last decade, there has been an explosion of interest from teachers in learning more about retrieval practice!

In a new paper by Agarwal et al. (2021), the term “classroom research” is used to narrow a literature review of published retrieval practice research. Over 2,000 abstracts were screened and 50 experiments coded “to establish a clear picture of the benefits of retrieval practice in real-world educational settings.”

Only 37 studies met the full screening criteria, which required classroom-relevant materials to be used by students during class, under the supervision of a teacher or researcher.

This new paper looks exclusively at retrieval practice conducted in classroom settings. This is a welcome addition to the field, as much of the past research most teachers have cited, myself included, has been conducted in laboratory settings.

As a doctoral student, I found that the introductory section of the paper offers a comprehensive picture of how academics identify their research (page 10), including how to use research databases to screen past published research. This may not be relevant for teachers in terms of practical application, but it is very useful for teachers who are interested in how research is gathered.


“Retrieval practice in the classroom takes many forms that differ from the typical notion of a test… These low-stakes or no-stakes classroom learning activities were seldom referred to as ‘tests’ by the authors of the studies.” Teachers are more likely to use terms such as ‘quizzes’ or ‘Do now’ activities to describe where retrieval practice is consciously used. Software such as Kahoot, Quizlet and many others are often used to support this…

“The process of practising retrieval that shapes learning, not tests” is referred to in this paper. Retrieval is also defined as “every class, once a week, once a month” with K12, content, delay and feedback also identified in the aims.


From the 50 experiments coded in this review, the total sample size was n = 5,374, with education levels from 6 years of age to college age. Only three (6 per cent) of the studies included research conducted outside of WEIRD (Western, educated, industrialized, rich and democratic) countries. Those three countries were Taiwan, Turkey and Pakistan.

Sample sizes range from fewer than 20 to nearly 400 students, with delays between retrieval practice and the final test ranging from one day to the end of term. Quite a wide-ranging sample…

From my limited understanding of retrieval practice, most studies tend to focus on secondary and college students and on maths and science subjects (35/50). As ever, I’m always curious to learn how retrieval practice works in primary school settings and in other subjects. Digging into the database results (page 14), only five (of 50) studies were conducted in primary schools, with fifteen in secondary classrooms and 30 in college environments.
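To make those proportions concrete, here is a minimal Python sketch that turns the counts quoted above into percentages. The counts are the ones cited in this post; the variable and function names are illustrative choices, not anything taken from the paper.

```python
# Counts as quoted in this post, out of 50 coded experiments.
TOTAL_EXPERIMENTS = 50
counts_by_level = {
    "primary": 5,
    "secondary": 15,
    "college": 30,
}


def share(count, total=TOTAL_EXPERIMENTS):
    """Return `count` as a percentage of `total`."""
    return 100 * count / total


# Percentage of coded experiments at each education level.
shares = {level: share(n) for level, n in counts_by_level.items()}

for level, pct in shares.items():
    print(f"{level}: {pct:.0f}%")
```

On these figures, primary classrooms account for one in ten of the coded experiments, while college settings account for six in ten.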

The majority of past published results revealed medium to large effect sizes; something I will help explain in a future blog post. Whilst we know retrieval practice is a great strategy to strengthen memory, there is not much published research to highlight how retrieval practice works in classroom environments…
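For readers who want a feel for what an effect size is before that post, below is a minimal Python sketch of one common measure, Cohen's d: the difference between two group means divided by their pooled standard deviation. This assumes a simple two-group comparison; the review itself may use a different measure or correction, and the scores are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Cohen's d: standardised difference between two group means."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample standard deviations
    # Pooled standard deviation, weighting each group by its degrees of freedom.
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical final-test scores (made up, not from any study).
quiz_group = [14, 16, 18]     # students who did retrieval practice
control_group = [13, 15, 17]  # students who did not

d = cohens_d(quiz_group, control_group)
print(f"Cohen's d = {d:.2f}")  # 0.50 for these made-up scores
```

By Cohen's widely quoted rule of thumb, values around 0.2, 0.5 and 0.8 are labelled small, medium and large. Note that the result depends on how spread out the scores are, not only on the gap between the two group means.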

[Screenshot. Credit: Agarwal et al, 2021.]

Clearly, there are many roads to Mecca! Perhaps it’s time to say goodbye to PowerPoint templates in schools?

Frequency and Feedback

Retrieval practice was typically used “at least once per week”, with few experiments providing retrieval “multiple times.” The most common delay between the last retrieval practice and the final test was 1-3 days, with the most common retrieval formats being multiple choice (27) and short answer (17).

Note that free recall, cued recall and fill-in-the-blanks were only used 6 times each in retrieval exercises, with multiple choice (31) being the most popular format for final testing.

On feedback, or that bogey word ‘marking’, 10 experiments did not include any feedback and only 4 included delayed feedback. Researchers acknowledge that they were unable “to establish an optimal timing of feedback in school classroom settings.”


The researchers offer eight recommendations. We need more retrieval practice research investigating:

  1. Varying delays between retrieval practice and the test
  2. The provision and timing of feedback
  3. Common classroom practices
  4. Class sizes
  5. Non-science content areas
  6. The role of the teacher-researcher as a modulating factor (Hawthorne effect)
  7. How and when collaborative and online quizzes increase learning
  8. More diverse student populations.

For teachers seriously interested in retrieval practice, this research should have you thinking harder about how we can improve the evidence base for retrieval in classroom settings. The researchers conclude “that educators should implement retrieval practice, with less concern about the precise format or timing of retrieval interventions.”

We need more research on retrieval practice from school settings, particularly in primary classrooms and from other subject areas.

You can read a summary from Pooja Agarwal, who co-authored the paper.

5 thoughts on “A Systematic Review of (Classroom) Retrieval Practice”

  1. One major problem with the literature review cited is the fundamental assumption that effect sizes are accurate indicators of the effectiveness of teaching methods. If that assumption holds then we might be off to a promising start, but it doesn’t hold, even though many researchers believe it does.

    An indicator of the effectiveness of a teaching method must only be influenced by the properties of the teaching method. Effect size values are influenced by properties of the experiment used to produce them. Consequently, a method used in an experiment that produced a ‘high’ effect size may not have been more effective than a different method used in a different experiment which produced a ‘medium’ or even ‘low’ effect size.

    Put simply, to claim that a teaching method used in an experiment that produced a comparatively high effect size is more effective, or has a greater impact, than a teaching method used in another experiment that produced a comparatively lower effect size is plainly misguided.

      1. Teachers and leaders would surely benefit from some clear information about what effect sizes actually measure. Most importantly they could do with knowing that they don’t measure the effectiveness of education interventions.
        Although it is specifically a response to an editorial in a mathematics journal, the following is a useful article by Adrian Simpson at Durham University: Simpson, A., 2020. On the misinterpretation of effect size. Educational Studies in Mathematics, 103(1), pp.125-133.
