Summarizing the research on the effectiveness of different instructional modalities

By Simon Bates posted on April 3, 2015

The apparently simple question of ‘Which is better: face-to-face, blended or online?’ turns out to be much more complex than you might think. Perhaps it is not even the right question to ask? Meta-analyses of published work [1, 2] often present a very mixed picture of the effectiveness of different delivery modes for courses. Despite careful research and analysis, and in some cases meta-analyses of large quantities of published work, findings are far from definitive.

So what are the reasons for such a lack of clarity in an area so widely studied?

Three broad areas can be identified as contributing: methodological shortcomings, the nature and timescale of the measurement of improvement, and the broad categorization of widely differing instructional practices.

1. Methodological considerations

Research reports summarizing instructional impact within a single course setting necessarily make it difficult to know how far (if at all) their findings can be generalized to other courses, institutions, populations of students, etc. Beyond this, there are methodological shortcomings in the way many studies are designed or conducted that raise questions about the validity of their findings. Relatively few studies employ designs that permit a causal inference; many don’t describe the context richly enough to identify the key conditions needed to recreate the results; others do not use sufficient controls to account for known or observed differences between groups; some have small sample sizes, compounded by then splitting the cohort into different delivery modalities; and many neglect potential biasing factors such as differential attrition rates across course delivery modalities.
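To make the sample-size point concrete, here is a minimal sketch of how splitting a cohort shrinks what a study can detect. It is not taken from any of the studies discussed: the cohort size of 60, the 5% significance level and the 80% power target are all assumptions chosen for illustration, and the function `minimum_detectable_effect` is a hypothetical helper that uses a simple normal approximation for comparing two group means.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_a, n_b, alpha=0.05, power=0.80):
    """Approximate smallest standardized mean difference (in standard
    deviation units) that a two-sided comparison of two groups of sizes
    n_a and n_b can detect, using a normal approximation."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the power target
    return (z_alpha + z_power) * sqrt(1 / n_a + 1 / n_b)


# Hypothetical cohort of 60 students (an assumption for illustration only).
cohort = 60

# Split into two modalities (30 per group) versus three modalities
# (20 per group), comparing any pair of groups.
two_way = minimum_detectable_effect(cohort // 2, cohort // 2)
three_way = minimum_detectable_effect(cohort // 3, cohort // 3)

print(f"Two groups of 30:   detectable effect of about {two_way:.2f} SD")
print(f"Three groups of 20: detectable effect of about {three_way:.2f} SD")
```

Under these assumptions, only fairly large differences between modalities would be reliably detected, and the threshold rises further each time the cohort is divided again. Smaller but still educationally meaningful differences could easily go unnoticed.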

Meta-analyses that summarize many individual studies often find that only a minority of studies meet stringent inclusion criteria (which may, for example, include a requirement for a randomized or quasi-experimental design, direct comparison of sections delivered in different modalities, and assessment of objective learning outcomes or measurement of performance that is not self-reported). Yet even these careful and comprehensive meta-analyses present a mixed picture. Some find significantly better learning outcomes in online and blended sections, some find effectively no difference, and some find significantly worse outcomes.

2. Nature and timing of measurement of improvement

The vast majority of studies report findings measured across a single semester-long course or, less frequently, across a whole year of study. Measurements of effectiveness are thus limited to ‘what is achieved in a small number of weeks/months’. Contrast this with the broad goal of any form of instruction: ‘far transfer’, the application of knowledge and skills far removed (in time and in context) from the point of instruction. Compounding this is the nature of the measurement used to provide evidence of improvement. Frequently, performance on a final exam is used as the measure of achievement, and it is not always easy to see the alignment between the long-term learning outcomes for a course (‘what do you want students to be able to know/do/appreciate six months after the end of the course?’) and its end-of-course assessments.

3. Categorization of instructional practice

Typically, courses evaluated in these studies are categorized as ‘face-to-face’, ‘blended/hybrid’ or ‘online’. These three categories are both overlapping and impossibly broad. When does face-to-face become blended? What counts as blended? When does blended become online? Do different institutions have different understandings of the same terms? Even within a single category, there is room for such a wide variety of learning designs and sequencing of activities that courses in the same category can look far more different than similar.

Consider the following two examples (neither of which is hypothetical!). Is it meaningful to compare student outcomes from a face-to-face section with those from an online section whose delivery period was dramatically compressed, potentially meaning that the opportunities for feedback and interaction were similarly compressed or reduced compared to the face-to-face course? Likewise, is a course really ‘blended’ if only two weeks of it were replaced by blended content and activities, with the remainder following a traditional content-based structure?

This artificial grouping of essentially different courses into the same categorical space certainly contributes to the lack of clarity that emerges from the careful analysis of the research. Such broad categorization will hide course design or delivery elements that may have implications for learning far larger than the ‘effects’ being measured. Context, in the broadest sense, matters enormously.

So where does this leave us? The methodological issues outlined above are, at least in principle, fixable through careful research design and clear contextual descriptions that allow others to attempt to recreate the same findings. The issue of timing is tricky: on the one hand the case for longitudinal measures is clear, but on the other such an approach is itself problematic because of opportunities to relearn material, differing re-exposure to concepts and topics, and attrition in general. But it is the categorization of instructional practice into three too-broad and imprecise buckets that is the kicker here. It may well mean that the question of ‘Which is better: face-to-face, blended or online?’ is not just unanswered but unanswerable when asked in that form.

None of this means we should retreat from an evidence-based approach when thinking about how to improve student learning. What is needed, though, is a change of tack: to focus on what extensive research and field-testing have shown to support student learning. Doing this requires a framework or model that bridges different disciplines and does not depend on the delivery modality of the courses in which such activities are embedded. A number of such frameworks exist (for example ‘How Learning Works’ [3]), which allow us to put the focus back on pedagogical practice rather than delivery modality. A more specific or granular approach would be to consider which instructional activities have been shown to produce the most significant learning gains with students. Although it focuses on the K-12 sector, John Hattie’s work cataloguing the effect sizes of different instructional interventions [4], derived from thousands of studies, has much to offer in terms of what practices and approaches have been found to be effective. Choose these, understand why and how they support learning, and then let’s concern ourselves with how the chosen, or perhaps required, mode of delivery can best be leveraged to support such activities.
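For readers less familiar with the term, an effect size of the kind catalogued in such syntheses is commonly a standardized mean difference such as Cohen’s d: the gap between two group means expressed in units of their pooled standard deviation. The sketch below shows the calculation only. The scores are invented for illustration and come from no study, and the function name `cohens_d` and the group labels are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference between two groups, using the
    pooled sample standard deviation as the unit."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Invented exam scores (assumptions for illustration, not real data).
with_intervention = [72, 75, 78, 80, 85]
without_intervention = [68, 70, 74, 76, 82]

d = cohens_d(with_intervention, without_intervention)
print(f"Cohen's d = {d:.2f}")
```

Because the result is expressed in standard deviation units rather than raw marks, it can in principle be compared across courses and assessments, which is what makes a catalogue of interventions possible regardless of delivery modality.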

1. U.S. Department of Education, Office of Planning, Evaluation, and Policy Development. (2010). Evaluation of evidence-based practices in online learning: A meta-analysis and review of online learning studies. Washington, DC. Retrieved from

2. Wu, D. D. (2015). Online learning in post-secondary education: A review of the empirical literature (2013-14). Retrieved from

3. Ambrose, S. A., Bridges, M. W., DiPietro, M., Lovett, M. C., & Norman, M. K. (2010). How learning works: Seven research-based principles for smart teaching. San Francisco: Jossey-Bass. See a summary of the seven principles at

4. Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.

Simon Bates is senior advisor, teaching and learning, and academic director of the Centre for Teaching, Learning and Technology. He is also a professor of teaching in the Department of Physics and Astronomy.