I recently came across a couple of items about adaptivity, technology, and learning, and they got me thinking about this latest buzzword for learning improvement. It seems odd to question whether it is beneficial to adapt a learning environment to learners, yet a lot of things have to go right for any change in learning environments at scale to “work better.”
The SRI report makes for tough reading for anyone who expects technology to make easy progress in learning: mostly no concrete impact was detected, and at best a few courses showed “slightly higher” average grades.
So do we conclude “it” doesn't work?
Yikes! Imagine we decided to test whether “chemicals” work for health: we picked out a bunch of chemicals, a bunch of providers, and a bunch of patients, and ran pilots with them. There's a very good chance we'd discover that “chemicals” don't work for health. Hmmm. . .
Fortunately, neither SRI nor the Gates Foundation makes this mistake. They are well aware of how many things have to go right to make a difference at scale, and this was simply a starting point to see how to run pilots with adaptive technologies in colleges “in the wild”: what goes wrong, and what goes right.
So let's back up a step: what do we even mean by “adaptivity”? A few years ago, my colleagues David Niemi and Amelia Waters created a taxonomy of types of adaptivity:
Adaptive instruction is instruction that changes in response to student characteristics or responses. This happens in response to inputs from one or more models:
- The student model. Student characteristics, e.g., students' prior knowledge, demographics, task performance, motivation, metacognition, goals, media preferences, content preferences, timing preferences, sequence preferences.
- The instructional model. The nature of instructional strategies and activities available to adapt, e.g., content, content sequencing, control of content and pace, types of activities, motivational support.
- The content/expert model. What students should be able to decide and do after instruction – the learning outcomes.
All of these are processed within an adaptive engine, which makes instructional decisions based on the models and can alter the events a student experiences within a course, or even across courses and programs.
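To make the model/engine split concrete, here is a toy sketch in Python. It is not Niemi and Waters' design or any real product: every class, outcome name, and threshold below (`StudentModel`, `MASTERY_THRESHOLD`, `boyles_law`, and so on) is hypothetical, and the student model tracks only one characteristic, estimated mastery, standing in for prior knowledge and task performance. The "engine" simply picks the first unmastered outcome whose prerequisites are mastered, and chooses a more supportive activity when mastery is low.

```python
from dataclasses import dataclass, field

MASTERY_THRESHOLD = 0.8  # hypothetical cut-off for "mastered"; not from any real system


@dataclass
class StudentModel:
    # Estimated mastery (0.0 to 1.0) per learning outcome: a stand-in for
    # prior knowledge and task performance.
    mastery: dict[str, float] = field(default_factory=dict)

    def has_mastered(self, outcome: str) -> bool:
        return self.mastery.get(outcome, 0.0) >= MASTERY_THRESHOLD


@dataclass
class ContentModel:
    # Learning outcomes, in teaching order, mapped to the outcomes they build on.
    prerequisites: dict[str, list[str]]


@dataclass
class InstructionalModel:
    # Activities available per outcome, ordered from most to least supportive.
    activities: dict[str, list[str]]


def next_activity(student, content, instruction):
    """Hypothetical adaptive engine: choose the next activity for one student."""
    for outcome, prereqs in content.prerequisites.items():
        if student.has_mastered(outcome):
            continue  # nothing left to teach for this outcome
        if not all(student.has_mastered(p) for p in prereqs):
            continue  # prerequisites not yet mastered; check later outcomes
        options = instruction.activities[outcome]
        # Assumed rule: low mastery gets the most supportive activity
        # (e.g., a worked example); higher mastery gets the least supportive.
        if student.mastery.get(outcome, 0.0) < 0.5:
            return options[0]
        return options[-1]
    return None  # every outcome mastered, or none currently reachable


# Usage with made-up outcomes and activities.
content = ContentModel({"units": [], "boyles_law": ["units"]})
instruction = InstructionalModel({
    "units": ["units_worked_example", "units_practice"],
    "boyles_law": ["boyle_worked_example", "boyle_practice"],
})
student = StudentModel({"units": 0.9, "boyles_law": 0.6})
print(next_activity(student, content, instruction))  # boyle_practice
```

The point of the sketch is structural: swapping in a teacher for `next_activity`, or adding other student characteristics, changes the engine without changing the three models it reads from. Whether any particular decision rule actually helps learners is exactly the empirical question the rest of this post is about.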
Looked at this way, the tent for “adaptive learning” is pretty broad. For example, “the adaptive engine” could include (or simply be) a teacher/faculty member – think 1:1 tutoring.
It's also clear that this definition includes approaches with evidence that they make a difference (e.g., students' prior knowledge) and things with no evidence, or even contrary evidence, that they make a difference (e.g., instructional sequence preferences for novice learners; see E-Learning and the Science of Instruction, 3rd edition, chapter 14). As with “chemicals,” the fact that a variety of interventions is possible does not mean every one works in every condition.
We know 1:1 tutoring can work, so clearly adaptive instruction can work. Unfortunately, that's not a scalable option. The excitement around adaptive instruction enhanced by technology is precisely the hope that the benefits could be scalable. So is there any evidence to support this hope?
There is indeed. A few years ago I wrote a blog post about a meta-analysis in which Kurt VanLehn looked at 100 or so studies of intelligent tutoring systems (ITSs). He found that a certain class of these, the ones that analyze and give feedback at the step or sub-step level, lift performance within their domains about as well as human tutoring has been shown to at some scale: about 0.75 standard deviations (SDs), not the 2 SDs often cited from the studies reported by Benjamin Bloom, which Kurt dissects.
So there is reason to think that “it can be done this way.” The challenge is still to dissect more clearly: “done what way?” This example is a bit like seeing that acetylsalicylic acid (the active ingredient in aspirin) reduces fever. We now have an existence proof that a chemical can help with health, but it doesn't help us design treatments for other problems, or even assure us that a “chemical” is the right answer in other cases.
What do we need in order to raise the odds that an intervention in learning will really show results at scale?
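One way to get a feel for the gap between 0.75 and 2 SDs is to convert an effect size into the percentile of the comparison group that the average treated student reaches (Cohen's U3). The Python sketch below does that, assuming normally distributed scores with equal spread in both groups; it also shows a common way to compute the effect size itself (Cohen's d with a pooled standard deviation). It is an illustration, not the method VanLehn used to aggregate his studies.

```python
import math
import statistics


def cohens_d(treated, control):
    """Standardized mean difference, using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = statistics.stdev(treated), statistics.stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def percentile_of_average_treated(d):
    """Percentile of the control group reached by the average treated student
    (Cohen's U3), assuming normal scores with equal spread in both groups."""
    return 100 * 0.5 * (1 + math.erf(d / math.sqrt(2)))


for d in (0.75, 2.0):
    print(f"{d:.2f} SD -> {percentile_of_average_treated(d):.0f}th percentile")
```

Under those assumptions, a 0.75 SD effect moves the average student from the 50th to roughly the 77th percentile, while 2 SDs would put them near the 98th: still a big gain in the first case, but a very different claim.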
Learning outcomes (LOs): clear, specific descriptions of what students should be able to decide and do after instruction that they could not do before. These should be linked together to build a valuable tapestry of mastery for students after extended instruction.
Valid and reliable measures of these learning outcomes. This is often a problem in “at-scale” settings, where teacher/faculty quizzes and exams have been thoughtfully prepared by good domain experts but not built the way psychometric practice prescribes to ensure (and document) validity and reliability: training before item authoring, multiple expert reviews, and piloting with data collection and analysis to weed out non-working items (often at least 30% of professionally written items fall out at this stage). You do not want to probe for mastery of Boyle's Law with a series of story problems that are mostly reading tests: scores may not move with anything other than reading improvement, which is not the targeted LO.
Instructional design that (at least for the hardest, most important LOs) is consistent with what is known about learning science. See Carnegie Mellon's Knowledge-Learning-Instruction (KLI) framework, E-Learning and the Science of Instruction, Willingham's Why Don't Students Like School?, the Deans for Impact manifesto on learning science, or a number of others. We no longer lack evidence-based guides, just the effort to engage with them and apply them to courses at scale.
Pilot designs that really let you see the impact of an intervention. Sometimes you have no choice but to rely on statistical controls, quasi-experimental designs, or time-sequenced comparisons (“Last quarter we did X. This quarter we did Y. Let's look at the difference.”). These always risk overstating or understating impact because of factors beyond your (statistical ability to) control. Especially with technology-delivered instruction, which makes it easier to give different instruction to different students, getting some kind of randomized controlled trial (RCT) running is a key “learning engineering” tool for ensuring you are seeing the impact (or not) of the intervention itself, not of intervening circumstances that are invisible to you.
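The pilot-and-analyze step can be illustrated with classical item analysis: for each item, compute its difficulty (proportion of students answering correctly) and its discrimination (correlation between the item and the rest of the test), then flag items that almost everyone or almost no one gets right, or that fail to track overall performance. The Python sketch below is a simplified illustration; the cut-offs (`min_discrimination=0.2`, difficulty between 0.1 and 0.9) are assumed rules of thumb rather than a standard, and professional programs typically add expert review and more sophisticated models on top of this.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant item (or rest score) cannot discriminate
    return sxy / math.sqrt(sxx * syy)


def flag_items(responses, min_discrimination=0.2, difficulty_range=(0.1, 0.9)):
    """Return indices of items to review.

    responses: one list per student of 0/1 item scores (hypothetical pilot data).
    """
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    flagged = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        rest = [t - x for t, x in zip(totals, item)]  # total score excluding this item
        difficulty = sum(item) / len(item)  # proportion correct
        discrimination = pearson(item, rest)  # item-rest (point-biserial) correlation
        low, high = difficulty_range
        if discrimination < min_discrimination or not (low <= difficulty <= high):
            flagged.append(i)
    return flagged


# Made-up pilot: item 3 is answered correctly mostly by low scorers.
pilot = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print(flag_items(pilot))
```

A flagged item is a prompt for review, not an automatic deletion: a story problem that mostly measures reading, for instance, might show weak discrimination against the rest of a physics test, and the fix could be rewriting it rather than dropping it.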
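As a minimal sketch of what “getting some kind of RCT running” involves at the analysis end, the Python below randomly assigns students to two arms and then uses a permutation test to ask how often a difference in mean scores at least as large as the observed one would appear if the arm labels were meaningless. The helper names, seeds, and scores are hypothetical. It randomizes individual students; in many college pilots whole sections are randomized instead, and that clustering needs a different analysis than this one.

```python
import random


def assign(student_ids, seed=2024):
    """Randomly split students into two arms (hypothetical helper)."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": ids[:half], "treatment": ids[half:]}


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean scores.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel students at random
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_permutations + 1)


# Usage with made-up end-of-course scores.
arms = assign(range(16))
treated_scores = [80, 82, 85, 88, 90, 91, 93, 95]
control_scores = [60, 62, 65, 66, 70, 71, 73, 75]
print(permutation_p_value(treated_scores, control_scores))
```

Because assignment is random, factors you cannot see (a stronger cohort, a new instructor, a schedule change) should be spread across both arms on average, which is what a time-sequenced “last quarter versus this quarter” comparison cannot promise.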
It's complicated. Like many venture capitalists, entrepreneurs, teachers, faculty members, administrators, and policy makers, I wish it were not.
Yet why would we expect it to be simple? The brain is one of the most complex information-processing devices on the planet, arguably as complicated to understand by itself as the rest of the body combined. Think of how the last five decades of work on human disease continue to transform our understanding of disease: even cancers, which have so long been treated according to where they occur in the body, might be better characterized by their genetic signatures than by their location, a potentially incredibly complex picture. And our internal biome? Let's (literally) not go there. . . ;-)
So does this make the effort hopeless? Lengthy, maybe; requiring patience, definitely; but not hopeless. Elon Musk, setting out to build his first sports car, is said to have spent several hundred million dollars on tooling to get everything set up before the first car was built, and this is not unusual in the automobile industry, which invests year after year in new tooling for new models. He would have preferred to go faster and do it for less, but the reality of the world (and the industry's history of success from careful investment, year after year) has made this conventional wisdom in the automotive industry, as in others: you need to invest a lot, smartly, before you even start to serve customers.
Building better nurses, journalists, programmers, physicists, engineers, policy-makers: this is at least as worthy (and critical for our futures) as the next car model. Let's keep at this, at scale, generating good evidence before we plunge – it matters.