Big data is big news. Some might think the arrival of very large education-related data sets spells the end of our education troubles – that by analyzing this data, we will uncover all the new information we need to accelerate learning.
We have barely begun to explore how this can help learning at scale. In an earlier blog, I mentioned Kurt VanLehn's meta-analysis of intelligent tutoring systems, which showed that the more sophisticated of these systems appear, overall, to be reaching human levels of tutoring at scale. New large data sets, whether from the MOOCs springing up from Udacity, EdX, Coursera, and others, or about to emerge from things like the Common Core assessment tools, should create remarkable opportunities to improve performance further. The MOOCs haven't even begun to touch this opportunity, but they all say they plan to – and their economics, for the skills in which they can actually provide mastery, are likely to transform education.
So the issue is not whether such large-data approaches are valuable. They are. However, we need to be careful not to assume that all possible progress can be made just by using these models and analyses.
A very nice recent article by Steve Lohr in the New York Times gets right to the point: “The problem is that a math model, like a metaphor, is a simplification.” We have to be careful not to assume the model is the learner: the world works the way it does, whether it fits a model at the edges or not. For example, in health, you would not expect an immune system to be designed to allow attacks on its own body – that would be nuts! Unfortunately, our immune system does exactly that, whether we think it should or not, so our approaches to disease have to take this into account. In the same way, we might wish that learners could be categorized by their “learning styles” as, say, visual, kinesthetic, or auditory, and design models that assume and build on those categories – but the evidence so far suggests this is not how human learning actually works.
Most domains benefit from several different approaches to innovation and their overlaps. For example, in medicine, you may indeed make progress by going to a jungle, collecting large numbers of plant, animal, and soil samples, bringing them back to a lab, and running all that you have found over plates of detector molecules that are related to various diseases of interest. Anything that “sticks” to any molecule of interest can then become something to investigate further.
But you can also make progress, and arrive at different answers, by deeply investigating the mechanisms of normal and diseased function, looking closely at the molecular mechanics involved, and then hypothesizing new molecules or other interventions that “get in the way” of the detailed disease mechanics while not blocking normal functioning.
Both types of work are valuable – very different in character and analysis, and each able to yield solutions the other would miss.
The overlaps are needed too: in the medical example above, the plates of receptors are not random molecules – they were identified as relevant to disease processes, most likely through extensive, detailed research, not a raw, high-level modeling exercise.
Something similar seems right for education. Much is already known about what works and does not work for learning (e.g., see E-Learning and the Science of Instruction by Rich Mayer and Ruth Clark, or the first unit of Kaplan's internal training program for instructional designers, which focuses on identifying elements that improve learning in any learning environment), and this should inform whatever we do, as practitioners or researchers. However, there is plenty more detailed work to be done on the specifics of learning: learning to write is quite different from learning math, so specific information about progressions and patterns of mastery, practices that work better or worse for certain learners, and more can be usefully teased out through systematic experiments based on what's already known for writing or for math.
We could imagine proceeding by simply ignoring what we've already found out and trying every possible combination of media, text, structure, and pace with every learner we come across – after all, “the data will tell us the truth,” we might think. However, this is wildly inefficient: the combinatorics of randomly mixing all the possible variables, approaches, and media types for every small segment of instruction explode the costs and time required, as the back-of-envelope calculation below suggests. It would be as if we decided to find new treatments for diseases by giving every possible molecule we can create to every possible patient we can find – without some form of filtering of the interventions, it simply cannot work out (not to mention the moral implications of ignoring what we already know works). That filtering comes from what has already worked – and from new hypotheses about what might work based on it.
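To make that inefficiency concrete, here is a minimal back-of-envelope sketch in Python. Every count below is an invented assumption, chosen only to show how quickly an untargeted search explodes, not a real catalog of design options:

```python
from math import prod

# Hypothetical design dimensions for one short segment of instruction.
# All counts are assumptions for illustration only.
design_choices = {
    "media type": 6,       # e.g., text, video, audio, animation, simulation, mixed
    "pacing": 4,           # e.g., self-paced, fixed, adaptive, instructor-set
    "structure": 5,        # e.g., linear, branching, spiral, mastery-gated, open
    "practice format": 5,  # e.g., multiple choice, free response, worked examples
    "feedback timing": 3,  # e.g., immediate, delayed, end-of-unit
}

variants_per_segment = prod(design_choices.values())  # 6 * 4 * 5 * 5 * 3 = 1,800
segments_per_course = 50      # assumed course length
learners_per_variant = 100    # assumed minimum for a readable signal

trials = variants_per_segment * segments_per_course * learners_per_variant
print(f"{variants_per_segment:,} variants per segment")
print(f"{trials:,} learner-segment trials for a single course")  # 9,000,000
```

Even with these modest made-up numbers, one course demands millions of learner-segment trials – and that is before interactions between segments are considered. Filtering by what is already known collapses this space to something testable.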
The overlaps will be enormously valuable. In addition to using the data to personalize learning as it occurs, big data can be mined to generate new hypotheses grounded in what we've already established works – that is one of its strengths. Another critically important use of big data is to test hypotheses, however they were generated, whether by previous data mining or by extending other strong results from experiments. We can discover much more quickly under what conditions a new hypothesis for a learning intervention succeeds or fails. Indeed, here at Kaplan, we are setting up “pilot pipelines” – organized ways to use large numbers of sections of the same course, running at the same time, to help us untangle what works, what doesn't, and why; a sketch of the kind of comparison involved follows.
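The statistical core of such a comparison can be quite simple. Below is a hedged sketch using simulated scores and a standard two-sample test; the section sizes, score distributions, and effect are all invented for illustration and say nothing about Kaplan's actual pipelines or results:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Simulated mastery scores from parallel sections of the same course.
# Half the sections ran the current design ("control"), half a proposed
# intervention ("treatment"). All numbers here are invented.
control = rng.normal(loc=72.0, scale=10.0, size=300)
treatment = rng.normal(loc=74.5, scale=10.0, size=300)

# Welch's two-sample t-test: did the intervention shift mean mastery?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

# A rough standardized effect size (Cohen's d against the control spread).
d = (treatment.mean() - control.mean()) / control.std(ddof=1)

print(f"mean difference: {treatment.mean() - control.mean():+.2f} points")
print(f"p-value: {p_value:.4f}, approximate effect size d = {d:.2f}")
```

The point of running many sections simultaneously is exactly this: enough parallel observations to detect modest effects quickly, rather than waiting semesters for sequential trials.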
So big data will be critically important for making progress in learning, as it has been in so many other fields – but let's not forget the critical interplay between large data sets “in the wild”, statistical model building, and what we learn and hypothesize from tightly constructed, smaller-scale experiments probing the details (both affordances and challenges) of human learning.