Measurement in games for learning research

Note-taking for Research on Games and Simulations with Jan Plass


Kiili, K., & Lainema, T. (2008). Foundation for Measuring Engagement in Educational Games. Journal of Interactive Learning Research, 19(3), 469–488.

The authors’ purpose here is to assess flow in educational games, to “operationalize the dimensions of the flow experience.” A flow state involves deep concentration, time distortion, autotelic (self-motivating) experience, a loss of self-consciousness, and a sense of control.

Flow enhances learning and exploratory behavior. In game contexts, the preconditions for flow include clear goals, immediate feedback, gamefulness, and a frame story. The “gamefulness” condition means that players can use different strategies for solving problems, and can constructively use rewards gained in the game. A non-gameful game would be excessively linear and rely on rewards that are external to the gameplay. In Csikszentmihalyi’s original model, flow-inducing activities become spontaneous and automatic, but that’s not desirable for learning, because learning is an active and conscious knowledge construction process. (Really?) Game control should be spontaneous and automatic, but learning should be reflective.

The authors use an experiential gaming model: learning is a cyclical process based on direct experiences of the game world. The player experiments, then engages in reflective observation, which leads to schemata construction, followed by more experimentation, and the cycle continues. They measure flow via a survey with a 5-point Likert-type response format, along with some open-ended qualitative questions. They are not trying to measure learning directly, but rather participants’ feelings about learning. The Likert scale seems like an awkward way to quantify an essentially qualitative study, and indeed, the authors devote a lot of discussion to the participants’ answers to the open-ended questions.

You can measure a student’s factual knowledge, but that’s not quite the same thing as measuring learning. There’s a difference between factual learning and conceptual learning. Games do a good job of fostering the latter, but how do you measure that? Also, it may take time for learning to solidify or manifest, so testing players immediately after the gaming experience may miss it.

Plass, J.L., Milne, C., Homer, B.D., Jordan, T., Schwartz, R.N., Hayward, E.O., Verkuilen, J., Ng, F., Wang, Y., & Barrientos, J. (2012). Investigating the Effectiveness of Computer Simulations for Chemistry Learning. Special Issue on Large-Scale Interventions in Science Education for Diverse Student Groups in Varied Educational Settings. Journal of Research in Science Teaching, 49(3), 394–419.

The authors investigate whether computer simulations are an effective tool for teaching chemistry in high school science classrooms. Specifically, they assess the scaling up of these simulations, that is, testing their wider adoption beyond a close circle of early adopters. The study considers the complexity and unpredictability of classroom environments by assessing fidelity of implementation (FOI), a measure that originated in medical research. In the context of education, it means that curriculum plans are implemented as intended by the original design.

The FOI measure helps researchers figure out whether an intervention is ineffective on its own merits, or because it isn’t carried out the right way. The authors use qualitative FOI measures from video recordings and observations to vet data for quantitative analysis.

Sometimes teachers depart from the planned implementation of a lesson following their own pedagogical judgment. Sometimes they’re forced to do so by circumstances like absent students or nonfunctioning computers. Here’s a depressing quote:

Often, teachers had to respond to the messiness of school life: Students were absent or late on a regular basis; the lab assistant was out; lessons were cancelled for events such as Career Day or a field trip. Such interruptions were observed more frequently in New York City than in Texas, leading us to speculate whether the observed lower prior knowledge of students in New York City could be attributed, in part, to this level of discontinuity. One wonders how, under these conditions, teachers achieve any level of pedagogical continuity, and students any level of learning continuity.

The FOI issue gets at the tension between two competing ideas of a teacher’s job: to deliver prescribed curricula, or to adapt them according to their own expertise. When teachers act independently, it makes it difficult to evaluate interventions like the chemistry simulation.

The authors use log files generated by students’ interactions with the software to show not just their use of the simulation, but also their metacognitive processes. I need to find out more about how that works.

Baker, R.S., & Inventado, P.S. (2014). Educational Data Mining and Learning Analytics. In J.A. Larusson and B. White (eds.), Learning Analytics: From Research to Practice (pp. 61–75). New York: Springer.

The authors discuss methods for educational data mining (EDM), including prediction models, structure discovery, relationship mining, and discovery with models. In general, EDM relies on latent knowledge estimation: we can’t directly measure knowledge, so we have to infer it from students’ performance, that is, from their patterns of correct and incorrect answers.
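One widely used approach to latent knowledge estimation is Bayesian Knowledge Tracing (BKT), which updates the probability that a student knows a skill after each right or wrong answer. Here is a minimal sketch of the standard update rule. The parameter values (prior, learn, guess, and slip probabilities) and the answer sequence are made-up assumptions for illustration, not numbers from the chapter.

```python
def bkt_update(p_known, correct, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """Return the updated probability that the student knows the skill.

    All default parameter values are hypothetical, chosen for illustration.
    """
    if correct:
        # A correct answer comes from knowing (and not slipping) or from guessing.
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        # A wrong answer comes from slipping or from not knowing (and not guessing).
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # The student may also learn the skill at this practice opportunity.
    return posterior + (1 - posterior) * p_learn


# Hypothetical sequence of answers on one skill: right, right, wrong, right.
estimate = 0.3  # assumed prior probability of knowing the skill
for answer in [True, True, False, True]:
    estimate = bkt_update(estimate, answer)
    print(f"answer correct={answer}: P(known) = {estimate:.3f}")
```

The estimate rises after correct answers and falls after wrong ones, which is the sense in which knowledge is inferred from performance rather than measured directly.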

Prediction models infer a predicted variable (i.e., a dependent variable) from some combination of predictor variables (i.e., independent variables). To build a prediction model, you need a ground truth dataset: a set of examples where the value of the predicted variable is already known. Prediction models are good for situations where you want to make determinations in real time, for example to see whether a student will need intervention based on their affect.
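To make the ground truth idea concrete, here is a minimal sketch using a nearest-centroid classifier, about the simplest prediction model there is. This is my own choice of technique, not one the chapter prescribes. The two predictor variables (a frustration score and a boredom score, each between 0 and 1) and all the labeled examples are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(examples):
    """Average the predictor values for each label in the ground truth data."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, features):
    """Predict the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical ground truth: (frustration, boredom) scores with known outcomes.
ground_truth = [
    ((0.9, 0.8), "intervene"),
    ((0.8, 0.9), "intervene"),
    ((0.1, 0.2), "ok"),
    ((0.2, 0.1), "ok"),
]

model = fit_centroids(ground_truth)
print(predict(model, (0.85, 0.7)))  # a new student observed in real time
```

Once the centroids are computed from labeled data, each new prediction is just a couple of distance calculations, which is why this kind of model suits real-time use.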

Relationship mining is a method for discovering relationships between variables in a dataset with a large number of variables. That might mean finding which variables correlate most strongly with a particular variable of interest, or finding which relationships between any two variables are strongest. Association rule mining is the process of finding if-then rules describing the relationship between variables. For example: IF a student is frustrated OR has a stronger goal of learning than performance THEN the student frequently asks for help. Sequential pattern mining looks for temporal associations between events. Correlation mining looks for positive or negative correlations between variables, without making any claim about causation.
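Association rules are usually scored by support (how often the whole rule holds across all records) and confidence (how often the THEN part holds when the IF part does). Here is a minimal sketch of those two measures for a single rule. The student records and attribute names are invented for illustration, and the rule uses a simple one-condition IF part rather than the OR in the example above.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    Each record is the set of attributes observed for one student.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_if = [r for r in records if antecedent <= r]
    with_both = [r for r in with_if if consequent <= r]
    support = len(with_both) / len(records)
    # Confidence is undefined when the IF part never occurs; report 0.0 then.
    confidence = len(with_both) / len(with_if) if with_if else 0.0
    return support, confidence


# Hypothetical observations, one set of attributes per student.
students = [
    {"frustrated", "asks_for_help"},
    {"frustrated", "asks_for_help"},
    {"frustrated"},
    {"learning_goal", "asks_for_help"},
    {"performance_goal"},
]

support, confidence = rule_metrics(students, {"frustrated"}, {"asks_for_help"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

A full association rule miner would search over many candidate rules and keep those above chosen support and confidence thresholds; this sketch only scores one rule.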

Structure discovery algorithms look for structure in the data without any ground truth. For example, you can do clustering, looking for data points that naturally group together. This is helpful if you don’t know the categories you’re looking for in advance. Factor analysis is the search for variables that naturally group together, rather than data points. It groups the variables by latent (not directly observable) factors.
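As an illustration of clustering, here is a minimal sketch of k-means, one common clustering algorithm (the notes above don't single out a specific one). The data points, their interpretation as (minutes on task, hint requests), and the choice of starting centroids are all hypothetical.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            # Keep the old centroid if no points were assigned to it.
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical students: (minutes on task, hint requests). No labels given.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

# Assumed starting centroids: two of the data points.
centroids, clusters = kmeans(points, [points[0], points[3]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Nothing in the input says which students belong together; the two groups emerge from the distances alone, which is what it means to find structure without ground truth.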