Image schemas in music software

I’m doing a ton of writing for grad school, and will be posting the highlights here. First off, an abstract and discussion of this article:

Katie Wilkie, Simon Holland, and Paul Mulholland. Winter, 2010. What Can The Language Of Musicians Tell Us About Music Interaction Design? Computer Music Journal, Vol. 34, No. 4, Pages 34-48

The authors discuss the ways that user interface design for music production and teaching software is informed by embodied cognition, as articulated by George Lakoff and Mark Johnson in their book Metaphors We Live By. Lakoff and Johnson argue that all metaphors trace their roots to states of the human body, which are the only basis for abstract thought that we possess. The closer a metaphor is to a state of the body, the easier it is for us to understand.

In music, the most obvious bodily metaphors are rhythm and repetition, which we experience throughout the sensory world, not just in music. We also use a variety of spatial metaphors for music, referred to by the authors as image schemas. Listeners commonly conceive of music using images of containers, cycles, verticality, balance, the notion of center-periphery, and (in the case of western melodies) a narrative of source-path-goal.

An example of the container schema is the statement “Bb is in the key of F.” We imagine the key of F as a container, with Bb as one of its contents. We think of chords as being stacked vertically, like a pile of bricks. When we conceive melodies, we think of the line going for a metaphorical walk, with altitude standing in for pitch: “The melody starts on F, goes up to Bb, down to A, and then lands back on F.” (However, the “pitch-as-height” metaphor is muddied by the circularity of pitch class, and by the fact that we feel ascending pitch movement differently from descending.) We may also use alternative image schemas, for example that higher pitches are brighter and lower pitches are darker. We are on stronger footing with the notion of the tonic as “home base”: we imagine a piece that modulates through different keys as going out on a journey and then returning home.
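
To make the container schema and the circularity of pitch class concrete, here is a minimal Python sketch of my own (it is not from the paper). It assumes pitch classes are numbered 0 to 11 starting from C and that notes are given as MIDI note numbers; the names `major_key` and `pitch_class` are hypothetical helpers made up for illustration.

```python
# Illustrative sketch only: a key modeled literally as a container (a set).
# Assumption: pitch classes are numbered 0-11 with C = 0.

MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic

def major_key(tonic_pc):
    """Return the set of pitch classes 'contained in' a major key."""
    return {(tonic_pc + step) % 12 for step in MAJOR_SCALE_STEPS}

def pitch_class(midi_note):
    """Collapse pitch height onto the 12-step pitch-class circle."""
    return midi_note % 12

F, B_FLAT, B_NATURAL = 5, 10, 11
key_of_f = major_key(F)

print(B_FLAT in key_of_f)     # "Bb is in the key of F"
print(B_NATURAL in key_of_f)  # B natural is outside the container

# Two notes an octave apart differ in height but share a pitch class,
# which is where the pitch-as-height metaphor starts to blur.
print(pitch_class(58) == pitch_class(70))
```

The set membership test is the container schema taken at face value, while the modulo operation shows why a single vertical axis cannot capture everything we hear in pitch.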

People approach software equipped with bodily image schemas, learned and innate. The highest praise one can give to an interface is that it is “intuitive.” The authors define an intuitive interface as one that allows the user to apply prior knowledge and existing image schemas: innate, sensory-motor, embodied, cultural, or expert.

The authors evaluate two software programs in terms of their intuitiveness, or lack thereof. The first is Harmony Space, a program written by one of the paper’s authors and “systematically and richly designed to exploit spatial metaphors for harmonic concepts.” (Unfortunately, this software is no longer available online, aside from low-resolution screenshots.) Harmony Space lays out the diatonic pitches on a grid with the topology of a torus, following Euler’s Tonnetz scheme. This organization helps users understand harmony in terms of spatial proximity. In Harmony Space, adjacent notes form diatonic thirds and triads. Chords and scales form distinctive geometric shapes. The user can transpose chords and other patterns by simply moving the shapes around on the grid. While this is an elegant didactic tool, it is only partially useful. By design, Harmony Space totally neglects rhythm. The authors discuss the difficulty of designing a visualization scheme for rhythm that is as elegant as the tone grid.
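
The idea of transposing by sliding a shape around a wrapped grid can be sketched in a few lines of Python. This is my own toy model, not Harmony Space's code: I am assuming one grid axis steps by major thirds and the other by minor thirds, which may not match the real program's layout, and every name here (`pitch_class_at`, `place_shape`, `MAJOR_TRIAD_SHAPE`) is hypothetical.

```python
# Toy Tonnetz-style grid; NOT Harmony Space's actual implementation.
# Assumption: moving one cell along x adds a major third (4 semitones),
# moving one cell along y adds a minor third (3 semitones).

NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
MAJOR_THIRD = 4
MINOR_THIRD = 3

def pitch_class_at(x, y):
    """Pitch class at grid cell (x, y); the modulo wraps the grid into a torus."""
    return (MAJOR_THIRD * x + MINOR_THIRD * y) % 12

# A major triad as a shape: root, a major third over, then a minor third up.
MAJOR_TRIAD_SHAPE = [(0, 0), (1, 0), (1, 1)]

def place_shape(shape, origin):
    """Drop a shape onto the grid with its root at `origin`."""
    ox, oy = origin
    return [pitch_class_at(ox + dx, oy + dy) for dx, dy in shape]

def names(pitch_classes):
    return [NOTE_NAMES[pc] for pc in pitch_classes]

c_major = place_shape(MAJOR_TRIAD_SHAPE, (0, 0))
f_major = place_shape(MAJOR_TRIAD_SHAPE, (2, 3))  # same shape, moved

print(names(c_major))
print(names(f_major))
```

Under these assumptions a 3-by-4 patch of cells already contains all twelve pitch classes, and the same three-cell shape yields a major triad wherever it is placed, which is the sense in which moving a shape is transposition.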

The other software program evaluated is Apple’s Garageband. Since Apple includes it for free with Macs, Garageband has become widely used by amateurs. It is a simplified version of Logic, using the same multitrack tape recorder metaphor as most other DAWs. This metaphor is not immediately intuitive, but it is easily learned: users quickly learn to imagine a chorus of voices, with each voice occupying its own horizontal track. The left-to-right timeline likewise becomes intuitive once the user sees it in action. Garageband adds an appealing loop library to the basic recording functionality. The loops can be altered by the user in a full-fledged MIDI editor.

The authors praise Garageband for its combination of versatility and accessibility, but they miss some of the program’s shortcomings as a tool for beginner self-teaching. Garageband offers many attractive-sounding loops and instrument sounds, but offers no suggestion as to how to make good musical use of those materials. It does not suggest, for example, that by western pop tradition, loops sound best when repeated two, four, eight or sixteen times. Also, it makes no attempt at showing harmonic relationships; users are left to trial and error to find musical chord/scale combinations. Ideally, Garageband’s MIDI editor would suggest to the user which notes would actually sound good, perhaps by coloring chord tones green, extensions yellow and dissonant notes red.
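
The color-coding idea could be prototyped roughly as follows. This is a sketch of my own suggestion, not a Garageband feature or API. It makes a deliberately crude assumption: notes in the current chord are green, other notes in the current scale count as extensions and are yellow, and everything else is red. Real harmonic practice is subtler than that, and the function name `color_for_note` is hypothetical.

```python
# Sketch of the proposed note coloring; not part of any real DAW.
# Assumption: pitch classes are numbered 0-11 with C = 0, and notes
# arrive as MIDI note numbers.

C_MAJOR_CHORD = {0, 4, 7}                 # C, E, G
C_MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}    # C D E F G A B

def color_for_note(midi_note, chord_tones, scale_tones):
    """Classify a note against the current chord and scale."""
    pc = midi_note % 12
    if pc in chord_tones:
        return "green"   # chord tone
    if pc in scale_tones:
        return "yellow"  # in the scale but not the chord (treated as an extension)
    return "red"         # outside the scale (treated as dissonant)

for note in (60, 62, 61, 76):
    print(note, color_for_note(note, C_MAJOR_CHORD, C_MAJOR_SCALE))
```

An editor built this way would need to know the chord and scale in force at each moment, which is exactly the harmonic information that Garageband currently leaves the user to discover by trial and error.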

Garageband and Harmony Space are intriguing, but surely better visual metaphors for music have yet to be implemented. For example, while the “container” for chords is intuitive, it is also misleading, since a chord is composed of tones rather than being a box that holds them. A better image would be tones as atoms and chords as molecules built from those atoms, which gets at their relational nature better. As the molecule becomes a more familiar image, it will become available as an “intuitive” image schema.

I anticipate that the next generation of beginner-oriented production software will draw not on the tape recorder metaphor, but on the sampler. I could imagine a simplified version of the Session View in Ableton Live, allowing the user to build songs out of musical “legos,” dragging and dropping in real time.

See also a post collecting my favorite music visualization systems.