The Vocoder, Auto-Tune, Pitch Standardization and Vocal Virtuosity

Writing assignment for History of Science and Technology class with Myles Jackson. See a more informal introduction to the vocoder here.

Casual music listeners know the vocoder best as the robotic voice effect popular in disco and early hip-hop. Anyone who has heard pop music of the last two decades has heard Auto-Tune. The two effects are frequently mistaken for one another, and for good reason—they share the same mathematical and technological basis. Auto-Tune has become ubiquitous in recording studios, in two very different incarnations. There is its intended use, as an expedient way to correct out-of-tune notes, replacing various tedious and labor-intensive manual methods. Pop, hip-hop and electronic dance music producers have also found an unintended use for Auto-Tune, as a special effect that quantizes pitches to a conspicuously excessive degree, giving the voice a synthetic, otherworldly quality. In this paper, I discuss the history of the vocoder and Auto-Tune, in the context of broader efforts to use science and technology to mathematically analyze and standardize music. I also explore how such technologies problematize our ideas of virtuosity.

Ableton vocoder

The key scientific insight underlying the vocoder (and many other music and audio technologies) is the Fourier transform, a mathematical operation named for the eighteenth century French mathematician Jean-Baptiste Joseph Fourier. Using the Fourier transform, it is possible to decompose a complex periodic waveform into the sum of simple sine waves. This procedure can also be carried out in reverse, by summing sine waves into waveforms of any arbitrary shape. The vocoder is a device for performing Fourier transforms on electrical signals, first using vacuum tubes, then silicon transistors, and now computer algorithms. It was developed in order to more efficiently digitize analog audio signals, particularly spoken words. To understand how it works, we first need to understand how to represent sound mathematically.
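To make the decomposition concrete, here is a minimal sketch of a discrete Fourier transform in plain Python. It is only an illustration under my own assumptions: the function name `dft_amplitudes`, the 64-sample length, and the test waveform are invented for the example, and real audio software uses the much faster FFT rather than this direct summation.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Return the amplitude of each sine-wave component for bins 0..N/2."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        # Correlate the waveform with a complex sinusoid making k cycles per window.
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Bins other than 0 and N/2 split their energy with a mirror-image bin.
        scale = 1 if k in (0, n // 2) else 2
        amplitudes.append(scale * abs(total) / n)
    return amplitudes

N = 64
# A "complex" waveform: a sine at 3 cycles per window plus a quieter one at 7 cycles.
wave = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
        for t in range(N)]

amps = dft_amplitudes(wave)
strong = {k: round(a, 3) for k, a in enumerate(amps) if a > 0.01}
print(strong)  # the two component sine waves and their amplitudes
```

The only components that survive the threshold are the two sine waves that were summed to build the waveform, which is the sense in which the transform runs the summation "in reverse."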

Sound consists of rapid air pressure fluctuations, and the human ear is a very precise tool for measuring momentary changes in air pressure. If we draw a graph of air pressure over time, we produce a waveform. A harmonic sound like the voice exhibits regular and detectable periodicity, whereas noise consists of random fluctuations. The ear-brain system is able to decompose the pressure/time signal into its component sine waves, and it can separate those from nonharmonic noise (Thompson, 2009). We can accomplish this feat in part because the hairs in our cochlea are each especially sensitive to vibrations in a particular frequency range. Each hair is effectively a bandpass filter. By monitoring the amplitude of each hair’s vibration, we can measure the amplitude of the sound energy within its corresponding frequency band, giving us clues as to the partials of harmonic sounds.

In the 1930s, Bell Labs researchers were studying methods for digitizing voice transmissions, which would enable multiplexing and encryption of telephone and radio signals. The simplest method for digitizing an analog waveform is pulse-code modulation (PCM), which entails taking voltage readings at regular intervals and storing them as a sequence of numbers. Taking more frequent readings using finer distinctions of voltage levels results in a more accurate analog-to-digital conversion. You can convert this digital data back to an analog audio signal by sending a series of regular pulses at the appropriate voltage to a speaker.
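The PCM procedure described above can be sketched in a few lines. This is an illustrative toy rather than any historical or commercial encoder: the function names, the 8,000 Hz sampling rate, and the 440 Hz test tone are all assumptions chosen for the example.

```python
import math

def pcm_encode(signal, sample_rate, num_samples, bits):
    """Read signal(t), assumed to lie in -1..1, at regular intervals and
    store each reading as an integer code with `bits` bits of resolution."""
    levels = 2 ** bits
    return [round((signal(i / sample_rate) + 1) / 2 * (levels - 1))
            for i in range(num_samples)]

def pcm_decode(codes, bits):
    """Turn the stored integer codes back into voltages in -1..1."""
    levels = 2 ** bits
    return [c / (levels - 1) * 2 - 1 for c in codes]

def tone(t):
    """A 440 Hz test tone standing in for the analog input."""
    return math.sin(2 * math.pi * 440 * t)

def max_error(bits, sample_rate=8000, num_samples=80):
    """Largest difference between the original readings and the decoded ones."""
    codes = pcm_encode(tone, sample_rate, num_samples, bits)
    decoded = pcm_decode(codes, bits)
    originals = [tone(i / sample_rate) for i in range(num_samples)]
    return max(abs(a - b) for a, b in zip(originals, decoded))

for bits in (4, 8, 16):
    print(bits, "bits -> worst-case error", max_error(bits))
```

Running it shows the worst-case error shrinking as the number of bits grows, which is the "finer distinctions of voltage levels" half of the accuracy trade-off.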

The problem with PCM as an encoding scheme is that it is very demanding of data storage. The compact disc standard calls for a sampling rate of 44,100 voltage readings per second, with each reading requiring sixteen bits of resolution (Pohlmann, 2010). The resulting file takes up about five megabytes of storage per minute of audio, and twice that much for stereo signals. Transmission of this much data is unwieldy even using current-day technology, and was impossible in the 1930s. Therefore, it was imperative to find ways to encode voice signals using dramatically less data. The vocoder was the first such method.
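The storage figure follows directly from the sampling rate and bit depth. As a quick check of the arithmetic (the variable names are mine, and I count a megabyte as one million bytes):

```python
SAMPLE_RATE = 44_100      # voltage readings per second
BITS_PER_SAMPLE = 16      # resolution of each reading
SECONDS_PER_MINUTE = 60

bytes_per_sample = BITS_PER_SAMPLE // 8
mono_bytes_per_minute = SAMPLE_RATE * bytes_per_sample * SECONDS_PER_MINUTE
stereo_bytes_per_minute = 2 * mono_bytes_per_minute

print(mono_bytes_per_minute / 1_000_000)    # megabytes per minute, one channel
print(stereo_bytes_per_minute / 1_000_000)  # megabytes per minute, two channels
```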

Homer Dudley, a research physicist at Bell Laboratories in New Jersey, developed the earliest version of the vocoder in 1931. The name is short for Voice Operated reCOrDER (Tompkins, 2011). Rather than attempting to encode the full voice waveform, the vocoder decomposes it into its component sinusoids, and measures the amplitude of those sinusoids. (For an interactive demonstration of the mathematics of this procedure, see Schaedler, 2015.) Just like our ears, the vocoder contains a series of band-pass filters, each of which measures the amount of sound energy within a particular frequency band. The more bands there are and the narrower they are, the more accurately the voice can be encoded. Encoding the amplitude readings from each band requires only a minuscule fraction of the bandwidth of PCM.
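The analysis half of the vocoder can be imitated in software. The sketch below is an assumption-laden stand-in for the analog filters: it measures band energy by summing Fourier components rather than by filtering, and the function name, band edges, and test tones are invented for the example.

```python
import cmath
import math

def band_amplitudes(frame, sample_rate, band_edges):
    """Measure how much sound energy `frame` contains inside each band.

    band_edges lists frequencies in Hz, e.g. [0, 500, 1000, 2000, 4000]
    describes four bands. Returns one amplitude reading per band.
    """
    n = len(frame)
    energy = [0.0] * (len(band_edges) - 1)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        amplitude = 2 * abs(coeff) / n
        for b in range(len(energy)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energy[b] += amplitude ** 2
    return [math.sqrt(e) for e in energy]

SAMPLE_RATE = 8000
EDGES = [0, 500, 1000, 2000, 4000]
# A stand-in "voice": a strong partial at 250 Hz and a weaker one at 1500 Hz.
frame = [math.sin(2 * math.pi * 250 * t / SAMPLE_RATE)
         + 0.25 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
         for t in range(256)]

readings = [round(a, 3) for a in band_amplitudes(frame, SAMPLE_RATE, EDGES)]
print(readings)  # four numbers summarize 256 samples
```

The data reduction is the point: here 256 voltage readings collapse into four band readings, at the cost of discarding everything the bands are too coarse to capture.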

In order to play back vocoded speech, it is necessary to synthesize a new waveform whose partials have the same amplitude over time as the source signal. Typically, the vocoder receiver synthesizes an approximation of the original waveform by band-pass-filtering a spectrally rich sound like a sawtooth wave or white noise. This sound source is called the carrier, while the original signal providing the filter parameters is called the modulator. Just as the hairs in our cochlea are biological bandpass filters, so too do we possess biological synthesizers: our vocal cords create a “carrier” signal, which we modulate by shaping our vocal tract.
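The playback side can be sketched the same way. In this simplified illustration, the carrier is a sawtooth built by adding up its harmonics, and each harmonic is scaled by the level measured for the band it falls into. That is my own shortcut for "filtering" the carrier; the function names and parameter values are hypothetical.

```python
import math

def band_level(freq, band_levels, band_edges):
    """Look up the modulator's level for the band containing freq (0.0 if none)."""
    for b, level in enumerate(band_levels):
        if band_edges[b] <= freq < band_edges[b + 1]:
            return level
    return 0.0

def resynthesize(band_levels, band_edges, carrier_hz, sample_rate, num_samples):
    """Shape a sawtooth-like carrier so each band matches the modulator's level."""
    out = [0.0] * num_samples
    harmonic = 1
    while harmonic * carrier_hz < sample_rate / 2:
        freq = harmonic * carrier_hz
        # A sawtooth's harmonics fall off as 1/n; the band level acts as the filter gain.
        gain = band_level(freq, band_levels, band_edges) / harmonic
        if gain:
            for t in range(num_samples):
                out[t] += gain * math.sin(2 * math.pi * freq * t / sample_rate)
        harmonic += 1
    return out

EDGES = [0, 500, 1000, 2000, 4000]
# Modulator readings: energy only in the lowest band.
voiced = resynthesize([1.0, 0.0, 0.0, 0.0], EDGES, 100, 8000, 80)
silent = resynthesize([0.0, 0.0, 0.0, 0.0], EDGES, 100, 8000, 80)
```

Changing `carrier_hz` changes the pitch of the result without touching the band levels, which is why a vocoded voice can "sing" whatever notes the carrier plays.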

The first practical use for the vocoder was the United States military’s SIGSALY voice encryption system, adopted by Allied commanders in World War II. Each SIGSALY unit weighed 55 tons and occupied 2500 square feet of floor space. The vocoder used ten bandpass filters, each one occupying a seven-foot-tall cabinet of transformers and vacuum tubes (Tompkins, 2011). After encoding the voice signal, SIGSALY encrypted it by mixing it with a recording of thermal noise from a phonograph record specially produced by the Muzak corporation. To decode the signal, another SIGSALY unit on the receiving end had to subtract the noise signal, which required an identical record playing in perfect sync with the original.
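The add-noise-then-subtract-it idea can be illustrated with a toy example. I am assuming here that each channel reading is reduced to one of six discrete steps and that "mixing" means addition that wraps around; the essay's sources do not spell out those details, so treat the constant `LEVELS` and both function names as assumptions made for the sketch.

```python
LEVELS = 6  # assumed number of discrete amplitude steps per channel reading

def encrypt(readings, key):
    """Mix each reading with the matching noise value, wrapping around at LEVELS."""
    return [(r + k) % LEVELS for r, k in zip(readings, key)]

def decrypt(ciphertext, key):
    """Subtract the same noise values to recover the original readings."""
    return [(c - k) % LEVELS for c, k in zip(ciphertext, key)]

readings = [0, 5, 2, 3, 1, 4]   # vocoder channel readings, already quantized
noise = [3, 1, 4, 1, 5, 2]      # values "played" from the noise record

sent = encrypt(readings, noise)
received = decrypt(sent, noise)

# The same record started one step late: the receiver is out of sync.
late_noise = noise[1:] + noise[:1]
garbled = decrypt(sent, late_noise)
```

With the identical key in step, `received` matches `readings`; with the key shifted by a single position, the output is unrelated to the input, which is why the two records had to play in perfect sync.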

While its early applications were in military telecommunications, the musical potential of the vocoder was apparent from the beginning. Bell Labs’ early demonstrations used human voices to modulate musical sounds to create “singing” instruments. This effect was occasionally used as a novelty for the next several decades. Notable examples include the speaking train whistle in Dumbo (1941) and Pete Drake’s talking steel guitar on “Forever” (1964). The “talking guitar” effect would become more popular in the 1970s when used by Peter Frampton. During the same decade, the vocoder became fashionable among disco and early techno artists, who used their voices as modulators for synthesizers to produce robotic voice effects. See, for example, “Europe Endless” by Kraftwerk (1977), in which the song title is “sung” by a synthesizer modulated by the voice of singer Florian Schneider.

Musicians in the present day are far more likely to encounter the vocoder in its software incarnation than as hardware. Beyond robotic voices, the vocoder is critical for digital pitch manipulation. It has been possible since the invention of the gramophone to alter the pitch of a recording by changing its playback speed, but changing the pitch and speed independently (or holding one constant while changing the other) is a technical challenge on par with independently controlling the pitch and loudness of an organ pipe (Jackson, 2006). The solution for the organ is the compensated reed pipe, and the digital audio manipulation technique is called the phase vocoder. The phase vocoder algorithm breaks up a signal into short bursts called windows. It then does a Fourier transform on each window (Sethares, n.d.). We can think of the phase vocoder as a vocoder that can change its settings every few milliseconds.
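The windowing step can be sketched as follows. This covers only the analysis stage (a short-time Fourier transform), not the phase bookkeeping and resynthesis that a complete phase vocoder needs. The window size, hop size, and test signal are assumptions of mine.

```python
import cmath
import math

def bin_magnitude(frame, k):
    """Strength of the sinusoid that completes k cycles within the frame."""
    n = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(frame)))

def peak_frequencies(signal, sample_rate, window_size, hop):
    """Cut the signal into overlapping windows and report each window's strongest frequency."""
    # A Hann window tapers each burst so its edges do not add spurious frequencies.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / window_size)
            for i in range(window_size)]
    peaks = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + i] * hann[i] for i in range(window_size)]
        best = max(range(1, window_size // 2), key=lambda k: bin_magnitude(frame, k))
        peaks.append(best * sample_rate / window_size)
    return peaks

SAMPLE_RATE = 8000
# A signal that changes over time: 250 Hz for 256 samples, then 1000 Hz for 256 more.
signal = [math.sin(2 * math.pi * (250 if t < 256 else 1000) * t / SAMPLE_RATE)
          for t in range(512)]

peaks = peak_frequencies(signal, SAMPLE_RATE, window_size=128, hop=64)
print(peaks)
```

A single Fourier transform of the whole signal would report both frequencies without saying when each occurred; analyzing window by window recovers the timing, which is what makes independent control of pitch and duration possible.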

Following the adoption of digital audio editing systems like Pro Tools in the 1990s, music producers began using phase vocoder-based pitch shifting to correct out-of-tune vocals. Previously, flawed vocals had to be laboriously re-recorded or manually edited. Digital pitch shifting made the process easier, but it was still tedious. Producers yearned for a way to perform the task automatically and in real time. The problem was that phase vocoding was too computation-intensive to be performed that rapidly. Andy Hildebrand, a former oil industry engineer, found a dramatically more efficient implementation of the phase vocoder, and in so doing, inadvertently transformed the sound of popular music.

Hildebrand had been an avid musician since childhood, but he began his professional life as an engineer. He worked in the petroleum industry, using the Fourier transform to perform oil exploration for Exxon via a technique called reflection seismology. This entails producing a large sound wave in the ground, often with explosives. By measuring the reflected underground sound waves, it is possible to deduce the composition of the rock they pass through in three dimensions. After leaving Exxon, Hildebrand used his signal processing expertise to pursue his musical interests, first by devising a realistic string synthesizer, and then for automated pitch correction (Antares, n.d.). Using his fast and computationally efficient algorithms for analyzing the complex shifts in the frequency content of the human voice, Auto-Tune is able to resynthesize singing at different pitches so quickly as to seem instantaneous.

Auto-Tune was originally intended to be an invisible, behind-the-scenes tool, and it includes a number of parameters that engineers can adjust to keep the effect from being too conspicuous. For example, there is a setting called Retune Speed, which delays the onset of pitch correction. Even the best singers can not sing exact pitches instantly; they sing an approximation of the desired pitch and then quickly adjust. If you correct this initial wobbly convergence onto the pitch, the effect is conspicuously “inhuman” (McNamee, 2010). In 1998, while working on Cher’s song “Believe,” producers Mark Taylor and Brian Rawling discovered that when they turned the Retune Speed to zero, they liked the resulting excessively perfect sound. “Believe” was a commercial hit, and it set off a vogue for the zero Retune Speed setting, which came to be nicknamed the “Cher effect” (Frere-Jones, 2008).
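The difference between gentle and zero Retune Speed can be modeled with a toy pitch corrector. This is emphatically not Antares’s algorithm: it assumes the sung pitch has already been measured frame by frame, it snaps to the nearest equal-tempered semitone, and the `retune_frames` parameter is my own stand-in for the real control.

```python
import math

def hz_to_semitones(freq_hz, a4=440.0):
    """Express a frequency as a (fractional) MIDI-style note number."""
    return 69 + 12 * math.log2(freq_hz / a4)

def semitones_to_hz(note, a4=440.0):
    """Convert a note number back into a frequency."""
    return a4 * 2 ** ((note - 69) / 12)

def retune(track_hz, retune_frames):
    """Pull each sung pitch toward the nearest semitone.

    retune_frames=0 applies the full correction immediately; larger values
    let the correction build up gradually, preserving the singer's onset.
    """
    correction = 0.0
    out = []
    for freq in track_hz:
        sung = hz_to_semitones(freq)
        wanted = round(sung) - sung          # distance to the nearest note
        correction += (wanted - correction) / (retune_frames + 1)
        out.append(semitones_to_hz(sung + correction))
    return out

sung_track = [430.0, 436.0, 445.0, 441.0]   # a singer converging on A440
snapped = retune(sung_track, 0)             # the "Cher effect" setting
gentle = retune(sung_track, 4)              # a slower, less conspicuous setting
```

With `retune_frames=0` every frame lands exactly on the note, erasing the wobbly convergence; with a slower setting the first frames stay close to what was actually sung.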

The applications of Auto-Tune and its various imitators go beyond simple pitch correction. Since it is possible to shift pitches by any amount, the software can also create artificial harmony in real time. Musicians can specify the desired notes using MIDI input, or the computer can simply generate it automatically according to the desired key, mode, and intervallic structure. In Chance The Rapper’s song “All We Got” (2016), the rapper Kanye West turns into a one-man choir, answered by the more organic harmonies of the Chicago Children’s Choir. It is a startling juxtaposition.
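Automatic harmony of this kind amounts to choosing scale-appropriate notes above the sung note and computing the pitch-shift ratio for each. The sketch below assumes a major key and MIDI note numbers; the function names and defaults are hypothetical rather than taken from any actual product.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the major scale degrees

def diatonic_harmony(midi_note, tonic=60, intervals=(2, 4)):
    """Return the sung note plus harmony notes stacked in scale steps above it.

    intervals=(2, 4) adds a third and a fifth within the key. Raises
    ValueError if the sung note is not in the key.
    """
    offset = (midi_note - tonic) % 12
    octave_base = midi_note - offset
    degree = MAJOR_STEPS.index(offset)
    voices = [midi_note]
    for steps in intervals:
        d = degree + steps
        voices.append(octave_base + 12 * (d // 7) + MAJOR_STEPS[d % 7])
    return voices

def shift_ratio(from_note, to_note):
    """Frequency ratio a pitch shifter must apply to move between two notes."""
    return 2 ** ((to_note - from_note) / 12)

c_chord = diatonic_harmony(60)   # C in C major
d_chord = diatonic_harmony(62)   # D in C major
b_chord = diatonic_harmony(71)   # B in C major
ratios = [shift_ratio(60, n) for n in c_chord]
```

Because the harmony follows the key, the same sung interval produces a major triad on C but a minor triad on D, without the singer doing anything differently.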

There is a long history of considering music both as a branch of art and science, dating back to Pythagoras. Music was a part of mathematics within the medieval quadrivium and fell into the purview of European natural philosophers during the Enlightenment. Musical instruments have been high technology since Ice Age people labored to make flutes out of vulture bones (Herzog, 2010). Building pianos or saxophones requires the same industrial processes as any other complex technological device: precise measurement and calibration of equipment, standardized parts, and mechanized processes (Jackson, 2006). The explosion of technology since the Industrial Revolution has affected music as profoundly as every other aspect of modern life.

In the 1860s, Helmholtz (1954) theorized that loudness, pitch, and timbre corresponded to the primary properties of color: brightness, hue, and saturation (18-19). His resolution of sound into these basic elements, in connection with a logic of resolving complex waveforms into simpler sine waves, laid an epistemological foundation for synthesis techniques. In contemporary usage, the word “synthesizer” evokes a digital keyboard or a wall of analog knobs and patch cables. But the earliest device to use the term was a mechanical one, built by Lord Kelvin in 1876 to predict the tides. It used a system of pulleys and springs to guide the motion of a pencil tracing a curve on paper, and it could synthesize simple mathematical curves into more complex harmonic waveforms (Miller 1916, 110–11). Whether it is mechanical, electronic or digital, a synthesizer is a device that produces waveforms; those waveforms could be converted into audio, but that is only one of its possible uses.

The vocoder, like its synthesizer component, is a boundary object, straddling different social and scientific domains, and serving different purposes within each domain. “Boundary objects are objects which are both plastic enough to adapt to local needs and the constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual site use” (Star & Griesemer, 1989). Other examples of boundary objects include libraries, road or topographical maps, forms, and spreadsheets. A boundary object need not be a physical thing; it can also be a taxonomy or an information system. The word “vocoder” refers both to a physical object like a SIGSALY unit and to an abstraction like the phase vocoder algorithm.

Many musical instruments and audio devices are boundary objects. For example, the glass harmonica was created both to demonstrate acoustical concepts and to create music, and all musical instrument design is effectively an exercise in applied mathematics (Jackson, 2006). Ernst Chladni’s bowed plates were ostensibly tools for investigating the mathematics of vibration, but Chladni’s own interest clearly had a strong aesthetic component, which also explains their popularity in science education spaces like San Francisco’s Exploratorium.

With the advent of recording and electronic music production, the convergence between music and technology has been even more complete. The most casual bedroom producer or garage band guitarist relies on advanced signal processing. Such fine manipulation of analog and digital signals “is not exactly the domain of musician, playback technology, or listener but rather exists within all three and in the interstices between them” (Sterne & Rodgers 2011, 35). A signal processing technology like the vocoder is not just a manifestation of technology mediating music; technology becomes an intrinsic component of musical expression itself.

Vocoder technology has a variety of possible musical uses, of which “robot voice” is only one. Its most culturally impactful application has been as a tool for pitch and timing correction. Contemporary popular music is tightly standardized, via rhythm quantization on the time axis and via Auto-Tune on the pitch axis. Celemony’s Melodyne software can perform both functions. While it has been possible to effortlessly perfect synthesized sounds since the advent of MIDI in the early 1980s, phase vocoding has made it easy to do the same with any recorded audio. If a note is out of tune or out of time in a pop song, it will only be so as a conscious and conspicuous choice. This state of unearthly perfection required digital audio editing to attain, but it is presaged by two centuries of efforts to standardize music. For example, Bernard Logier’s chiroplast was a literal machine for applying industrial precision and standardization to the fingers of young pianists en masse (Jackson 2006, 239). Is Auto-Tune a digital chiroplast, musical Taylorism run amok?

Given the current ubiquity of Auto-Tune, it is surprising to consider how recent a development it is to have a universal and uniform pitch standard at all. Twelve-tone equal temperament did not become the European standard until the early nineteenth century, and even then, concert tuning pitches varied by region. Because higher-tuned instruments sound “brighter”, there was steady pressure for each region to tune slightly higher than its neighbors, leading to a ratcheting up of concert pitch comparable to the current “loudness wars” of ever-increasing dynamic range compression. The internationalizing effects of radio finally forced the international adoption of A440, though even then France continued to persist in tuning to A435.
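To put numbers on these standards, the sketch below computes equal-tempered frequencies from a reference A and measures the gap between two tunings in cents (hundredths of a semitone). The function names are mine; the formulas are the standard ones for twelve-tone equal temperament.

```python
import math

def note_hz(semitones_from_a4, a4=440.0):
    """Frequency of a note a given number of equal-tempered semitones from A4."""
    return a4 * 2 ** (semitones_from_a4 / 12)

def cents_between(low_hz, high_hz):
    """Size of the interval between two frequencies, in cents."""
    return 1200 * math.log2(high_hz / low_hz)

middle_c_at_440 = note_hz(-9)             # middle C lies nine semitones below A4
middle_c_at_435 = note_hz(-9, a4=435.0)   # the same note under the French standard
gap = cents_between(435.0, 440.0)

print(round(middle_c_at_440, 2), round(middle_c_at_435, 2), round(gap, 1))
```

The two standards differ by roughly a fifth of a semitone, and because every note is derived from the reference A, the whole scale moves with it.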

Tuning became somewhat destandardized again in the era of analog recording, when it became a common practice to speed up or slow down tape recordings for various musical effects, thus raising and lowering their pitch. Errors in mastering and duplication have further destabilized the pitch standard in recordings. For example, “Flash Light” by Parliament (1977) is almost a quarter tone off from standard tuning, intentionally or not. Needless to say, this makes it a difficult recording to play along with. When my colleagues at Soundfly and I were designing an online music theory course and we wanted to use “Flash Light” as an example, we were obliged to upload our own version that we digitally pitch shifted back to A440.
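The link between tape speed and pitch is a simple logarithm. As a sketch (function names are my own, and the 50-cent figure just stands for a full quarter tone rather than a measurement of any particular recording):

```python
import math

def semitones_from_speed(speed_ratio):
    """Pitch change, in semitones, caused by playing a tape at speed_ratio times normal."""
    return 12 * math.log2(speed_ratio)

def ratio_for_cents(cents):
    """Frequency ratio corresponding to a pitch shift of the given number of cents."""
    return 2 ** (cents / 1200)

double_speed = semitones_from_speed(2.0)   # doubling the speed raises pitch an octave
fix_quarter_tone = ratio_for_cents(-50)    # ratio that lowers pitch by a quarter tone

print(double_speed, round(fix_quarter_tone, 4))
```

On tape, applying that ratio would also lengthen the recording by about three percent; a phase vocoder-based pitch shifter applies the same frequency ratio while leaving the duration alone.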

Just as the phase vocoder algorithm enables us to adjust pitch without affecting timing, so too does it enable us to adjust timing without affecting pitch. Here, too, the universal quantization of recordings represents the culmination of a technological trend dating back several centuries. As early as the sixteenth century, music theorists and composers suggested standardizing tempi using the pulse, foot tapping, or clock pendulums. Pendulum-based chronometers were large and prohibitively expensive. Johann Mälzel’s small and portable metronome, introduced at the beginning of the nineteenth century, made it finally possible to impose objective tempo standards at scale.

Ludwig van Beethoven eagerly embraced the metronome, because he resented performers taking interpretive liberties with his compositions. Musicians, in turn, have resisted his attempts to standardize their tempo. Many editions of his scores have omitted the metronome markings, and performance tradition deviates widely from them (Kolisch & Mendel, 1943). Practitioners sometimes defend their resistance by arguing that Beethoven’s metronome did not work properly, or that his markings did not otherwise truly convey his intentions; sometimes, however, musicians simply do not feel that the indicated tempo is the most musical one (Saving, 2011).

Musical automata, the fullest aesthetic expression of the industrial revolution, presage the ubiquity of programmed and sequenced synthesizers and drum machines. Beethoven felt the same enthusiasm for automata that he did for the metronome, and for the same reason: automata can not take liberties with their instructions. Not everyone shared his enthusiasm. Hegel lamented the idea of musicians acting as mere mechanical executors of the composer’s wishes, as if they were barrel-organ grinders (Jackson 2006, 81). The German Romantics doubted that a mechanical device could ever produce a musical sound if there was no organic human spirit behind it. Like so many musical debates of that era, this one also continues unabated in the present. The bassist in one of my jazz groups complained that hip-hop sounded to him like James Brown played by robots, a criticism I have heard echoed by many instrumentalists. I address the relationship between pitch correction, musical skill and expression in the following section.

Early nineteenth century European “popular” music (in this context, meaning opera and chamber music) turned virtuoso instrumentalists into the equivalent of rock stars (Jackson, 2006). Bourgeois audiences were enraptured by the fast and flashy playing of Paganini, favoring his sheer mechanical skill over the erudite sensitivity of Romantic tastes. The cult of the virtuoso aligned well with the sensibilities of industrial capitalists. If the definition of a virtuoso is “the most technically accomplished player” rather than “the most beautiful player,” then virtuosity becomes quantifiable and measurable. Technical virtuosity also has the added benefit of making for a good show. Franz Liszt was the first pianist to turn the instrument on stage so that the audience could see his hands flying over the keys, to make music for the eyes as well as the ears. Technology has changed popular music a great deal, but there is still a broad audience for virtuosity, whether it comes in the form of a bebop saxophonist or a heavy metal guitarist. But as we shall see, technological advances pose challenges to the concept of musical skill.

Herbie Hancock’s music of the late 1970s represents an interesting technological turning point for virtuosity. Hancock is one of the most revered jazz pianists and keyboardists of all time, but he is not a strong singer. On his song “I Thought It Was You” (1978), Hancock uses the vocoder to modulate a keyboard synthesizer, thereby creating a smooth, perfectly pitched vocal with nuanced vibrato and complex harmonies. In other words, Hancock creates vocal virtuosity using his manual virtuosity.

Hancock’s 1983 hit “Rockit” uses the vocoder as the basis for manual skill of a different kind. His vocoded singing here is more of a percussive and textural sound effect punctuating the main melody, a wordless synthesizer instrumental. The true vocoded “lead vocal” in “Rockit” appears via the turntablist Grand Mixer DST, performing the first turntable solo on a jazz record. Grand Mixer DST scratches a record called “Change the Beat” by Beside and Fab Five Freddy (1982), which concludes with Fab Five Freddy’s manager speaking the words “Ahhh, this stuff is really fresh!” through a vocoder modulating white noise. In the turntablist’s hands, the word “fresh” becomes a serpentine twist of sibilance, occupying a musical space somewhere between singing, an instrumental solo, and percussion.

Daft Punk demonstrates technological virtuosity of a different kind on “Harder, Better, Faster, Stronger” (2001). The song’s vocoded melody leaps up and down over wide intervals spanning multiple octaves in a way that would likely be impossible even for the most technically accomplished singer. The synth part carrying the melody might be playable by a keyboard virtuoso, but Daft Punk most likely sequenced it using MIDI. While it is possible that they performed the melody at a slow tempo, correcting or quantizing any misplayed notes, it is more likely that they simply drew the melody into the piano roll using a mouse. This is a good example of “playing the studio,” a skill set that resembles composition and audio engineering more than instrumental performance (Hein, 2017).

Producing a melody with the vocoder requires the ability to either perform or sequence a synthesizer. No such ability is necessary to use Auto-Tune, which converts the most casually sung or spoken performance into a flawless melody. Many musicians are therefore uncomfortable with it, or even outraged by it, because this effortlessness undermines the idea that in-tune singing requires an ability that previously had to be earned through practice. In the world of digital manipulations like Auto-Tune, anyone can sound like a “virtuoso,” so what place is left for “genuine” virtuosity? This question is a moral one as much as anything, particularly in the United States, with our strong Protestant work ethic. The words “virtue” and “virtuosity” share a common root, the Latin virtus, connoting moral excellence and manly courage (McDonnell, 2006).

Is Auto-Tune necessarily unvirtuous? The singer Neko Case argues that it is:

I’m not a perfect note hitter either but I’m not going to cover it up with auto tune [sic]. Everybody uses it, too. I once asked a studio guy in Toronto, “How many people don’t use auto tune?” and he said, “You and Nelly Furtado are the only two people who’ve never used it in here.” Even though I’m not into Nelly Furtado, it kind of made me respect her. It’s cool that she has some integrity (quoted in Dombal, 2006).

Case decries the assistive use of Auto-Tune, its intended purpose as a corrective for poor singing. But not all Auto-Tune usage is shameful or secretive. The Cher Effect is an overt, “explicit” use (Strachan, 2017), quite different in its effect from “natural-sounding” pitch correction. Rappers have embraced the effect in keeping with their broader spirit of defiance, which “takes pleasure in aggressive insubordination” (Rose 1994, 80). Chris Rock and Tracy Morgan satirize this defiant stance in their performance of Simon and Garfunkel’s “Scarborough Fair” (accompanied by Paul Simon himself) during Comedy Central’s Night of Too Many Stars telethon (2010). After Simon criticizes Rock and Morgan’s inept harmony singing, Rock responds, “Let’s do some Auto-Tune.” They then sing the song with full Cher Effect, and Rock says, “Oh yeah, we nailed that sh—” to Simon’s mock chagrin.

In its explicit usage, Auto-Tune’s effect goes far beyond quantizing pitches. It radically changes the timbre of the voice and changes its musical content. By putting audibly discrete “stairsteps” into pitch slides and melisma, Auto-Tune introduces a new rhythmic element. A quick fillip to a neighboring chord tone that would normally pass unnoticed by singer and listener alike suddenly takes on dramatic musical significance when exaggerated by Auto-Tune. When vocalists deliberately sing pitches that fall between the “permitted” notes, the algorithm becomes uncertain as to what the intended note is, and fluctuates wildly between the two closest choices. This exaggerated warbling, what Frere-Jones (2008) calls “the gerbil,” no longer reads as smoothing out the voice, but rather introduces a new kind of aggressive roughness.

Often the use of Auto-Tune in R&B and hip-hop is less about digital perfection in pitch than a highlighting of, and engagement with, identifiably ‘black’ vocal traits. The extreme use of Auto-Tune is often employed to work against conventions in vocal delivery. In the digitized melisma, Auto-Tune works to clash against vocal flourishes in order to create a kind of digital growl (Strachan 2017, 157).

African-American vernacular music is not the only form that uses extensive pitch sliding. Melisma is the cornerstone of North African raï and Berber singing as well. These communities have embraced Auto-Tune as enthusiastically as American rappers and R&B singers. Cheba Djenet’s “Lkit li Nebghih” (2013) demonstrates the considerable skill required to flutter around the notes in a way that maximally activates the warble (Clayton, 2009).

Technology is never purely technical. It always comes enveloped in cultural epistemologies, local and culturally specific ideas about the proper way to communicate scientific and technical ideas. Auto-Tune appears to threaten the “artisanal knowledge” (Jackson, 2004) of singers by applying an impersonal technology with roots in industry. Pitch correction software is a cornerstone of the digital perfectionism pervading mainstream pop production (Strachan, 2017). Does this mean that it has removed an essential piece of the human artistry of singing? Or does it simply shift the locus of artistry to vocoder sequencing or melismatic warbling?

The recorded voice has always been mediated through technology. Early recordings were low-fidelity and noise-laden. As fidelity has improved, it has become conventional to enhance recorded sound using compression and equalization. These effects are so prevalent in music that unprocessed voice recordings sound stranger and less natural than processed ones to most listeners (Sterne & Rodgers 2011). Younger listeners find nothing remarkable about extreme dynamic range limiting or lossy audio compression codecs. Millennials have never experienced a world without Auto-Tune. The Cher Effect has taken on particular emotional and cultural connotations in pop: it suggests youth, euphoria, and optimism (Strachan, 2017). We can draw an analogy between “incorrect” Auto-Tune use and the sound of distorted, overdriven tape recorders and amplifiers. Just as distortion infused analog technology with youthful expression in the rock era, so too does extreme phase vocoding act as a “contemporary strategy for intimacy with the digital… a duet between the electronics and the personal” (Clayton, 2009).

Auto-Tune tends to smooth out the individual characteristics of the voice, much like airbrushing does in photography (Strachan, 2017). Some producers embrace this homogenizing effect and push it to extremes. On his song “Lost In The World” (2010), Kanye West layers the singing of Justin Vernon, Charlie Wilson, Kay Fox, Tony Williams, Alicia Keys, Elly Jackson and himself into an Auto-Tuned mass of sound, making all of these very different singers indistinguishable, but also universal. By eliminating our flaws, Auto-Tune obscures some of our humanity. But by making it impossible for us to sing anything “wrong,” the software also invites playfulness and confidence. Auto-Tune can oppress us or liberate us creatively; how we use it is up to us.


Antares Audio Technologies. (2017). Antares Vocal Processing > About Us > Dr. Andy. Retrieved October 9, 2017.

Clayton, J. (2009, May). Pitch Perfect. Frieze Magazine.

Clayton, J. (2016). Uproot: Travels in 21st-Century Music and Digital Culture. New York: FSG Originals.

Dombal, R. (2006). Neko Case. Retrieved October 9, 2017.

Frere-Jones, S. (2008, June). The Gerbil’s Revenge. The New Yorker, 128–129.

Hein, E. (2017). Playing (in) the digital studio. In S. A. Ruthmann & R. Mantie (Eds.), The Oxford Handbook of Technology and Music Education. New York: Oxford University Press.

Helmholtz, H. von. (1954). On the Sensations of Tone as a Physiological Basis for the Theory of Music. (A. J. Ellis, trans.) (2nd ed.). New York: Dover Publications.

Herzog, W. (2010). Cave of Forgotten Dreams. United States: IFC Films.

Jackson, M. W. (2006). Harmonious triads : physicists, musicians, and instrument makers in nineteenth-century Germany. Cambridge, MA: MIT Press.

Kolisch, R., & Mendel, A. (1943). Tempo and Character in Beethoven’s Music–Part I. The Musical Quarterly, 29(2), 169–187.

Lavey, N., & Kang, J. C. (2014). Object of Interest: the Vocoder. United States: The New Yorker.

Marshall, W. O. (2017). Tuning in Situ: Articulations of Voice, Affect, and Artifact in the Recording Studio. Cornell University.

McDonnell, M. A. (2006). Roman manliness : virtus and the Roman Republic. Cambridge University Press.

McNamee, D. (2010). Hey, what’s that sound: Auto-Tune. Retrieved October 27, 2017.

Miller, D. C. (1916). The Science of Musical Sounds. New York: The Macmillan Company.

Pohlmann, K. C. (2011). Principles of digital audio. McGraw-Hill.

Rock, C., Morgan, T., & Simon, P. (2010). Night of Too Many Stars. United States: Comedy Central.

Rose, T. (1994). Black Noise: Rap Music and Black Culture in Contemporary America (1st ed.). Hanover, N.H.: Wesleyan.

Saving, M. (2011). How Fast Shall We Play? Retrieved October 26, 2017.

Schaedler, J. (2015). Seeing Circles, Sines and Signals – A Compact Primer on Digital Signal Processing. Retrieved October 18, 2017.

Sethares, W. (n.d.). Phase vocoder in Matlab. Retrieved October 9, 2017.

Sharpsteen, B. (1941). Dumbo. United States: Walt Disney Pictures.

Star, S. L., & Griesemer, J. R. (1989). Institutional Ecology, `Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19(3), 387–420.

Sterne, J., & Rodgers, T. (2011). The Poetics of Signal Processing. Differences, 22(2–3), 31–53.

Strachan, R. (2017). Sonic Technologies: Popular Music, Digital Culture and the Creative Process. London: Bloomsbury Publishing.

Thompson, W. F. (2009). Music, Thought, and Feeling : Understanding the Psychology of Music. Oxford University Press.

Tompkins, D. (2011). How to Wreck a Nice Beach: The Vocoder from World War II to Hip-Hop, The Machine Speaks. New York: Melville House.


Beside and Fab Five Freddy (1982). Change the Beat (Female Version) [12” single]. Paris: Celluloid. (1982)

Chance the Rapper (2016). All We Got. On Coloring Book [streaming]. Self-released. (May 13, 2016)

Cher (1998). Believe. On Believe [CD]. New York: Warner Music Group. (October 22, 1998)

Daft Punk (2001). Harder, Better, Faster, Stronger. On Discovery [CD]. London: Virgin. (October 13, 2001)

Djenet, Cheba (2013). Lkit li Nebghih. On Ilabesah Omri Tebghini [streaming]. Saint Crépain. (June 21, 2013)

Drake, Pete (1964). Forever. On Forever [LP]. United States: Smash Records. (1964)

Hancock, Herbie (1978). I Thought It Was You. On Sunlight [LP]. New York: Columbia. (June 15, 1978)

______ (1983). Rockit. On Future Shock [LP]. New York: Columbia. (August, 1983)

Kraftwerk (1977). Europe Endless. On Trans-Europe Express [LP]. Düsseldorf: Kling Klang. (March, 1977)

Parliament (1977). Flash Light. On Funkentelechy vs The Placebo Syndrome [LP]. New York: Casablanca. (November 28, 1977)

West, Kanye (2010). Lost In The World. On My Beautiful Dark Twisted Fantasy [CD]. New York: Roc-A-Fella/Def Jam. (November 22, 2010)

One thought on “The Vocoder, Auto-Tune, Pitch Standardization and Vocal Virtuosity”

  1. Great read! One note though: Pete Frampton used a talk box (similar result, completely different underlying, acoustic, technology ) and I’m pretty sure some of the other early examples used a speaker attached to the operator’s throat in a talk box-like fashion. In other words: are you sure the bona fide electronic vocoder ventured outside research facilities and into musical applications before the mid-sixties?