Food Opera: Merging Taste and Sound in Real Time

The potential to link sound to food, scent, and the tactile sensations of the mouth creates an entirely new field of sensory interplay, which may be harnessed to a wide range of expressive ends. Approaches include theatrical narrative structures that might tell a story, spatial or landscape meditations that might resemble a sound installation, and ritual events such as a Passover Seder or a wedding ceremony.

Written By

Ben Houge

[Note: Portions of this article are borrowed from my paper “Food Opera: A New Genre for Audio-Gustatory Expression,” co-authored with Jutta Friedrichs (producer of our food operas), to be presented at the Second International Workshop on Musical Metacreation, and are reproduced here with gratitude.]

Beside the White Chickens: A Summer Food Opera (excerpt) from Ben Houge on Vimeo.


Since 2012, I’ve co-produced three audio-gustatory events that I decided to call “food operas.” For each of these events, I composed a real-time generative soundtrack to accompany a multicourse meal, designed by chef Jason Bond of Bondir restaurant in Cambridge, MA. This music responds to events cued by diners, drawing on event-driven scoring techniques adapted from my work in the video game industry since 1996. Each table in the restaurant is outfitted with speakers, one for each diner, totaling thirty channels of coordinated, real-time, algorithmic, spatially deployed sound in all.

The concept is predicated on the acknowledgement of dining as a time-based art form, akin to film, dance, and music. I use the term “opera” for its multimedia associations, describing a hybrid, interdisciplinary work that engages and explores the junctures between multiple senses.

I identify the main innovations in these food operas as follows:

• Real-time scoring techniques adapted from the field of video game music create a custom soundtrack that follows unpredictable, live input from diners.
• Associating sound with taste at the level of texture (somewhere between an individual note and a complete musical phrase) allows for a heterophony in which the two senses can be closely linked while still maintaining a degree of autonomy.
• Coordinating sound across a massively multichannel speaker array allows us to preserve the interpersonal and social aspect of communal dining while also elaborating a large-scale sonic environment. (I spent some time debating whether thirty channels could be described as “massively multichannel,” but I decided in the end to keep the term, as it’s more than what can be deployed from a single computer using a single audio interface, more than you typically find in a movie theater, and far more than a typical home listening environment.)

Each of these innovations on its own may represent a small step forward, but collectively, they allow for what we (producer/artist Jutta Friedrichs, chef Jason Bond, and I) consider to be a new genre of artistic production: the food opera.

One reason we consider food opera to be a new genre is that its expressive potentialities are so vast. The potential to link sound to food, scent, and the tactile sensations of the mouth creates an entirely new field of sensory interplay, which may be harnessed to a wide range of expressive ends. Approaches include theatrical narrative structures (including non-linear or generative narratives) that might tell a story, spatial or landscape meditations that might resemble a sound installation, and ritual events such as a Passover Seder or a wedding ceremony.

As an example, in our second and third food operas, alongside purely musical elements, we incorporated field recordings and interviews from some of the local, organic farms that provide Bondir’s ingredients. Our goal was to tell the story of sustainable food practices by sonically connecting diners to the sources of their meals. We feel that this message was conveyed more deeply and memorably than it would have been using sound or taste alone.

Through the food opera project, we hope to show how the application of real-time music generation techniques, originally developed for use in the virtual worlds of video games, can increasingly be applied to score the unpredictable, real-time phenomena of our everyday physical environment, which we observe to be a rapidly growing arena of cultural production.


The chef and the composer. Photo by Melissa Rivard/Andrew Janjigian.

History and Previous Work

The impetus to associate eating with music goes back to our earliest human history; in myriad historical documents, music has been present at feasts, banquets, rituals, and celebrations. Since the development of the restaurant, performers have often been present to provide background music for diners. And following the advent of recorded sound in the twentieth century, recorded music, from jukeboxes to iPods, has increased in prominence in restaurants to the point of ubiquity. But in all of these situations, music, by the very means of its production, has been physically distanced from the table and at best only coarsely synchronized with the meal, with none of the sophistication of interplay between music and other art forms to which we are accustomed in a ballet, a film, or an opera. (For a lively and detailed overview of historical pairings of food with sound, check out Qian [Janice] Wang’s freshly minted MIT master’s thesis: Music, Mind, and Mouth: Exploring the Interaction Between Music and Flavor Perception.)

In recent years, there has been an increase in artistic attention to the sense of taste, while at the same time chefs have been making forays into the art world. Perhaps the most famous milestone in this rapprochement was Spanish chef Ferran Adrià’s “G Pavilion” at the Documenta 12 art exhibition in 2007. Other touchstones include Natalie Jeremijenko and Mihir Desai’s Cross(x)Species Adventure Club and renowned French pastry chef Pierre Hermé being invited to lecture at the Harvard Graduate School of Design.

Examples of food-based experiences that incorporate sound are rarer. The best-known example is perhaps The Sound of the Sea, developed at Heston Blumenthal’s Fat Duck restaurant in Bray, England; this dish is served with an iPod in a conch shell, allowing diners to hear a field recording of ocean surf on headphones while they dine. A similar approach was used in Volcano Flambé, developed by chef Kevin Lasko and artist Marina Abramović at Park Avenue Winter in New York, which was accompanied by a recording of the artist describing the ice cream and meringue-based dessert and thanking diners “for eating with awareness.” Paul Pairet, a French chef based in Shanghai, opened a new restaurant named Ultraviolet in the summer of 2012; it seats only ten diners per night, and each of the twenty-two courses is accompanied by video projections on the walls and a soundtrack of previously composed music. (The Beatles’ “Ob-la-di, Ob-la-da” accompanies a riff on traditional English fish and chips, for example.)

While there has been an increase in art world interest in eating in the last century, much of it ignores the sensory nuance of taste. I dismiss as irrelevant to the current discussion art that incorporates food for purely visual or conceptual effect (e.g., Luciano Fabro’s Sisyphus, Paola Pivi’s It’s a Cocktail Party), performance work that involves eating without an exploration of the sensation of taste (Yoko Ono’s Tunafish Sandwich Piece, Alison Knowles’ Identical Lunch, Emily Katrencik’s Consuming 1.956 Inches Each Day For 41 Days), or merely visual representations of food (Claes Oldenburg’s Baked Potato, Francesc Guillamet’s photographs of dishes from El Bulli).

Small Asparagus Opera

Diners waiting in anticipation at the first food opera. Photo by Melissa Rivard/Andrew Janjigian.

My food opera work has its genesis in conversations and sketches dating back to 2006, including a workshop in Shanghai in 2010. The first public presentation of these ideas, and also my first collaboration with chef Jason Bond, was entitled “Food Opera: Four Asparagus Compositions” and took place in May 2012 at Harvard’s Graduate School of Design, as part of a food-oriented series curated by Jutta Friedrichs, Elizabeth MacWillie, and Sara Hendren, then students in the GSD’s Art, Design and the Public Domain program, headed by professors Sanford Kwinter and Krzysztof Wodiczko. The second event, entitled “Sensing Terroir: A Harvest Food Opera,” took place in November 2012 at chef Jason Bond’s Bondir restaurant in Cambridge, sponsored by Artists in Context; this event incorporated field recordings and interviews with local, organic food providers and sought to explore dining as a narrative form to tell the story of sustainable food practices. The third event, entitled “Beside the White Chickens: A Summer Food Opera,” took place in July 2013 at Bondir and highlighted chef Bond’s local, organic poultry providers, Pete and Jen’s Backyard Birds of Concord, MA, taking as its inspiration the William Carlos Williams poem, “The Red Wheelbarrow,” around which sous chef Rachel Miller organized the menu.

Disclaimer: renderings of individual textures are interspersed throughout this article. Please note, however, that at a food opera event, none of these would be heard in isolation, but they would merge in space with twenty-five other textures at various stages of deployment to create a dense, rich soundscape.

Space and Massively Multichannel Sound


Diners enjoying Chicken Galantine au Foin, Lion’s Mane Mushrooms, White Sage Peach Confit, Broccoli Leaf and Rhode Island White Flint Corn Flour Polenta, Charred Eggplant, Snow Bok Choy, and Cherry Tomatoes in Tomato Honey with corresponding music. (Note the placement of the speakers on the table.) Photo by Jutta Friedrichs.

An overriding goal throughout this project is to respect the integrity and the history of the experience of communal dining: the focus is at the table, not on a stage situated elsewhere in the dining environment. Diners are free to converse during the event. To quote Brian Eno, the music should be “as interesting as it is ignorable,” a statement that also parallels chef Jason Bond’s observation of the manner in which people enjoy his food. The sound is there to enhance and supplement the meal, not to supersede or distract from it. In our conception of food opera, the very physical presence of a live performer alters the calculus of the dining experience to an unacceptable degree; this is why we rely on recorded sound for our events. (However, because the sound is being deployed in real time, it exists, as all video game soundtracks do, somewhere between recording and performance. The combination of video game music techniques and multichannel sound diffusion is what enables food opera as we define it to exist.)

In our second and third events at Bondir restaurant, each diner has a dedicated speaker through which she or he hears the accompaniment to her or his meal. The small speaker, about 3 inches in diameter, takes the place of a traditional table centerpiece (e.g., a vase of flowers) and rests in a small speaker stand designed by Jutta Friedrichs, which allows two speakers pointed in opposite directions to be directed at two diners facing each other. Bondir seats twenty-six diners, so twenty-six channels of sound are deployed simultaneously to individual diners throughout the restaurant. Six additional speakers (four around the perimeter of the room plus two custom hemisphere speakers, designed by Stephan Moore, positioned on stands in the middle of the room) play a slowly evolving low frequency drone that provides a harmonic context for the musical textures of each diner’s discrete accompaniment.

We do not use any particularly focused or directional speaker technology, and in fact, it is not particularly desirable for our planned experience. Sounds from nearby speakers will naturally blend together, providing a rich field of music, in effect transforming the entire restaurant into a generative sound installation. At our second and third events, diners could make reservations anytime between 5pm and 9:30pm, such that, at any given time, different people in the restaurant would be at different points in their meals; this is why the harmonic and rhythmic coordination of the music (described below) is particularly important. The overall form of the piece is an aggregate of every table’s individual dining trajectory, which results in an emergent foreshadowing and recapitulation as other diners in the restaurant order the same dishes at different times.

Mimesis and Abstraction

The primary sound sources in our first three events are traditional, acoustic musical instruments: flute, clarinet, bassoon, viola, cello, accordion, a tourist nyatiti from Kenya, a Chinese hulusi, and miscellaneous percussion. This may seem an unnecessary restriction, as the sound is played through speakers, allowing any electroacoustic sound to be employed. We used traditional musical instrument sounds to create an association with the alchemy of cooking: recognizable physical materials are juxtaposed, manipulated, and transformed to an aesthetic end. The sample manipulations we employ are fairly straightforward (e.g., splicing, transposing, applying amplitude envelopes), so that the acoustic source material remains clear.

There is an essential abstractness that is shared by food and music (distinct from visual art and writing, for example), which we sought to underscore. One of the original inspirations for this whole project was to explore the idea of using sound to describe taste, bypassing words entirely. It is difficult for the taste of asparagus, like the sound of a cello, to be “about” anything other than itself. As in dining, mimesis in acoustic music is by far the exception (e.g., the thunder at the end of Berlioz’s “Scène aux Champs” from Symphonie Fantastique).

In addition to these abstract sounds (abstract in the sense of an abstract photograph) intended to evoke or complement the sensory experience of each dish, the second and third food operas also incorporated field recordings and interviews from the farms that supply Bondir’s ingredients. With the field recordings, the goal was to get people thinking about where their food comes from, using sound to connect diners at Bondir to the farms on which the ingredients for the meal were produced. And in the interviews, diners could hear directly from farmers stories about the labor that goes into growing organic produce, the challenges of sustainable farming, and the histories of obscure heirloom vegetables (such as the Waldoboro green neck turnip).

Interviews accompanied only some of the courses, and the ambient sound served primarily as buffer or interstitial behavior between courses. (After we finished our setup and sound check, but before the first guests arrived, the room was filled with 26 channels of chickens clucking, which was a pretty fantastic sound, and remarkably evocative of the experience of walking on Pete and Jen’s farm.) But since different diners are on different courses at the same time, these three modes of listening (abstract [or conventionally musical], mimetic, and linguistic) were almost always active somewhere in the space and contributed to the richness of the experience.

Video Games in the Restaurant

As video games are inherently organized around the unpredictable input of players, video game music has evolved to be uniquely sensitive to real-time input. In a video game, any event may be associated with a musical parameter, and this concept of mapping between two different types of data is at the core of video game music, and indeed, of digital art in general. These mappings have evolved to be quite sophisticated and can be thought of as the points of convergence between two autonomous but linked systems.

The challenge of composing music for real-time deployment may be summarized as follows: deciding what to do when an event is received (typically the managing of a musical event or transition), and what to do for the indeterminate amount of time between events (typically some kind of continuous or state-based musical behavior). The simplest solution, and historically the most prevalent, is to loop a piece of music indefinitely until an event is registered, at which point the music fades to silence over a certain amount of time, while a new piece of looping music may fade in. The disadvantages of this approach, however, are numerous, including tedious repetition and inelegant transitions.
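
As a rough illustration of that baseline loop-and-fade approach (a sketch only, not code from the food opera system), the gains of the outgoing and incoming loops can be computed from the time elapsed since the event. The function name and the two-second default fade are assumptions made for the example.

```python
def crossfade_gains(seconds_since_event, fade_seconds=2.0):
    """Return (old_loop_gain, new_loop_gain) for a linear crossfade.

    Hypothetical helper: the outgoing loop fades to silence while the
    incoming loop fades in over `fade_seconds`.
    """
    progress = min(max(seconds_since_event / fade_seconds, 0.0), 1.0)
    return 1.0 - progress, progress


# At the moment of the event only the old loop is heard; halfway through
# the fade both loops are at half gain; afterwards only the new loop plays.
print(crossfade_gains(0.0))
print(crossfade_gains(1.0))
print(crossfade_gains(5.0))
```

Between events, nothing changes in this scheme except the position within the loop, which is exactly why it tends toward tedious repetition.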

Many more sophisticated techniques have been developed, and while it is beyond the scope of this article to present a full taxonomy, the techniques that we have incorporated into our food opera work to date include the coordination of musical events along a metric timeline, transposition of MIDI-like note data, and the algorithmic generation of musical melodies, pitch aggregates, rhythms, and phrases. In some aspects, our focus on notes and short phrases recalls the DirectMusic system released by Microsoft in 1999. It also draws from work by aleatoric composers of the mid-twentieth century, including John Cage, Earle Brown, and Christian Wolff. (For more information, see my paper Cell-Based Musical Deployment in Tom Clancy’s EndWar, presented at the First International Conference on Musical Metacreation in 2012.)

In our food operas, the sources of real-time events are constrained to a fairly limited range: we observe which dishes are chosen by a diner, when each course arrives, and when it is finished. The number of courses is fixed (four for our first food opera, five for the second and third), and for most courses, the diner has a choice between two dishes.

In addition to the events indicating the beginning and ending of each course, the music responds to the elapsed time since the beginning of each course. In most cases, over the progression of each course, there is a gradual decrease in musical intensity, which may be quantified in density of musical events, volume, or number of concurrent layers. An additional behavior, limited in duration, may be associated with the beginning or ending of each course.
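
One way to picture this time-dependent behavior is as a simple curve from elapsed time to intensity, which is then mapped onto a concrete parameter such as the number of layers. The sketch below assumes a linear decline; the ten-minute decay time, the floor value, and the function names are illustrative guesses rather than values from our events.

```python
def course_intensity(elapsed_seconds, decay_seconds=600.0, floor=0.2):
    """Intensity falls linearly from 1.0 to `floor`, then holds.

    All numbers here are assumptions chosen for illustration.
    """
    if elapsed_seconds <= 0:
        return 1.0
    fraction = min(elapsed_seconds / decay_seconds, 1.0)
    return 1.0 - (1.0 - floor) * fraction


def active_layers(intensity, max_layers=4):
    """Map intensity onto a number of concurrent layers (at least one)."""
    return max(1, round(intensity * max_layers))


for minutes in (0, 5, 10, 20):
    level = course_intensity(minutes * 60)
    print(minutes, round(level, 2), active_layers(level))
```

The same intensity value could equally drive event density or volume; layer count is used here only because it is the easiest of the three to show in a few lines.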

These parameters may seem few, but given that diners are coming and going throughout a span of roughly five hours, and that any of the twenty-six diners may be at any point in their meal at any time, the richness and complexity of the overall, emergent sonic environment is significant.

The field recordings and interviews also draw on video game techniques, as recordings of environmental ambiences and dialog are a key component of most video games. Our field recordings are chopped up and shuffled, providing continual variation. Lines of dialog are also edited into individual files and balanced for volume before being deployed intermittently.
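
A common game-audio way to shuffle chopped-up ambience is to reshuffle the segment list on every pass while making sure the same segment is never heard twice in a row. The following is a minimal sketch of that general technique, with hypothetical file names; it is not a transcription of the Max patch used at the events.

```python
import random


def shuffled_segments(segments, rng=None):
    """Yield segments forever, reshuffling on each pass.

    Assumes the segments are distinct. The first segment of a new pass
    is swapped away if it would repeat the last one just played.
    """
    rng = rng or random.Random()
    last = None
    while True:
        order = list(segments)
        rng.shuffle(order)
        if len(order) > 1 and order[0] == last:
            order[0], order[-1] = order[-1], order[0]
        for segment in order:
            yield segment
            last = segment


# Hypothetical segment names for a chopped-up farm recording.
player = shuffled_segments(["farm_01.wav", "farm_02.wav", "farm_03.wav"])
print([next(player) for _ in range(6)])
```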

Texture and Musical Granularity

What we mean by musical texture is a convenient middle ground between an individual note and a fully composed musical composition. It is a musical behavior with a clear identity, characterized by instrumentation, melodic contour and duration, harmonic structure, voicing style, rhythmic density, and number of constituent layers. While textures may contain melodic elements, they are not primarily melodic in character. Our textures are designed to continue indefinitely without repetition, using algorithmic techniques to vary, shuffle, offset, juxtapose, and in some cases generate musical material.

Game music may be categorized according to the granularity of its musical components, from through-composed pieces of several minutes in duration to individual notes (or even, in considering real-time synthesis systems, subdivisions of notes). Granularity is linked to responsiveness, which is to say, the maximum amount of time that could elapse between the arrival of a game event and a musical system’s response. By orienting music around texture, we achieve a highly responsive musical fabric.

Texture in our usage is independent of pitch, so that the music may modulate to a new key area while preserving texture identity. Some textures incorporate longer phrases, which are categorized according to the harmonic areas with which they are compatible.

We emphasize texture because it offers a useful way of defining a musical identity without large-scale temporal expectation; a key aspect of our use of texture is that it can continue indefinitely. This allows for sustained and concentrated evaluation, which in turn makes it well suited to pairing with other multimedia input, such as taste.

We find this approach preferable to playing a completed piece of music, which may contain a conflicting or distracting dramatic trajectory (manifested, for example, as the climax of a phrase or a juncture in a piece’s formal articulation), or which may end prematurely, or which may loop unvaryingly, resulting in fatigue or annoyance for listeners due to unmitigated repetition.

This approach is also preferable to simply sustaining or repeating a single tone, timbre, or sonority, since it provides rhythmic and harmonic context for individual tones and allows for the investigation of the modes of meaning with which music has long been associated, including harmony, rhythm, and melody: the syntax and grammar of music.

Coordination and Modularity

The music for the food opera is coordinated in harmony and rhythm, and it is composed so that this coordination is readily apparent. As most of our diners are assumed to be musical non-specialists, we decided to compose music that is diatonic and organized around a clear reference pulse of 188 beats per minute.

All rhythmic activity is coordinated in reference to a common pulse that is present throughout the work. New events are, for the most part, quantized to start on a beat (although random delays of a few milliseconds may be introduced as a humanizing gesture). There is no concept of meter, only pulse, although events may be assigned to happen on multiples of beats, which creates a kind of local meter within a specific texture.
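
The quantization just described can be sketched in a few lines. At 188 beats per minute, one beat lasts 60/188 seconds, or about 0.319 seconds. The function names and the 5-millisecond jitter ceiling below are assumptions for illustration; the text above says only "a few milliseconds."

```python
import math
import random

BPM = 188
BEAT_SECONDS = 60.0 / BPM  # about 0.319 seconds per beat


def next_beat_time(now_seconds, beat_multiple=1):
    """Time of the next grid point at or after `now_seconds`.

    `beat_multiple` lets a texture place events every N beats, giving
    it a kind of local meter.
    """
    grid = BEAT_SECONDS * beat_multiple
    return math.ceil(now_seconds / grid) * grid


def humanized_start(now_seconds, max_jitter_ms=5.0, rng=random):
    """Quantize to the next beat, then add a small random delay."""
    jitter = rng.uniform(0.0, max_jitter_ms) / 1000.0
    return next_beat_time(now_seconds) + jitter


print(round(next_beat_time(0.1), 3))                   # next single beat
print(round(next_beat_time(0.4, beat_multiple=4), 3))  # next four-beat point
```

Under these assumptions, an event that arrives just after a beat waits at most one beat, roughly a third of a second, before it is heard.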

As the project has evolved over our three events so far, we found that we wanted to blur the rhythm at certain points to avoid an overly metronomic effect. For this reason we introduced rubato phrases that can be played intermittently, uncoordinated with the underlying pulse.

Over the course of the evening, the underlying harmony modulates very slowly in a random walk among five diatonic key areas excerpted from the circle of fifths: D, G, C, F, and B flat. This provides a sense of harmonic progression and combats key fatigue. A bass drone (generated in real time from transposed recordings of a viola, a human voice, a synthesizer, and an electric generator used in a chicken slaughtering we observed) articulates this harmonic movement. The bass drone slowly (every forty to eighty seconds) chooses scale degrees from the current tonality (excluding the leading tone) according to a first-order Markov chain of acceptable progressions; this allows each texture to be heard with any of six possible scale degrees in the bass, providing additional variation. Key modulations happen every three to eight bass notes (with a weighting towards the short end of that range). We don’t really use MIDI as an interface, but a lot of the system works similarly to the way MIDI does, e.g., handling notes with integer ID numbers.
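
The harmonic scheme above can be sketched as two nested random processes. In the hedged example below, the transition table for the bass, the weights for the modulation interval, and the choice to step only between neighboring keys are all assumptions invented for illustration; only the key list, the six scale degrees, the 40 to 80 second note length, and the 3 to 8 note modulation interval come from the description above.

```python
import random

KEYS = ["D", "G", "C", "F", "Bb"]  # neighbors on the circle of fifths

# Hypothetical first-order Markov table: scale degree -> allowed next
# degrees. Degree 7 (the leading tone) is excluded.
BASS_TRANSITIONS = {
    1: [4, 5, 6],
    2: [5, 1],
    3: [6, 4],
    4: [1, 5, 2],
    5: [1, 6, 3],
    6: [4, 2, 5],
}


def next_key_index(index, rng):
    """Random walk: step to a neighboring key area (assumed behavior)."""
    candidates = [i for i in (index - 1, index + 1) if 0 <= i < len(KEYS)]
    return rng.choice(candidates)


def notes_until_modulation(rng):
    """Pick 3 to 8 bass notes, weighted toward the short end."""
    counts = [3, 4, 5, 6, 7, 8]
    weights = [6, 5, 4, 3, 2, 1]  # assumed weighting
    return rng.choices(counts, weights=weights, k=1)[0]


def drone_schedule(total_notes, seed=None):
    """Return a list of (key, scale_degree, duration_seconds) tuples."""
    rng = random.Random(seed)
    key_index = rng.randrange(len(KEYS))
    degree = 1
    remaining = notes_until_modulation(rng)
    schedule = []
    for _ in range(total_notes):
        schedule.append((KEYS[key_index], degree, rng.uniform(40.0, 80.0)))
        degree = rng.choice(BASS_TRANSITIONS[degree])
        remaining -= 1
        if remaining == 0:
            key_index = next_key_index(key_index, rng)
            remaining = notes_until_modulation(rng)
    return schedule


for key, degree, seconds in drone_schedule(6, seed=1):
    print(key, degree, round(seconds))
```

With bass notes lasting 40 to 80 seconds and modulations every 3 to 8 notes, a key area in this sketch persists for somewhere between two and roughly eleven minutes.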

Our musical textures are transposed according to one of two principles. For musical textures that are rendered using this MIDI-like system, transposition is a trivial affair, involving an integer semitone offset. For textures that involve recorded phrases, transposition is slightly more complicated: we choose from a pre-compiled list representing the subset of phrases compatible with the current key. In some cases, we perform real-time transposition of musical phrases. For rubato phrases that do not need to be rhythmically precise, we have found it musically acceptable to transpose as much as two semitones up or down to achieve compatibility with the current key. For phrases that do require rhythmic precision, we may transpose by simple multiples of the original speed (e.g., 0.5, 0.667, 1.5, 2.0) to achieve musical variety and to maximize the contexts in which a recorded phrase may be used (an important concept in game audio development).
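
Both kinds of transposition reduce to simple arithmetic. The sketch below assumes that recorded phrases are transposed by changing playback speed, so that pitch and duration change together; under that assumption the pitch shift in semitones is 12 times the base-2 logarithm of the speed ratio. The function names are hypothetical.

```python
import math


def transpose_notes(note_numbers, semitone_offset):
    """Transpose MIDI-like integer note numbers by a semitone offset."""
    return [note + semitone_offset for note in note_numbers]


def speed_to_semitones(speed_ratio):
    """Pitch shift produced by playing a recording at `speed_ratio`."""
    return 12.0 * math.log2(speed_ratio)


print(transpose_notes([60, 64, 67], -2))  # a C major triad, down a whole step
for ratio in (0.5, 0.667, 1.5, 2.0):
    print(ratio, round(speed_to_semitones(ratio), 2))
```

The ratios 0.5 and 2.0 shift the phrase by exactly an octave, and 0.667 and 1.5 by very nearly a perfect fifth, while the durations scale by simple ratios of the original, which is why such phrases can stay aligned with the common pulse.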

As for the musical textures, each can be thought of as an independent algorithmic study unto itself. The only constraint is that it must in some way conform to the current key area and acknowledge the underlying pulse. Textures are composed to be somewhat sparse, with an awareness that they will be deployed alongside other textures, and are designed with registral variation in mind.

The system we used for the first three food operas was programmed in Max. The menu was preloaded into the system, and as diners placed their orders, we input the choices into a patch representing the layout of the restaurant. As dishes came and went, we advanced the musical behavior for each seat. An interstitial behavior, consisting of footsteps from one of the farms providing the ingredients for the meal, quantized to the pulse of the music, played between courses. Ambient sound from the farms played before the first course and after the last.
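
The per-seat behavior described here amounts to a small state machine. The following Python sketch is only an analogy for the Max patch, not a port of it; the class, method, and state names are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class SeatState(Enum):
    AMBIENT = "farm ambience"        # before the first course, after the last
    COURSE = "course texture"        # while a dish is on the table
    INTERSTITIAL = "farm footsteps"  # between courses


@dataclass
class Seat:
    orders: list  # one dish name per course, entered when the diner orders
    course_index: int = -1
    state: SeatState = SeatState.AMBIENT

    def dish_arrives(self):
        """Advance to the next course and return the dish to be scored."""
        self.course_index += 1
        self.state = SeatState.COURSE
        return self.orders[self.course_index]

    def dish_cleared(self):
        """Return to footsteps between courses, or ambience after the last."""
        is_last = self.course_index == len(self.orders) - 1
        self.state = SeatState.AMBIENT if is_last else SeatState.INTERSTITIAL


seat = Seat(orders=["first dish", "second dish"])
print(seat.state.value)
print(seat.dish_arrives(), "->", seat.state.value)
seat.dish_cleared()
print(seat.state.value)
```

One such object per seat, advanced by hand as dishes come and go, is enough to reproduce the sequence of ambience, course texture, and footsteps.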


Soft Poached Egg (already consumed), Cucumber Foam, Lemon Balm Brioche, Prairie Fire Chili, Brown Butter Vinaigrette, and speaker in position in front of the plate. Photo by Jutta Friedrichs.

Scoring vs. Synaesthesia

A question that has come up repeatedly while working on this project is whether the goal is to attain a kind of synaesthesia. I have observed an increase in discussion around the notion of synaesthesia in recent years, and it is my guess that the reason has to do with the importance of mapping in the growing field of digital art. By now, this concept should be familiar to anyone with a passing interest in the genre; mapping, simply defined, is the binding of two sets of data. A classic video game example would be the linking of a vehicle’s velocity to its engine sound’s pitch. In an interactive installation, it might be the association of a viewer’s location in a video image with the volume of an accompanying sound.
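
In code, the simplest such mapping is a clamped linear rescaling from one range to another. The sketch below applies it to the engine example; the speed range and playback-rate range are made-up values, and real games typically shape the curve further.

```python
def map_range(value, in_low, in_high, out_low, out_high):
    """Linearly map `value` from one range to another, clamping at the ends."""
    t = (value - in_low) / (in_high - in_low)
    t = min(max(t, 0.0), 1.0)
    return out_low + t * (out_high - out_low)


# Hypothetical mapping: 0 to 200 km/h drives an engine loop's playback
# rate from 0.5 (idling) to 2.0 (flat out).
for speed in (0, 100, 300):
    print(speed, map_range(speed, 0, 200, 0.5, 2.0))
```

Every choice here (which ranges, which curve, which sound parameter) is a design decision.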

Synaesthesia is a hard-wired mapping between the senses. I think of Messiaen hearing a certain sonority as blue-violet or mauve. There’s a notion that a sound has a certain smell, or a color has a certain sound, and it is not a choice or assignation, but rather an inalienable attribute of the thing.

What mapping has in common with synaesthesia is the notion of consistency; once established, there is a fixed method for converting one kind of information into another. The difference between mapping and synaesthesia is that, whereas synaesthesia is somehow hard-wired, a mapping is assigned (we might just as well say, “designed,” or “composed”). This creative aspect of mapping, of system design, is only slowly gaining recognition as a field of artistic activity, but it is not far removed from an aleatoric composition like Christian Wolff’s Burdocks, for example. Tatsuya Mizuguchi’s video game Rez is a famous example of an effort to suggest synaesthesia via a tightly coordinated merging of visuals, music, and (if you bought the Trance Vibrator peripheral) haptic information. On the other hand, there is an abundance of software that makes it possible to completely ignore this creative exercise, resulting in some very blunt conversions; the famous Aphex Twin example works as a goofy in-joke, but countless others fall flat from what amounts to a lack of understanding of basic artistic materials.

However, neither of these perspectives is quite what I’m attempting to do with the food opera. Instead, I come at it from the perspective of scoring. In scoring a film or choreography or any other time-based medium, the task of the composer is not to simply convert the visual information into sound. Rather, the idea is to add an additional layer of meaning on top of the original source, to create an additional stream of information. (In fact, this is a common exercise for beginning film scoring students at Berklee College of Music, where I teach: to take a film clip without sound and use music to give it several different emotional spins, now comical, now tense, now poignant, for example.) The result is a kind of counterpoint, in which this new stream contextualizes and informs the original information. There’s a potential for ambiguity, which can provide richness and nuance to someone experiencing the work. To simply replicate it in sound would be redundant.

So it is with the food opera; I am presenting not so much the sound of Chicken Galantine au Foin with Lion’s Mane Mushrooms, White Sage Peach Confit, and Broccoli Leaf; but rather a sound of Chicken Galantine au Foin with Lion’s Mane Mushrooms, White Sage Peach Confit, and Broccoli Leaf. Of course, there is an intense effort to present a score that supports the meal in a meaningful way, but the associations are intuitive and arbitrary, and any number of alternate solutions to this compositional problem may be equally valid.

Future Work

We have a lot of ideas about how to take this concept forward; as I mentioned earlier, the expressive possibilities are vast.

On the technical side, there is enormous potential to increase the amount of input into the system generating the music; we plan to explore this potential in future food operas via a range of sensing mechanisms. Accelerometers could be attached to stemware or silverware, or cameras could be mounted above the tables, to name only the most obvious next steps. In some situations, it might be desirable to have live musicians performing remotely, similar to what is currently done in some Broadway productions, incorporating real-time scoring and cueing systems.

There is also considerable potential to expand the concept in terms of visual art, by incorporating projections and video displays and by coordinating table settings, clothing, and other elements of interior and set design.

Food Opera – Four Asparagus Compositions from Jutta Friedrichs on Vimeo.


Ben Houge

Ben Houge. Photo by Jutta Friedrichs

Ben Houge is an artist working at the nexus of video games, sound installation, digital art, generative video, music composition, and performance. He is also an active composer of choral music: his Lao Zhang was premiered by Dale Warland at the American Composers Forum’s ChoralConnections conference in 2012, and other choral compositions have been commissioned by the Esoterics and the Minnesota Compline Choir. He holds degrees in music from St. Olaf College and the University of Washington and recently moved to Spain to help develop and teach in the new Music Technology Innovation master’s program at the Valencia campus of the Berklee College of Music.