Stereo: Localization, Imaging and Live Sound

by Phil Graham
in Tech Feature

Neumann’s KU 100 microphone simulates the way that humans hear by placing a small condenser capsule in each “ear” of the head model. So-called “dummy head” microphones are useful in making acoustical measurements or binaural simulation recordings.

Virtually everything about our pro audio world acknowledges the existence of multiple channels of audio and, most commonly, stereo channels (i.e., left and right). Nearly every piece of audio gear provides both stereo inputs and outputs. At concerts, there are almost always left and right speaker arrays. Almost all music playback material is in stereo. This month, let’s take a look at how we localize sound, and how that relates to the traditional use of the stereo configuration. We will consider stereo’s advantages, limitations, and place in the pro sound environment. We’ll also discuss how sound technicians can shape their mix to get the best overall results for the majority of the listening audience.

At this point, some readers will be thinking “Great, another guy who is going to tell us to hang a big monophonic PA right over the middle of the stage...” Let me step in front of this thought and say that I like stereo, and generally prefer to use it over mono or left/center/right configurations. Along with this preference, though, is an understanding of when stereo works, and what it can provide for the mixer and listener. With that, it is time to start dissecting where and how we hear.

» Stereo Mix Structure — A History

Dual-channel stereo playback became available to consumers in the mid-1950s because of advances in phonographic records. As an owner of an early stereo demo record from RCA, I’m struck by the incredibly strange and dramatic panning that some of the tracks exhibit. Examples include the drums panned to the left channel, and all the vocals to the right.

As time progressed, the panning of sounds on pop music settled into a general motif that continues today. This motif was originally technically motivated, as one could not have strongly panned low frequencies without risk of the phonograph stylus skipping out of the groove. Indeed, the vinyl master cutting lathes had circuitry that would combine all the low frequency energy below about 200 Hz (bass, kick drums, etc.) and place it in the center of the record groove (i.e., center-panned). Thus, center-panned mono low frequency energy was firmly established in pop mixing and is commonly used today, despite the fact that hard-panned LF elements in a mix are unlikely to make a needle jump out of a groove on a CD or MP3 file.

The modern incarnation of studio record mixing doesn’t have the limitations of vinyl grooves, but much of vinyl’s legacy remains in mixing. Low frequency and percussive instruments are typically panned to the center, while other instruments are panned to the sides. Panning on studio pop records tends to be dramatic, with hard panning of guitars left and right incredibly common. The exaggerated 180-degree L-R panning of tom-tom rolls is standard practice, yet it bears no relation to a “real” stereo image.

» Human Hearing — Amazing!

Our ability to hear is an incredible sense. In terms of the range of volumes and frequencies, it is a sense of extreme capability. One only need listen online to the tinny, distorted audio from an audience member’s smart phone at a concert to appreciate the capability of human hearing relative to modern technology.

Equally as impressive as our ability to pick up sounds is the brain’s ability to process, discriminate, and present those incoming sounds to us. Our brain is very good at picking up speech in noisy environments, filtering out reflections, and figuring out where sounds are arriving from.

Part of our ability to parse through the world of sound so well comes from the number of things our brains have learned to ignore. This is demonstrated by the technology we use every day. We now have more than two decades of experience with selectively processing audio to reduce the size of the stored file, and at least part of that storage compression comes from understanding the limitations of human hearing.

For instance, if two sounds of similar frequency are played at the same time, our brain excludes the information about the quieter signal. This phenomenon is known as noise masking. In a related way, if two sounds of similar level are played close together in time, the brain will “fuse” the two sounds into one, but the sound that arrived first will define our perception of the arrival direction. This second effect is known as the precedence effect.

» Insights from Lossy Compression

Audio compression schemes (e.g., mp3) use behaviors like noise masking and the precedence effect to intelligently allocate the available audio bandwidth to the elements of the music that our brains recognize most strongly. One of the things that these compression schemes can do is change the representation of stereo energy from left and right to mid-side (see the “Mid-Side Explained” sidebar). While mathematically equivalent to traditional left and right stereo, mid-side is useful to ascertain what sounds are dominant at the center of the audio spatial image, versus at the periphery, which helps shrink the file further. Especially with mp3s, a reliable way to hear the lossy compression is to listen to only one channel of playback. The effects in the side channel are much easier to identify with a single playback channel, as the strong center image is no longer masking the listener’s perception.

Mid-side processing also occurs in the studio, mastering, and broadcasting. FM radio, for instance, is broadcast in mid-side. The mid channel is broadcast at high power, and the side channel at a lower power. If the signal becomes weak, the mid channel, which typically contains most elements of the mix, is still available to the listener because of the higher broadcast power.

» Human Sound Localization

So far we’ve touched on common traits of stereo mixing and how our brain chooses to categorize incoming sound information. Now, as we dive a little deeper into the details of how we determine sound arrivals, a complex array of inter-related processes emerges. Those processes warrant an article to themselves, but for now we’ll paint them with broad strokes.

Since we have two ears and a head, the brain can use the effects of the ear spacing, ear shape, and head shape to analyze where sounds arrive from. Sounds arriving from one specific side of the head will experience different volumes, arrival times, and frequency response at each eardrum. The head’s “shadowing” of the sound is one of the brain’s most powerful tools to determine sound arrivals. The shape of the ear also exhibits similar influence on arriving sounds. Research further indicates the brain makes subtle movements of the head to improve the nature of localization by observing the head’s influence on the incoming sound from slightly different positions.

Sound localization is a fusion of time and frequency, level and shadowing. All multi-channel audio schemes — whether surround sound or stereo — seek to provide the brain enough directional cues to allow clear localization and create a convincing soundfield for the listener. Sometimes that soundfield is supposed to represent the feel of a real place (e.g., soundtracks), and other times (e.g., pop records) it is to give “space” and ambience to a collection of instruments and voices.

» Localization and Stereo

If the place of stereo in pro sound mixing is to provide “space” in the mix, then mono is not a convincing replacement for stereo behind the mixing desk. Mono requires much more skill in separating instruments using frequency and level because it removes panning from the mixing equation. Especially in the house of worship (HOW) market, where the mixer behind the board is often a volunteer doing the best they can, removing the ability to pan to help create “room” in the mix seems a tough compromise.

So then, if stereo is a useful mixing tool, why is it so often ineffective in the live sound environment? Much of this has to do with the limitations of how we localize sound. Briefly, if a sound is louder from one source (i.e., loudspeaker), or arrives earlier from it, the sound from the second source is either fused with the first arrival or ignored by the brain. Unfortunately, in live sound, both late arrivals and quiet arrivals are common occurrences. Geometry and sight lines often dictate that our loudspeakers are too far apart to allow similar arrival times for every audience member. Even if we can provide similar arrival times (<20 ms difference), the speakers are often so far apart that their levels at a given seat are very different, because of the extra distance and absorption. The net result in these circumstances is that the sound image collapses completely to the nearest, or loudest, speaker for a particular area of the audience.
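To put rough numbers on this, here is a minimal Python sketch that estimates the arrival-time and level differences between the left and right arrays at a given seat. Everything in it is an assumption for illustration: the function name `arrival_differences`, the 343 m/s speed of sound, the speaker and seat coordinates, and the simple point-source (inverse-distance) level model, which ignores absorption, array directivity and room reflections.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at roughly 20 degrees C


def arrival_differences(seat, left_speaker, right_speaker):
    """Return (time difference in ms, level difference in dB) at a seat.

    Positions are (x, y) coordinates in metres. The level estimate assumes
    simple point-source spreading only (an illustrative simplification).
    """
    d_left = math.dist(seat, left_speaker)
    d_right = math.dist(seat, right_speaker)
    delta_ms = abs(d_left - d_right) / SPEED_OF_SOUND_M_S * 1000.0
    delta_db = abs(20.0 * math.log10(d_left / d_right))
    return delta_ms, delta_db


# Hypothetical layout: arrays 20 m apart, one centre seat and one far-left seat.
left, right = (-10.0, 0.0), (10.0, 0.0)
for seat in [(0.0, 15.0), (-8.0, 5.0)]:
    ms, db = arrival_differences(seat, left, right)
    verdict = "within" if ms < 20.0 else "outside"
    print(f"seat {seat}: {ms:.1f} ms, {db:.1f} dB ({verdict} the 20 ms window)")
```

In this made-up layout the centre seat sees essentially no difference between the arrays, while the far-left seat is well outside the 20 ms window and hears the near array much louder, which is the image collapse described above.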

Sometimes these compromises are unavoidable, and in such circumstances, I advocate for treating the left and right sides of the PA as two separate mono sound systems, rather than as mediocre stereo for 25% of the audience. Indeed, if the audience is very wide, one might have to create two distinct mono zones to provide adequate horizontal coverage for all the patrons.

The goal is then to minimize the coverage overlap in the middle of the audience area. When mixing in this environment, I advocate mixing in mono, but then using stereo reverb effects to add overall ambience to the mix. Even if the audience isn’t perfectly interacting with both the left and right arrays, the additional ambience of the reverb is almost always effective. The one other mix alternative that I feel has merit in this dual-mono context is using delay panning (see the “Delay Panning” sidebar).

In a world where we can decide on the placement of our speakers with great flexibility for the audience, I wholeheartedly encourage the use of stereo speakers and mixing in stereo. The use of stereo speakers necessitates dividing the audience into comparatively narrow segments to facilitate similar arrival times and volume levels at every seat. The most common way this is achieved is via alternating left/right/left/right speaker arrays. Half the audience will have their stereo image of the stage reversed, but I’ve never had a patron notice this. The additional flexibility in terms of space in the mix from alternating left/right is a boon to the mixer in all areas of the audience.

» Conclusion

In earlier, less visually motivated times, rooms for concerts were made narrower and deeper. Narrower and deeper rooms undoubtedly are superior for supporting a convincing stereo image over most of the audience, as the differences in arrivals between the left and right speaker arrays are less than in a wide and shallow room. Since these rooms are less common now, I feel that making space in a mono mix is an increasingly important skill.

As a FOH or monitor mixer, I encourage the motivated reader to practice mixing in mono. If you become more adept at separating instruments using level, equalization, compression, or different effects in the mono context, then your mix will have that much more “room” when a PA suited to proper stereo panning arrives. Alternatively, for the mixer skilled at mixing in mono, consider adding a bit of panning to your mix and see if it doesn’t open up your mixes even more, at least for the segment of the audience that can experience proper stereo.

Mid-Side Explained

Mid-side is a technique for re-representing the information of a stereo audio signal. The “mid” channel is derived by:

Mid = A(Left + Right)

Where “A” is a constant that compensates for the change in volume of the summed left and right signals.

In similar fashion, the side channel is derived by:

Side = (Left - Right)

A little bit of algebra shows that the left and right channels can be extracted back from the mid and side by either adding or subtracting the two channels from each other.
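That algebra can be sketched in a few lines of Python. This is only an illustration of the two formulas above: the function names `ms_encode` and `ms_decode` are made up here, and A = 0.5 is an assumed value for the constant. Solving the two equations for the original channels gives Left = (Mid/A + Side)/2 and Right = (Mid/A - Side)/2.

```python
def ms_encode(left, right, a=0.5):
    """Apply Mid = A(Left + Right) and Side = (Left - Right) per sample."""
    mid = [a * (l + r) for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side, a=0.5):
    """Recover the channels: Left = (Mid/A + Side)/2, Right = (Mid/A - Side)/2."""
    left = [(m / a + s) / 2 for m, s in zip(mid, side)]
    right = [(m / a - s) / 2 for m, s in zip(mid, side)]
    return left, right


# Three made-up samples: centre-panned, out of phase, and hard left.
left = [0.5, 0.25, -0.5]
right = [0.5, -0.25, 0.0]
mid, side = ms_encode(left, right)
print("mid: ", mid)
print("side:", side)
print("round trip matches:", ms_decode(mid, side) == (left, right))
```

Note that the first sample, which is identical in both channels, produces a side value of zero: centre-panned material lives entirely in the mid channel.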

Mid-side is popular in the recording studio context, and is used in some lossy compression schemes. It is also used in mastering, because one can use compression with different attack and release on the mid channel versus the side channel. Because modern pop music typically has more percussive material in the mid channel, tailoring the attack and release for the mid channel separate from the side is often effective.

Delay Panning

When stuck in an environment where one cannot pan the level of sources, because the stereo image would collapse, another technique that can be effective is delay panning. In delay panning, the source is kept at the same level in both channels but is delayed, typically by between 8 and 20 milliseconds (ms), in the channel opposite the desired image direction, so that the earlier arrival pulls the image toward the side where the panning is desired.

So, rather than turning down the guitar in either the left or right channel, we instead draw its localization image left or right using a little bit of delay in the appropriate channel. For people in the center of the audience, who can clearly hear both speaker arrays, the delay will manifest as a change in the localization direction of the source.

People who are covered by only one of the loudspeaker arrays will still hear all the sources at the same level, but certain sources will have a small amount of delay on them in the mix. Generally this delay is not readily audible, and it is certainly less objectionable than a source missing from the mix entirely because of level panning.
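As a rough illustration of the idea, the following Python sketch builds a delay-panned left/right pair from a mono source. The function `delay_pan` and its parameters are hypothetical, not any console's feature. It assumes, following the precedence effect, that the image moves toward the channel that arrives first, so the delay is applied to the opposite channel while both channels keep the same level.

```python
def delay_pan(mono, sample_rate, delay_ms, image="left"):
    """Return (left, right) sample lists for a delay-panned mono source.

    The channel opposite the desired image direction is delayed by delay_ms;
    levels are left untouched, so every listener still hears the source.
    """
    if image not in ("left", "right"):
        raise ValueError("image must be 'left' or 'right'")
    n = round(sample_rate * delay_ms / 1000.0)  # delay expressed in samples
    early = list(mono) + [0.0] * n   # undelayed channel, padded to equal length
    late = [0.0] * n + list(mono)    # delayed copy at the same level
    return (early, late) if image == "left" else (late, early)


# Example: a short made-up burst, pulled to the left with 10 ms of delay.
burst = [1.0, 0.5, 0.25]
left, right = delay_pan(burst, sample_rate=48000, delay_ms=10, image="left")
print(len(left), len(right))          # both channels have the same length
print(right.index(1.0), "samples of delay on the right channel")
```

On a console, the same effect would typically be approximated by splitting the source to two channels, hard-panning them, and adding channel delay to one of them.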

Implementing delay panning on a modern digital console is usually an achievable process, but this author is not aware of any digital console manufacturers who have implemented delay panning as a native mode inside their digital offerings.