Listening in 3d

One of the remarkable abilities of our auditory system is that it can pinpoint the location of sound sources.

By: Matthias Scholz
User Interface Designer
PhD Applied Acoustics
Brüel & Kjær

This is vital in many situations in life, such as safe navigation in traffic. But the spatial properties of sound are just as important to achieve a realistic acoustic environment in gaming and home cinema set-ups. So how does it work and what does it take to recreate an authentic experience?

Fig. 1a: With sound coming from the front, the interaural time difference is zero (left). Coming from the side, with a head size of about 20 cm and a sound speed of 340 m/s, the maximum time difference is about 0.59 ms (right)
Fig. 1b: While the ears will usually sense a phase difference (left), depending on frequency and angle of incidence they may detect a false phase match (right)
How do we localize sound?
The first clue our hearing uses is interaural time difference (fig. 1a). Sound from a source directly in front of or behind us arrives at both ears simultaneously. If the source moves to the left or right, the sound reaches the nearer ear first, and our auditory system recognizes that the same sound arrived at both ears with a certain delay. Seen another way, the two ears pick up different phases of the same signal.
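As a rough worked example of the time-difference clue, the sketch below assumes the simplest possible model: a distant source, two ears a straight-line distance `head_size` apart, and an extra path to the far ear of `head_size * sin(azimuth)`. The function name and the model are illustrative assumptions; a real head diffracts sound, so measured values will differ somewhat.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value used in fig. 1a
HEAD_SIZE = 0.20        # m, the ear-to-ear distance used in fig. 1a


def interaural_time_difference(azimuth_deg, head_size=HEAD_SIZE, c=SPEED_OF_SOUND):
    """Return the interaural time difference in seconds for a distant source.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    Assumed model: the extra path to the far ear is head_size * sin(azimuth).
    """
    return head_size * math.sin(math.radians(azimuth_deg)) / c


# Print the delay in milliseconds for a few source directions.
for angle in (0, 30, 60, 90):
    itd_ms = interaural_time_difference(angle) * 1e3
    print(f"{angle:3d} deg -> {itd_ms:.2f} ms")
```

With these values the delay is zero for a frontal source and reaches its maximum of `head_size / c`, a little under 0.6 ms, for a source directly to the side.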

We decipher phase differences best at low frequencies. At higher frequencies, the wavelength can be so short compared to the size of the head that the pattern repeats itself, and both ears may coincidentally pick up the same phase (fig. 1b).
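The false phase match can be illustrated numerically. The sketch below reuses the same simplified straight-line head model (an assumption, not a measured response) and wraps the interaural phase difference into one cycle, which is all a phase comparison can resolve. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s
HEAD_SIZE = 0.20        # m, assumed ear-to-ear distance


def wrapped_phase_difference(freq_hz, azimuth_deg, head_size=HEAD_SIZE, c=SPEED_OF_SOUND):
    """Interaural phase difference in radians, wrapped to [-pi, pi)."""
    itd = head_size * math.sin(math.radians(azimuth_deg)) / c
    phase = 2.0 * math.pi * freq_hz * itd
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def ambiguity_frequency(head_size=HEAD_SIZE, c=SPEED_OF_SOUND):
    """Frequency at which a source at the side produces half a cycle of phase shift."""
    return c / (2.0 * head_size)


print(f"phase becomes ambiguous above about {ambiguity_frequency():.0f} Hz")
for f in (200, 850, 1700):
    print(f"{f:5d} Hz, source at 90 deg: {wrapped_phase_difference(f, 90):+.3f} rad")
```

In this model, a source at the side produces a full cycle of phase shift at 1700 Hz, where one wavelength equals the assumed head size. The wrapped difference is then approximately zero, just as for a frontal source.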

Fortunately, the auditory system has another clue to work with: the acoustic shadow created by our head when sound arrives from the side, a phenomenon that increases with frequency. At very low frequencies, the size of our head is small compared to the wavelength of sound in air. Consequently, the sound pressure is essentially the same at the left and right ear, no matter from which direction the sound arrives. However, with increasing frequency the wavelength decreases and the size of our head is no longer negligible. It becomes an obstacle that shields and reflects sound, so that in comparison to the ear which faces towards the source, higher frequency content will be attenuated when it arrives at the ear on the opposite side of the head.
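To put numbers on "small compared to the wavelength", the sketch below compares an assumed head size of 20 cm with the wavelength at a few frequencies, using a sound speed of 340 m/s. The ratio only indicates which regime applies; it does not predict the attenuation in decibels.

```python
SPEED_OF_SOUND = 340.0  # m/s
HEAD_SIZE = 0.20        # m, assumed head size


def wavelength(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of sound in air at the given frequency."""
    return c / freq_hz


def head_to_wavelength_ratio(freq_hz, head_size=HEAD_SIZE, c=SPEED_OF_SOUND):
    """Head size expressed in wavelengths; well below 1 means little shadowing."""
    return head_size / wavelength(freq_hz, c)


for f in (100, 500, 1700, 5000, 10000):
    print(f"{f:6d} Hz: wavelength {wavelength(f):.3f} m, "
          f"head/wavelength {head_to_wavelength_ratio(f):.2f}")
```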

The shape of our pinnae also provides a wealth of spectral (frequency-dependent) clues. Like the acoustic shadow of the head, the pinna functions like a shield attenuating the higher frequencies of sound that does not enter straight from the front. You can experience this by turning away from and again towards a source. While doing so you should sense a slight change at high frequencies, something you would normally not pay attention to.

In addition, depending on the frequency and direction of incidence, the pinna’s shape affects sound as it is reflected into the ear canals, enhancing some frequencies and attenuating others.

Binaural hearing and reproduction of sound
Generally, for a correct spatial acoustic experience we need both ears (binaural), since the comparison between left and right ear gives the strongest clues about source locations. It may not come as a surprise that we have the most difficulty in localizing sources on the median plane, where there is almost no interaural difference.

However, a lot of our directional sense is built on experience, which is linked to our own physiology – the size and shape of our head, pinnae and ear canals. Over time, our auditory system builds up a pool of references, such as noticing that sound from behind sounds slightly duller.

Therefore, to create a convincing spatial experience, where it is possible to sense the exact location of sound sources, the reproduction of sound must provide all the information our auditory system is used to. There are basically two ways to do so.

1: Binaural recording
A binaural recording can be made with a pair of microphones carried close to the ears or – as it is usually done – using an artificial head with the microphones placed at the entrance of the ear canals. Such a recording is intended for direct playback over high-quality headphones; that is, the sound is reproduced as close as possible to the point where it was captured. Playing it back over loudspeakers without further signal processing, such as cross-talk cancellation, would not work, since the signal would be sent through the room and around the listener’s head, creating a completely different experience.

Fig. 2: Exact reproduction of three-dimensional soundscapes using loudspeakers requires highly sound-absorbing rooms to avoid reflections
2: Microphone array
In this approach, one uses an array of microphones arranged in a closely spaced, three-dimensional pattern. This records sound at a single point, but with spatial information about the direction of incidence. With the help of sophisticated algorithms, it is then possible to reproduce a similar sound field using an arrangement of speakers around the listener. The result is best if the listening room is highly sound absorbing, so that sound that has passed the listener is not reflected back. Otherwise, the characteristics of the room would be added (fig. 2). This technique requires the listener to remain in a fixed position, or at least within a limited area. However, the experience would feel authentic; turning towards the different loudspeakers would make you feel like you were facing the actual sound sources.
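The article does not name a particular algorithm. As one possible illustration, the sketch below uses a horizontal first-order Ambisonics representation, a swapped-in technique chosen here only because it is compact. It assumes ideal, unscaled components (an omnidirectional `w` and two figure-of-eight components `x` and `y`) rather than real capsule signals, and all function names are hypothetical.

```python
import math


def encode_first_order(signal, azimuth_deg):
    """Encode a mono plane-wave signal into horizontal components (w, x, y)."""
    az = math.radians(azimuth_deg)
    w = list(signal)                         # omnidirectional component
    x = [s * math.cos(az) for s in signal]   # front-back figure-of-eight
    y = [s * math.sin(az) for s in signal]   # left-right figure-of-eight
    return w, x, y


def estimate_azimuth(w, x, y):
    """Estimate the direction of incidence from the summed products w*x and w*y."""
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    return math.degrees(math.atan2(iy, ix))


# A short 440 Hz tone at an assumed 48 kHz sample rate, arriving from 60 degrees.
tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
w, x, y = encode_first_order(tone, azimuth_deg=60.0)
print(f"estimated azimuth: {estimate_azimuth(w, x, y):.1f} deg")
```

The point of the sketch is that signals captured at a single location can still carry the direction of incidence, which a playback system can then use to drive the loudspeakers around the listener.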

Fig. 3: Measurement of the HRTF for a source at a specific angle
Head-related transfer functions
We can combine the two techniques, and play back sound through headphones, even though it was recorded with a microphone array. This too requires some processing to convert the array recording to a binaural signal. To do so, we need to take the presence of the listener’s head into account and how it influences sound as it impinges from the various directions.

This relation is described by the head-related transfer function (HRTF). A single HRTF describes how a sound created at a specific point will be perceived at the right or left ear. You could say that it is the acoustical fingerprint of your head and torso.
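In the time domain, applying an HRTF amounts to convolving the source signal with the corresponding head-related impulse response (HRIR), once for each ear. The sketch below shows that step with toy impulse responses. The HRIR values are invented purely for illustration (a delayed, attenuated copy at the far ear) and are not measured data; the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with the HRIR of each ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIRs for a source on the left: the right ear receives
# the sound three samples later and attenuated.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.6]

click = [1.0, 0.0]
left, right = render_binaural(click, hrir_left, hrir_right)
print("left :", left)
print("right:", right)
```

A measured HRIR is much longer and also encodes the spectral colouring from the head, torso and pinnae, but the processing step is the same.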

To measure an HRTF, one places a loudspeaker at a source location and a microphone at the ear (fig. 3). While this is a manageable task for one or a few source locations, covering all possible angles requires a vast set of HRTFs, with one set for each ear (fig. 4). The result, however, is rewarding.

Compared to listening to a straight binaural recording, the advantage of using a signal that was recorded with an array and processed through HRTFs is that the playback set-up can use a sensor to pick up the orientation of your head and correct the processing accordingly. For example, as you turn your head to the left, a sound source that was originally in front of you would then appear to your right, and vice versa. This gives a similar sense of 'being present' as in the loudspeaker set-up, but without the limitation of having to be in a special room, since the sound goes straight from the headset into your ears.
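The head-tracking correction can be sketched as a small angle calculation. The example below assumes azimuth and head yaw are both given in degrees with positive values to the left, and that HRTFs were measured on a hypothetical 15-degree grid. All names and the grid spacing are assumptions for illustration.

```python
def wrap_degrees(angle):
    """Wrap an angle to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Source direction relative to the head; positive angles are to the left."""
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


def nearest_measured_azimuth(target_deg, measured_azimuths):
    """Pick the measured HRTF direction closest to the target on the circle."""
    return min(measured_azimuths, key=lambda a: abs(wrap_degrees(a - target_deg)))


measured = list(range(-180, 180, 15))  # hypothetical 15-degree measurement grid

# Source straight ahead, head turned 30 degrees to the left.
rel = relative_azimuth(0.0, 30.0)
print("relative azimuth:", rel)
print("HRTF to use     :", nearest_measured_azimuth(rel, measured))
```

Here the source ends up 30 degrees to the right of the listener's nose, so the renderer would switch to the HRTF pair measured for that direction.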

Fig. 4: To process sound from any direction, the HRTF measurement must be repeated for many source points around the head
