Listening In 3D - Sound Source Localization

How Do We Localize Sound?

The first clue our hearing uses is interaural time difference (fig. 1a). Sound from a source directly in front of or behind us will arrive simultaneously at both ears. If the source moves to the left or right, our auditory system recognizes that the sound from the same source arrived at both ears, but with a certain delay, or seen the other way around, the two ears pick up different phases of the same signal.

Interaural time difference (Fig. 1a): With sound coming from the front, the interaural time difference is zero (left). With sound coming from the side, a head size of about 20 cm and a sound speed of 340 m/s, the maximum time difference is about 0.59 ms (right).
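The arithmetic behind that figure can be sketched in a few lines. This is only a rough model: it assumes a distant source and a straight-line path difference of head size times the sine of the angle, and it ignores the way sound bends around the head. The function name and the angle convention (0° is straight ahead, 90° is fully to one side) are choices made for this illustration.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value used in the figure caption
HEAD_SIZE = 0.20        # m, the ear-to-ear distance used in the figure caption


def interaural_time_difference(azimuth_deg, head_size=HEAD_SIZE, c=SPEED_OF_SOUND):
    """Return the interaural time difference in seconds.

    Simplified model: the extra path to the far ear is
    head_size * sin(azimuth); diffraction around the head is ignored.
    """
    return head_size * math.sin(math.radians(azimuth_deg)) / c


# Print the delay in milliseconds for a few source angles.
for angle in (0, 30, 60, 90):
    itd_ms = interaural_time_difference(angle) * 1e3
    print(f"{angle:3d} deg -> {itd_ms:.2f} ms")
```

With the source fully to one side, this gives the maximum delay of roughly 0.6 ms quoted above; for a source straight ahead, the delay is zero.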

Interaural phase difference (Fig. 1b): While the ears will usually sense a phase difference (left), depending on frequency and angle of incidence they may detect a false phase match (right).

We decipher phase differences best at low frequencies. At higher frequencies, the wavelengths can be so short compared to the size of the head that the pattern repeats itself and both ears may coincidentally pick up the same phase (fig. 1b).
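To see where this ambiguity sets in, one can convert a time delay into a phase difference and wrap it into a single cycle. The sketch below treats the head as a pure delay between the ears, which is a simplification, and uses the 20 cm head size and 340 m/s sound speed from fig. 1a. The function name is made up for this example.

```python
def wrapped_phase_difference(frequency_hz, delay_s):
    """Interaural phase difference in degrees, wrapped to [-180, 180).

    The ears cannot count whole cycles, so only the wrapped value
    is available as a cue.
    """
    phase = 360.0 * frequency_hz * delay_s
    return (phase + 180.0) % 360.0 - 180.0


# Assumed maximum interaural delay: 20 cm head, 340 m/s sound speed.
delay = 0.20 / 340.0

for f in (200, 500, 850, 1700, 3400):
    print(f"{f:5d} Hz -> {wrapped_phase_difference(f, delay):7.1f} deg")
```

Under these assumptions the phase difference grows steadily at low frequencies, but around 1700 Hz one full wavelength fits between the ears and the wrapped phase difference returns to roughly zero, just as if the sound came from straight ahead.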

Fortunately, the auditory system has another clue to work with: The acoustic shadow created by our head when sound arrives from the side, a phenomenon that increases with frequency. At very low frequencies, the size of our head is small compared to the wavelength of sound in air. Consequently, the sound pressure is essentially the same at the left and right ear, no matter from which direction the sound arrives.

However, with increasing frequency the wavelength decreases and the size of our head is no longer negligible. It becomes an obstacle that shields and reflects sound, so that in comparison to the ear which faces towards the source, higher frequency content will be attenuated when it arrives at the ear on the opposite side of the head.
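A quick calculation shows how the wavelength compares with the head at different frequencies. The head size of 20 cm and sound speed of 340 m/s are taken from fig. 1a; the list of frequencies is arbitrary, and the sketch says nothing about how many decibels of shadow actually result.

```python
SPEED_OF_SOUND = 340.0  # m/s
HEAD_SIZE = 0.20        # m


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a sound wave in air."""
    return c / frequency_hz


# Express each wavelength as a multiple of the head size.
for f in (100, 500, 1700, 5000, 10000):
    lam = wavelength(f)
    print(f"{f:6d} Hz: wavelength {lam:6.3f} m = {lam / HEAD_SIZE:5.1f} head sizes")
```

At 100 Hz the wavelength is many times larger than the head, so the head hardly disturbs the sound field. At several kilohertz the wavelength is only a fraction of the head size, and the shadow becomes noticeable.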

The shape of our pinnae also provides a wealth of spectral (frequency-dependent) clues.

Like the acoustic shadow of the head, the pinna functions like a shield attenuating the higher frequencies of sound that does not enter straight from the front. You can experience this by turning away from and again towards a source. While doing so you should sense a slight change at high frequencies, something you would normally not pay attention to.

In addition, depending on the frequency and direction of incidence, the pinna’s shape affects sound as it is reflected into the ear canals, enhancing some frequencies and attenuating others.

Binaural Hearing

Generally, for a correct spatial acoustic experience we need both ears (binaural), since the comparison between left and right ear gives the strongest clues about source locations. It may not come as a surprise that we have the most difficulty in localizing sources on the median plane, where there is almost no interaural difference.

However, a lot of our directional sense is built on experience, which is linked to our own physiology – the size and shape of our head, pinnae and ear canals.

Over time, our auditory system builds up a pool of references, such as noticing that sound from behind sounds slightly duller. Therefore, to create a convincing spatial experience, where it is possible to sense the exact location of sound sources, the reproduction of sound must provide all the information our auditory system is used to.

There are basically two ways to do so.

Binaural Recording For Direct Playback

A binaural recording can be made with a pair of microphones carried close to the ears or – as is usually done – using an artificial head with the microphones placed at the entrance of the ear canals.

Such a recording is intended for direct playback over high-quality headphones; that is, the sound is reproduced as close as possible to the point where it was captured. Playing it back over loudspeakers without further signal processing, such as cross-talk cancellation, would not work, since the signal would be sent through the room and around the listener’s head, creating a completely different experience.

3D soundscape (Fig. 2): Exact reproduction of three-dimensional soundscapes using loudspeakers requires highly sound-absorbing rooms to avoid reflections.

Reproducing The Sound Field

In this approach, one uses an array of microphones arranged in a closely spaced, three-dimensional pattern. This records sound at a single point, but with spatial information about the direction of incidence. With the help of sophisticated algorithms, it is then possible to reproduce a similar sound field using an arrangement of speakers around the listener.
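How closely spaced microphones can carry directional information is easiest to see with a toy example. The sketch below assumes a hypothetical array with one microphone pair on each of the x, y and z axes, a single distant source (plane wave) and noise-free delay measurements. Real arrays and the algorithms used with them are considerably more elaborate; the spacing and the function names are invented for this illustration.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s
SPACING = 0.05          # m, hypothetical distance between the two microphones on each axis


def pair_delays(azimuth_deg, elevation_deg, spacing=SPACING, c=SPEED_OF_SOUND):
    """Arrival-time differences (s) on the x, y and z pairs for a plane wave."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    # Unit vector pointing from the array towards the source.
    u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    # Each pair sees a delay proportional to the component along its axis.
    return tuple(spacing * component / c for component in u)


def direction_from_delays(delays, spacing=SPACING, c=SPEED_OF_SOUND):
    """Recover (azimuth, elevation) in degrees from the three pair delays."""
    ux, uy, uz = (c * tau / spacing for tau in delays)
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)
    azimuth = math.degrees(math.atan2(uy, ux))
    # Clamp to guard against rounding slightly outside [-1, 1].
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, uz / norm))))
    return azimuth, elevation


# Simulate a source at 40 deg azimuth, 15 deg elevation, then recover its direction.
delays = pair_delays(azimuth_deg=40.0, elevation_deg=15.0)
print(direction_from_delays(delays))
```

The point of the example is only that three small time differences, measured at essentially one location, are enough to pin down a direction in three dimensions.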

The result is best if the listening room is highly sound absorbing, so that sound that has passed the listener is not reflected. Otherwise, the characteristics of the room would be added (fig. 2). This technique requires the listener to remain in a fixed position, or at least within a limited area. However, the experience would feel authentic; turning towards the different loudspeakers would make you feel like you were facing the actual sound sources.

HRTF measurement (Fig. 3): Measurement of the HRTF for a source at a specific angle.

Head-related transfer functions

We can combine the two techniques, and play back sound through headphones, even though it was recorded with a microphone array. This too requires some processing to convert the array recording to a binaural signal. To do so, we need to take the presence of the listener’s head into account and how it influences sound as it impinges from the various directions.

This relation is described by the head-related transfer function (HRTF). A single HRTF describes how a sound created at a specific point will be perceived at the right or left ear. You could say that it is the acoustical fingerprint of your head and torso.
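In practice, an HRTF is often applied in the time domain, where its counterpart is the head-related impulse response (HRIR): convolving a signal with the left-ear and right-ear HRIRs for one direction gives a two-channel binaural signal. The sketch below uses made-up three-tap-delay impulse responses, not measured data, purely to show the mechanics. A real HRIR is much longer and has a far richer shape, and the function names here are illustrative.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sequences of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with the HRIR pair for a single source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (NOT measured data) for a source on the left:
# the left ear hears the sound directly, the right ear hears it
# three samples later and at half the amplitude.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
print("left: ", left)
print("right:", right)
```

Even in this crude form, the two output channels differ in arrival time and level, which are the interaural cues described at the start of the article.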

To measure an HRTF, one places a loudspeaker in a source location and a microphone at the ear (fig. 3). While this is a manageable task for a single or a few source locations, covering all possible angles requires a vast set of HRTFs, with one set for each ear (fig. 4). The result, however, is rewarding.

Compared to listening to a straight binaural recording, the advantage of using a signal that was recorded with an array and processed through an HRTF is that the playback set-up can use a sensor to pick up the orientation of your head and correct the processing accordingly.

As an example, as you turn your head to the left, a sound source that was originally in front of you would then appear to your right and vice versa. This then gives a similar sense of 'being present' as in the loudspeaker set-up, but without the limitations of having to be in a special room, since the sound goes straight from the headset into your ears.
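The correction itself amounts to subtracting the head orientation from the source direction before choosing which HRTF to apply. A minimal sketch, assuming angles in degrees measured in the horizontal plane, with positive values to the listener's left; the function name and sign convention are choices made for this example:

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Source direction relative to the nose, wrapped to [-180, 180).

    Positive angles are to the listener's left, negative to the right.
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# Source straight ahead in the room, head turned 30 degrees to the left:
# the source should now be rendered 30 degrees to the right.
print(relative_azimuth(0.0, 30.0))   # -30.0
# Head turned 30 degrees to the right: the source appears on the left.
print(relative_azimuth(0.0, -30.0))  # 30.0
```

The playback system would then pick, or interpolate, the HRTF pair measured closest to this relative angle.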

Listening in 3D (Fig. 4): To process sound from any direction, the HRTF measurement must be repeated for many source points around the head.

By: Matthias Scholz
User Interface Designer
Ph.D. Applied Acoustics
Brüel & Kjær
