Virtual Acoustics

Selection of Head-Related Transfer Functions

1. The Problem of Idiosyncratic Cues

Head-related transfer functions (HRTFs) describe the changes that are imposed on a sound on its path from the source to the ear canal of the listener. The presence of the head and body alters the sound as it reaches the ear. The features captured in HRTFs include the time it takes the sound to travel around the head as well as spectral changes introduced by the pinna, by the head shadow, and by shoulder reflections. In short, at least in theory, HRTFs capture all changes of the sound on its way from the free field to the ears. Since HRTFs include all the directional information available at the level of the ear canal, they can be used to spatialize a monaural sound recording. To reproduce a monaural sound from a given direction, the sound signal is filtered with the pair of HRTFs (left and right ear) for that direction and played via headphones. This technique is called virtual acoustics.
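In the time domain, filtering with an HRTF pair amounts to convolving the mono signal with the corresponding pair of head-related impulse responses (HRIRs). The following is a minimal Python sketch of that step, using direct-form convolution for clarity rather than speed. The function names and the toy HRIR values are hypothetical and do not come from a measured catalogue.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sample lists (direct form, O(N*M))."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at one direction by filtering it with that
    direction's left/right HRIR pair; returns (left, right) for headphones."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy example (hypothetical HRIRs): a source on the left reaches the left ear
# first and at full level, the right ear two samples later and attenuated.
left, right = spatialize([1.0, 0.0, 0.0], [1.0, 0.5], [0.0, 0.0, 0.6])
print(left)
print(right)
```

For realistic HRIR lengths an FFT-based convolution would normally be used instead; it yields the same output signals.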

Figure: Inside-the-ear-canal measurement of HRTFs.

HRTFs are highly idiosyncratic because the shape of the ears and the size of the head differ from person to person. Like a fingerprint, HRTFs show an individual pattern of peaks and notches, and head size has a large impact on the time delay between the ears. These individual differences can have a pronounced impact on applications that use HRTFs to spatialize audio. If the HRTFs in an application are not individually tailored to the listener, spatial directions will often be perceived as shifted by several degrees, and sounds will frequently be localized inside the head. This happens because our brain has adapted to the information in our own HRTFs. If an application uses average HRTFs or HRTFs from another person, our brain interprets them on the basis of the cues in our own HRTFs, and the differences between the two HRTF sets lead to differences in perception. This can be problematic for some applications, since the magnitude of those differences is not known in advance: it depends on the particular user of the application.
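The text does not prescribe a measure for how much two HRTF sets differ. One common, purely physical choice is the RMS log-spectral difference in dB between two magnitude responses for the same direction; a minimal Python sketch follows. The function name and the example magnitudes are hypothetical, and a small spectral distance need not translate into a small perceptual difference.

```python
import math


def log_spectral_distance_db(mag_a, mag_b):
    """RMS difference in dB between two linear magnitude responses that are
    sampled at the same frequencies (e.g. one HRTF direction, two listeners)."""
    if len(mag_a) != len(mag_b) or not mag_a:
        raise ValueError("responses must be non-empty and of equal length")
    if any(m <= 0.0 for m in mag_a) or any(m <= 0.0 for m in mag_b):
        raise ValueError("magnitudes must be positive to take the logarithm")
    diffs_db = [20.0 * math.log10(a / b) for a, b in zip(mag_a, mag_b)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))


# Hypothetical example: identical responses except for a notch that is
# 20 dB deep (factor 0.1) for one listener and absent for the other.
own = [1.0, 1.0, 0.1, 1.0]
other = [1.0, 1.0, 1.0, 1.0]
print(log_spectral_distance_db(own, other))
```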

Errors in spatial reproduction can be drastically reduced by using individual HRTFs. Individual HRTFs are usually measured with tiny probe microphones placed inside the ear canal (see picture). For practical applications, however, this is often not feasible, since the measurement takes time and requires a specialized setup. The user has to go to a place where such a measurement can be done, which prevents the widespread use of individual HRTFs.

An alternative is to individually adapt the non-individual HRTFs used in the application. The goal is to minimize the perceptual impact of using non-individual HRTFs so that the differences from individual HRTFs become small. A first approach is to measure the difference in head size between the listener and the person whose HRTFs were measured and to scale the HRTFs accordingly. This provides a coarse individual fit of the interaural time differences and also shifts the center frequencies of some of the more prominent peaks and notches in the frequency response. Perceptually, such scaling maps horizontal directions more accurately. Another approach to adapting non-individual HRTFs to a particular listener is presented in the following section.
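The text does not specify how the scaling is carried out. A minimal Python sketch, assuming a rigid spherical head (Woodworth's far-field formula ITD = (a/c)(θ + sin θ), with head radius a, speed of sound c and azimuth θ in radians) and uniform geometric scaling of head and pinna, illustrates both effects: the ITD grows in proportion to head radius, and spectral features move down in frequency by the same factor. The function names, radii, speed of sound and the 8 kHz notch are illustrative assumptions, not data from the thesis.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def woodworth_itd(head_radius_m, azimuth_deg):
    """Spherical-head (Woodworth) ITD in seconds for a far-field source,
    valid for |azimuth| <= 90 degrees."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / SPEED_OF_SOUND * (theta + math.sin(theta))


def scale_factor(listener_radius_m, reference_radius_m):
    """Ratio of listener head size to the head size of the HRTF donor."""
    return listener_radius_m / reference_radius_m


def scale_itd(reference_itd_s, factor):
    # In the spherical-head model the ITD grows linearly with head radius.
    return reference_itd_s * factor


def scale_feature_frequency(reference_freq_hz, factor):
    # Assuming geometric similarity, a larger head/pinna shifts peaks and
    # notches down in frequency by the same factor.
    return reference_freq_hz / factor


f = scale_factor(0.095, 0.0875)  # hypothetical listener vs. donor radius
itd_ref = woodworth_itd(0.0875, 90.0)
print(f"ITD at 90 deg: {itd_ref * 1e6:.1f} us -> {scale_itd(itd_ref, f) * 1e6:.1f} us")
print(f"8 kHz notch -> {scale_feature_frequency(8000.0, f):.0f} Hz")
```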


2. How to Optimize Virtual Acoustics — A Selection Method for HRTFs

Although differences in HRTFs between listeners can be substantial, it is conceivable that a catalogue of HRTFs contains an HRTF whose features resemble those of a given listener's individual HRTFs. With a suitable selection method, the listener could be guided to the HRTF from the catalogue with the smallest perceived reproduction errors simply by listening to sounds processed with the HRTFs of the catalogue. The problem is that perceptual differences are hard for untrained listeners to quantify, and that listeners quickly adapt to the HRTFs, which reduces the perceived differences. A further requirement is that the ideal selection method be quick and easy to use and require no special hardware, such as objects placed at specified positions for direction comparisons.

In my thesis, I developed such a selection method and verified its effect on the selected HRTF. The aim of the selection was to optimize several criteria: horizontal and vertical localization, externalization, distance, and the number of front-back confusions. It turned out that, when the catalogue contained a large number of HRTFs, even experienced subjects were not able to keep track of multiple questions covering those criteria. Thus, a two-step selection procedure was developed:

Step I: Pre-select a small number of HRTFs from the catalogue using a simple question. In my experience, asking subjects to select sounds that were externalized worked well. Of course, this question is somewhat difficult for inexperienced subjects. Hence, the question can be modified to select the sounds (and thus HRTFs) that are perceived as most spacious.
Step II: Select one HRTF from this set of pre-selected HRTFs using multiple questions. The questions in this step can be more complex, since they are applied only to a few sounds. They aimed at minimizing horizontal and vertical localization errors and at perceiving a constant distance across different horizontal directions.
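In the method described here, the listener makes both choices by listening. Purely to illustrate the structure of the two steps, the following Python sketch assumes that the listener's judgements are recorded as numeric ratings (higher is better), which the text does not state: step I keeps the k HRTFs with the best externalization (or spaciousness) rating, and step II picks the pre-selected HRTF with the highest summed rating across the step-II criteria. The class, field and criterion names and the equal weighting are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    hrtf_id: str
    step1_rating: float  # hypothetical: externalization or spaciousness rating
    step2_ratings: dict = field(default_factory=dict)  # criterion -> rating


def preselect(candidates, k):
    """Step I: keep the k candidates with the highest step-I rating."""
    return sorted(candidates, key=lambda c: c.step1_rating, reverse=True)[:k]


def final_choice(shortlist, criteria):
    """Step II: pick the candidate with the highest summed rating over the
    given criteria (equal weights; missing ratings count as 0)."""
    return max(shortlist, key=lambda c: sum(c.step2_ratings.get(q, 0.0) for q in criteria))


criteria = ["horizontal", "vertical", "distance"]
catalogue = [
    Candidate("A", 3.0, {"horizontal": 2, "vertical": 2, "distance": 2}),
    Candidate("B", 5.0, {"horizontal": 1, "vertical": 2, "distance": 1}),
    Candidate("C", 4.0, {"horizontal": 3, "vertical": 3, "distance": 2}),
    Candidate("D", 1.0, {"horizontal": 5, "vertical": 5, "distance": 5}),
]
shortlist = preselect(catalogue, k=2)
print([c.hrtf_id for c in shortlist], final_choice(shortlist, criteria).hrtf_id)
```

Note that candidate D never reaches step II even though its step-II ratings are high: the simple step-I question acts as a filter, which is what keeps step II manageable.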

The Matlab script offered in Section 3 provides an example. The selection procedure matters as much for the results as the choice of questions. Bursts of noise were filtered with the HRTFs of the catalogue and played via headphones. The selection worked best when subjects were allowed to compare HRTFs directly, back-to-back. This could be done in an A-B comparison, but the selection was more efficient when subjects were completely free to pick and play any HRTF they liked. This way, subjects could quickly sort out unsuitable HRTFs, much faster than any algorithm could.
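The text does not give the stimulus parameters. As a hedged sketch, the Python function below generates a train of Gaussian noise bursts with raised-cosine on/off ramps (a common way to avoid audible clicks at burst onsets and offsets); the sample rate, burst and gap durations, ramp length, number of bursts and the function name are assumptions. In the procedure described above, such a burst train would then be filtered with the HRTF pair of each catalogue entry before playback.

```python
import math
import random


def noise_burst_train(fs=44100, n_bursts=3, burst_s=0.2, gap_s=0.1, ramp_s=0.01, seed=0):
    """Gaussian noise bursts with raised-cosine ramps, separated by silence.
    All parameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    n_burst = int(round(burst_s * fs))
    n_gap = int(round(gap_s * fs))
    n_ramp = int(round(ramp_s * fs))
    if 2 * n_ramp > n_burst:
        raise ValueError("ramps longer than the burst")
    out = []
    for b in range(n_bursts):
        for i in range(n_burst):
            if i < n_ramp:  # rising raised-cosine ramp, starts at gain 0
                g = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
            elif i >= n_burst - n_ramp:  # falling ramp, ends at gain 0
                g = 0.5 * (1.0 - math.cos(math.pi * (n_burst - 1 - i) / n_ramp))
            else:
                g = 1.0
            out.append(g * rng.gauss(0.0, 1.0))
        if b < n_bursts - 1:
            out.extend([0.0] * n_gap)  # silent gap between bursts
    return out


stimulus = noise_burst_train()
print(len(stimulus))
```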

The selection method was tested in multiple experiments in which subjects not only selected HRTFs but also localized sounds processed with all pre-selected HRTFs. The localization test provides an objective measure of performance with the selected HRTFs. It was found that the selection method finds HRTFs that provide maximal externalization of sounds and minimize both the localization error and the number of front-back confusions. A more in-depth analysis and description of the method can be found in the publications listed below.
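The text does not define how localization error and front-back confusions were scored. A minimal Python sketch, restricted to azimuth and based on assumed conventions: angles in degrees, 0° straight ahead, positive to the right, ±180° behind; a response counts as a front-back confusion when it lies in the opposite front/back hemisphere from the target, except when target or response is within 10° of the interaural axis; confused responses are mirrored about the interaural axis before the absolute error is computed. These conventions, the exclusion width and the function names are hypothetical, and other scoring rules are possible.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a


def is_front(azimuth):
    return abs(wrap_deg(azimuth)) < 90.0


def front_back_confusion(target, response, exclusion=10.0):
    """True if target and response lie in opposite front/back hemispheres.
    Angles within `exclusion` degrees of the interaural axis are not scored."""
    t, r = wrap_deg(target), wrap_deg(response)
    if abs(abs(t) - 90.0) <= exclusion or abs(abs(r) - 90.0) <= exclusion:
        return False
    return is_front(t) != is_front(r)


def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural axis, e.g. 30 -> 150."""
    return wrap_deg(180.0 - azimuth)


def localization_summary(trials):
    """trials: list of (target, response) azimuths in degrees.
    Returns (mean absolute error after resolving confusions, confusion rate)."""
    if not trials:
        raise ValueError("no trials")
    errors, confusions = [], 0
    for target, response in trials:
        if front_back_confusion(target, response):
            confusions += 1
            response = mirror_front_back(response)
        errors.append(abs(wrap_deg(response - target)))
    return sum(errors) / len(errors), confusions / len(trials)


# Hypothetical responses for one HRTF: two of four trials are confused.
print(localization_summary([(30, 35), (30, 150), (-30, -150), (0, 10)]))
```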


3. Publications and Download

  • The selection procedure and the verification results were presented at the Int. Conf. on Auditory Display in Boston, 2003, and published in:
    B. U. Seeber and H. Fastl: Subjective selection of non-individual head-related transfer functions. In: E. Brazil and B. Shinn-Cunningham (eds.), Proc. 9th Int. Conf. on Auditory Display, pp. 259-262. Boston University Publications Prod. Dept., Boston, USA, 2003.

  • My thesis covered the measurement of HRTFs, the selection method, the verification of the selection method and the localization of sounds in great depth.

  • An experimental script for Matlab that implements the selection method and shows all questions can be found here. The method is fully implemented; only the code for accessing and using an HRTF catalogue has to be adapted before the script can be used.