If you have ever heard of 3D audio, you have almost certainly come across the term HRTF.
HRTF stands for Head-Related Transfer Function – you can think of it as a fingerprint of our spatial hearing. It’s unique to each person because each of us has uniquely shaped ears. To explain why this is so important for 3D audio technologies, let’s start from the beginning.
How can we locate a sound coming from any direction with only two ears?
Our brain determines the position of a sound source largely from the differences between the signals at the left and right ears. Imagine holding a radio on the left side of your head. The sound reaching your left ear will be louder and will arrive sooner than the sound reaching your right ear. Based on these differences in arrival time and intensity, our auditory system detects the position of the sound around our head.
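To get a feel for how small the time difference is, here is a rough sketch based on Woodworth's classic spherical-head approximation, which estimates the interaural time difference (ITD) as (a / c) · (θ + sin θ), where a is the head radius, c the speed of sound and θ the azimuth of a distant source. The head radius of 8.75 cm is an assumed average, and the function name is made up for this illustration; a real head is not a sphere, so treat the numbers as ballpark figures only.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 degrees Celsius
HEAD_RADIUS = 0.0875     # m, an assumed average head radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate interaural time difference in seconds for a distant source.

    azimuth_deg: 0 = straight ahead, 90 = directly to one side.
    The approximation is meant for azimuths between -90 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


# Print the estimated delay for a few directions.
for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Even for a source directly to one side, the estimated delay stays well under a millisecond, which shows how fine the timing resolution of our hearing has to be.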
Even so, these differences say little about the elevation of the sound. This is where the pinna, the visible outer part of the ear, reveals its importance to our hearing. The shape of the pinna – as unique as a fingerprint – acts like a filter. Whenever sound reaches the ear, it bounces off the tiny ridges of the pinna, and the pattern of these reflections depends on the direction the sound comes from. As a result, the pinna changes the spectral content of the signal in a direction-dependent way, which essentially means that some frequencies become louder than others. Our auditory system analyzes this change in the intensity of the frequencies and, based on that, determines the elevation of the source.
These three cues – differences in time, intensity and spectral content – constitute the HRTF.
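If the HRTF for a given direction is known, we can convolve any signal with it (in practice, with its time-domain form, the head-related impulse response) and make the sound appear to come from that direction. This is how binaural technology works.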
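As a minimal sketch of that idea, the snippet below convolves a mono signal with a pair of impulse responses, one per ear, to produce a two-channel result. The impulse responses here are tiny made-up lists, not measured data, and the helper names `convolve` and `render_binaural` are invented for this example. It only covers a single, fixed source direction.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.

    The output has len(signal) + len(impulse_response) - 1 samples.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) channels for one fixed source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Made-up toy impulse responses for a source on the listener's left:
# the left ear receives the sound immediately and louder,
# the right ear receives it two samples later and quieter.
hrir_left = [1.0, 0.3, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5, 0.2]

mono = [1.0, 0.0, -1.0]
left, right = render_binaural(mono, hrir_left, hrir_right)
print("left: ", left)
print("right:", right)
```

The nested loop keeps the example dependency-free. With real audio and measured impulse responses, which are typically hundreds of samples long, you would more likely reach for something like `numpy.convolve` or `scipy.signal.fftconvolve`.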
The problem is how to obtain the HRTF. It can be measured, but this takes a lot of time, because a separate measurement is needed for each possible position of the sound source. To get around this, most spatial audio tools use a generic HRTF based on a model of the human head. However, these generic cues differ from those measured on an individual listener, so the rendering is not accurate for that listener.
How can individualized HRTFs be obtained efficiently, then? This problem has not yet been solved. Here are several approaches being researched right now:
- Statistical analysis of HRTF databases, which allows basic measurements of the head to be matched with appropriate HRTFs
- Fitting a head model – deforming a head model based on measurements of the user obtained with computer vision techniques
- User profiling through in-game behavior – an HRTF is assigned automatically based on the user’s performance
- Machine learning – estimating the whole sphere of measurements from only a few
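To make the first approach a bit more concrete, here is a deliberately simplified sketch: pick the database subject whose head and ear measurements are closest to the user's and reuse that subject's HRTF. Everything in it is hypothetical, including the subject ids, the choice of three measurements and the numbers themselves. It is a plain nearest-neighbor lookup, not the statistical method of any particular research project.

```python
import math

# Hypothetical database: subject id -> (head width, head depth, pinna height) in cm.
# All values are invented for illustration, not taken from a real HRTF dataset.
DATABASE = {
    "subject_a": (14.5, 18.0, 6.0),
    "subject_b": (15.5, 19.5, 6.6),
    "subject_c": (16.2, 20.5, 7.1),
}


def closest_subject(measurements, database=DATABASE):
    """Return the id of the subject with the smallest Euclidean distance."""
    return min(database, key=lambda sid: math.dist(database[sid], measurements))


# Look up the best match for one (made-up) set of user measurements.
print(closest_subject((15.3, 19.2, 6.5)))
```

A real system would need many more subjects and would have to decide which measurements actually matter for hearing and how to weight them, which is exactly where the statistical analysis comes in.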