Today is an important day – Cave will finally be presented to a public audience at SIGGRAPH!
To mark the occasion, I would like to talk specifically about sound in the Cave project.
I lead Cave’s audio team with the help of Dennis and Tatiana, who did a tremendous job preparing the sounds and programming the audio, which was very challenging. This project required many hacks and tricks to bring to life. The tools for 3D sound post-production in VR are currently very underdeveloped. We had a very hard time finding a Unity plugin that was both reliable and had all of the features needed to create a convincing, natural sound layer. We ended up using the Steam Audio plugin as our main tool. It was not ideal, but with some homemade upgrades we managed to make it work.
Sound Design
There are several layers of sound in Cave. First, there is the dialogue track, which was recorded in an acoustically treated studio. The recording took place after the motion capture was complete so that MaYaa (who recorded Ayara’s voice) could act in sync with the character’s movements. We used two microphones (one closer and one farther away) so that we could later choose which timbre best fit the character in the scene.
The second layer consists of sound effects. Most were taken from sound libraries and processed. Dennis also did foley recording for Ayara’s clothes as the sounds had to be perfectly in sync with her body movements.
The third layer consists of ambiences. Ambiences were built from our own recordings of water and cave interiors, mixed with the soundscapes of mines and caves chosen from the libraries.
The fourth layer is the beautiful music composed by Ryan Shore.
Workflow
We started by designing all of the audio layers in ProTools as it was the easiest way to process the raw sounds and combine various layers of audio.
We created a mono mix for each sound object and a stereo mix for the ambience layers. After that, we imported the sound files into Unity and attached them to the objects in the game – this way Ayara’s voice could move with the character in the scene. The movements of the sound sources were rendered by the Steam Audio plugin, which takes into account both the listener’s and the audio source’s position to create an appropriate binaural stereo mix. The first challenge at this stage was that ProTools did not give us the option to listen to the mix in 3D with all of the sounds properly positioned. As a result, this work was much more time-consuming. We went back and forth: creating the sounds in ProTools, rendering, importing to Unity, listening inside the experience, fixing the sounds in ProTools, rendering again, and so on.
Plugins
Although the Steam Audio plugin renders the position of sound in 3D space very naturally, it does not support sound source directivity. Directivity describes how the sound radiated by a source varies with direction. Ask someone to turn around while talking to you: the voice sounds different when the person is facing you than when they are looking away. To simulate sound source directivity, we need to be able to change the timbre of the sound depending on the rotation of the sound object.
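To make the idea concrete, here is a minimal sketch of one way directivity could be approximated. It is not the code used in Cave. It assumes the timbre change can be modelled as a low-pass filter whose cutoff drops as the source turns away from the listener, and the function names and the 20 kHz / 2.5 kHz limits are hypothetical values chosen only for illustration.

```python
import math

def facing_angle(source_pos, source_forward, listener_pos):
    """Angle in radians between the direction the source faces
    and the direction from the source to the listener."""
    to_listener = [l - s for l, s in zip(listener_pos, source_pos)]
    dist = math.sqrt(sum(c * c for c in to_listener))
    fwd_len = math.sqrt(sum(c * c for c in source_forward))
    if dist == 0.0 or fwd_len == 0.0:
        return 0.0  # degenerate case: treat as facing the listener
    cos_a = sum(f * t for f, t in zip(source_forward, to_listener)) / (dist * fwd_len)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_a)))

def directivity_cutoff_hz(angle, front_hz=20000.0, back_hz=2500.0):
    """Low-pass cutoff for a given facing angle (assumed model).
    Interpolates on a logarithmic frequency scale: bright when the
    source faces the listener (angle 0), duller when it faces away (pi)."""
    t = angle / math.pi
    return front_hz * (back_hz / front_hz) ** t

# Example: the character stands at the origin looking down +Z,
# and the listener stands directly behind her.
angle = facing_angle((0, 0, 0), (0, 0, 1), (0, 0, -3))
cutoff = directivity_cutoff_hz(angle)
```

In an engine, a value like `cutoff` would be recomputed every frame and fed to a low-pass filter on the voice, so the sound gets duller as the character turns her back.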
Another feature the plugin did not support was changing the reverberation and sound reflections in the cave as the sound object moved. In extremely reverberant spaces like our cave, there are a great many reflections. The number of reflections that reach the listener should change depending on the position of the character in the space, its rotation (whether or not it is close to a wall and facing it) and its distance from the listener. When the number of reflections was not controlled, it was especially noticeable on Ayara’s voice: it simply did not sound right.
To incorporate the missing pieces, we sat down one day with Sebastian, our technical magical director (for him nothing is impossible), who wrote a couple of scripts in Unity that change the frequency characteristics, the amount of reflections and the loudness of the sound depending on the position and rotation of the source and the listener. Of course, this is a very basic model for sound rendering, but even this helped a lot and made the voice sound much more real.
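As a rough illustration of what such a basic model might look like, the sketch below derives a direct ("dry") level and a reflection ("wet") level from the distances involved. It is only an assumption about the general approach, not the scripts written for Cave: the inverse-distance falloff, the wall boost and every name and constant here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VoiceParams:
    dry_gain: float  # level of the direct sound, 0..1
    wet_gain: float  # level sent to reflections/reverb, 0..1

def voice_params(distance_to_listener, distance_to_wall, facing_wall,
                 ref_distance=1.0, max_wet=0.8):
    """Very basic, assumed model of loudness and reflections."""
    # Direct sound: inverse-distance falloff, full level inside ref_distance.
    dry = ref_distance / max(distance_to_listener, ref_distance)
    # The farther the source, the more the reflections dominate.
    wet = max_wet * (1.0 - dry)
    # Facing a nearby wall adds reflections; the effect fades with wall distance.
    if facing_wall:
        wall_boost = 1.0 / (1.0 + max(distance_to_wall, 0.0))
        wet = min(1.0, wet + (1.0 - wet) * 0.5 * wall_boost)
    return VoiceParams(dry_gain=dry, wet_gain=wet)

# Example: four metres from the listener, right next to a wall.
open_space = voice_params(4.0, 5.0, facing_wall=False)
at_wall = voice_params(4.0, 0.0, facing_wall=True)
```

Here `at_wall.wet_gain` comes out higher than `open_space.wet_gain` while the dry level is the same, which matches the intuition that a voice spoken into a cave wall sounds more diffuse.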
Playback system
Cave is a shared experience. The audience members see each other inside the scene and should be able to hear each other during the experience as well. We could not use speakers because the audio is rendered in real time depending on the position of each person. With speakers, everyone would hear the same audio.
Using headphones is not ideal either – people are not able to hear each other during the experience because their ears are blocked.
That is why we are using a new technology from Bose – Bose AR, which is built into the Lenovo headsets. Its tiny directional speakers allow us to deliver high-quality sound without obstructing the ears, so that people can still hear and talk to each other during the experience.
To elevate the experience, we also decided to use a subwoofer. Enhancing the low frequencies enabled us to create the impression of the size and weight of a mammoth. The result is truly thrilling…
We learned a lot from this project and the best thing is that there is so much more to improve in the future!