January 2006 - April 2007
Course: Graduation project, extended
Original name: Immersive and Spatial Voice Audio in Networked Virtual Environments
Software: C/C++
Description
Visions of the future such as The Matrix, The Street (from the book Snow Crash),
or Cyberspace (from the book Neuromancer) all support audio and visual communication
in a way that mirrors reality. In the virtual world, when someone walks around a
corner, you can no longer see or hear them. The promise of virtual reality has been
in the public consciousness for decades; however, the technology for achieving an
immersive experience has become available only fairly recently.
Current work on multiplayer virtual reality has focused largely on the visual aspect
combined with text. Audio is often neglected and, when present, typically ignores
the structure of the virtual world. In this project we have created a new system
which integrates the audio, the visuals and the 3D structure of the virtual world.
Specifically, our novel contribution is a system which models the effect of the 3D
world structure upon the audio and visual aspects in a natural and intuitive manner:
players in the massively multiplayer world can now talk with each other as in real
life.
Client and server architecture
Besides providing clients in the virtual world with a realistic audio experience,
we wanted to achieve the following:
- Allow a large number of clients (>100) to connect
- Require only moderate bandwidth (~128kbps)
- Work from behind routers and firewalls
- Maximize portability (any virtual world can be used)
The central server architecture, in which all clients connect to a single server,
was our optimal choice, fulfilling all design goals. More advanced architectures
would allow more clients to connect, but at the cost of a considerably more complex
system that is harder to set up and get running. To illustrate our audio system,
we integrated it into the Quake 3 game. This game has the advantage of being open
source and is well known for its low bandwidth usage. Note that any other game could
have been used instead, provided its source code is available so that it can be
adapted to use our audio framework.
Sound attenuation
We have modeled how the distance and angle between sounds and listeners affect
the audio perception and additionally devised a novel algorithm to handle structural
audio attenuation. The structural audio problem occurs when the 3D structure interacts
with the audio signal. Examples include simply going around a corner or walking
into a room and closing the door. In both cases, the 3D structure affects the audio
– typically lowering the amplitude but potentially also causing audio reflections
and refractions. For structural audio, we cannot simply cut off the audio when a
wall is in between a sound and a listener; rather, there must be a natural drop-off
due to the interference with the 3D world.
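The distance and angle part of the model can be illustrated with a small sketch. The formulas here are assumptions chosen for illustration, not the ones used in the project: an inverse-distance rolloff with a hypothetical reference distance, and a constant-power stereo pan derived from the horizontal angle between the listener's view direction and the sound. The names Vec3, StereoGain, distanceGain and spatialize are hypothetical, and the coordinate convention (x forward at yaw 0, y to the left, z up) is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct StereoGain { float left, right; };

// Assumed rolloff: full volume within refDistance, then gain = refDistance / distance.
float distanceGain(float distance, float refDistance) {
    assert(refDistance > 0.0f);
    return refDistance / std::max(distance, refDistance);
}

// Per-ear gains for one sound source.
// yaw is the listener's view angle in radians, counter-clockwise from the +x axis.
StereoGain spatialize(const Vec3& listener, float yaw,
                      const Vec3& source, float refDistance) {
    const float dx = source.x - listener.x;
    const float dy = source.y - listener.y;
    const float dz = source.z - listener.z;
    const float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
    const float horizontal = std::sqrt(dx * dx + dy * dy);

    // pan in [-1, 1]: -1 is fully left, +1 is fully right.
    // It stays centered when the source is directly above, below or at the listener.
    float pan = 0.0f;
    if (horizontal > 0.0f) {
        // Unit vector pointing to the listener's right in the horizontal plane.
        const float rightX = std::sin(yaw);
        const float rightY = -std::cos(yaw);
        pan = (dx * rightX + dy * rightY) / horizontal;
    }

    // Constant-power panning: left^2 + right^2 stays equal to gain^2.
    const float kQuarterPi = 0.78539816f;
    const float angle = (pan + 1.0f) * kQuarterPi;
    const float gain = distanceGain(distance, refDistance);
    return { gain * std::cos(angle), gain * std::sin(angle) };
}
```

A call such as spatialize({0, 0, 0}, 0.0f, {0, -2, 0}, 1.0f) places the sound two units to the listener's right, so nearly all of the (halved) gain goes to the right channel. A pan based only on the left/right component cannot distinguish front from back; a fuller model would need an additional cue for that.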
Our novel structural audio algorithm employs Cauchy’s probability distribution
(see Figure 1) to weight a grid that is centered at the listener’s location and
faces the origin of the sound. The weights near the center of the grid have higher
values than those along the edges.
Figure 1. Cauchy probability distribution
The ‘audibility’ of each point on the grid is determined by tracing the visibility between that point and the sound, see Figure 2. The attenuation factor is formed by adding only the weights of the grid points that are ‘audible’. This technique results in smooth sound transitions when moving around objects and corners while talking to other players.
Figure 2. Tracing visibility using the Cauchy-based grid: with direct line-of-sight (left) and a partially obstructed line-of-sight (right)
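The grid weighting and visibility test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the grid size, spacing and Cauchy scale parameter gamma are free parameters, the weights are taken from the Cauchy density of each point's distance to the grid center, and the result is normalized by the total weight so that an unobstructed sound yields a factor of 1. The names structuralAttenuation and VisibilityTest are hypothetical; the visibility callback stands in for whatever line trace the host game engine provides.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { double x, y, z; };

static Vec3 add(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(const Vec3& a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double length(const Vec3& a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }
static Vec3 normalize(const Vec3& a) { return scale(a, 1.0 / length(a)); }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical hook: returns true when nothing blocks the segment from -> to.
using VisibilityTest = std::function<bool(const Vec3& from, const Vec3& to)>;

// Cauchy density centered at 0 with scale gamma.
double cauchyPdf(double r, double gamma) {
    const double kPi = 3.14159265358979323846;
    return 1.0 / (kPi * gamma * (1.0 + (r / gamma) * (r / gamma)));
}

// Attenuation factor in [0, 1]: the share of grid weight that can "see" the sound.
double structuralAttenuation(const Vec3& listener, const Vec3& source,
                             int gridSize, double spacing, double gamma,
                             const VisibilityTest& isVisible) {
    assert(gridSize > 0 && spacing > 0.0 && gamma > 0.0);
    const Vec3 toSource = sub(source, listener);
    if (length(toSource) == 0.0) return 1.0;  // sound at the listener: no obstruction

    // Build two axes spanning a plane that faces the sound.
    const Vec3 dir = normalize(toSource);
    const Vec3 helper = (std::fabs(dir.z) < 0.99) ? Vec3{0.0, 0.0, 1.0} : Vec3{1.0, 0.0, 0.0};
    const Vec3 right = normalize(cross(dir, helper));
    const Vec3 up = cross(right, dir);

    const double half = (gridSize - 1) / 2.0;
    double audible = 0.0;
    double total = 0.0;
    for (int i = 0; i < gridSize; ++i) {
        for (int j = 0; j < gridSize; ++j) {
            const double u = (i - half) * spacing;
            const double v = (j - half) * spacing;
            // Points near the grid center receive the largest weights.
            const double weight = cauchyPdf(std::sqrt(u * u + v * v), gamma);
            const Vec3 point = add(listener, add(scale(right, u), scale(up, v)));
            total += weight;
            if (isVisible(point, source)) audible += weight;  // only audible points count
        }
    }
    return audible / total;
}
```

Because each grid point contributes its own weight, the factor changes gradually as a wall edge sweeps across the grid instead of jumping between 0 and 1, which is the smooth transition the text describes. A larger gamma spreads the weight toward the grid edges, while a smaller gamma concentrates it near the direct line between listener and sound.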
Our audio algorithm uses the Cauchy distribution because it (a) has been shown in
other areas to model real-world distributions more realistically and (b) will in
the future allow us to adjust its parameters adaptively, e.g. to simulate different
environments or to modify sound perception through the use of in-game items.
With our audio framework, players can have conversations with many people at the
same time, because the audio correctly appears to originate from the visual location
of the players that are talking. Moreover, players are able to localize any sound
source and direct visual attention to where the sound is coming from. Note that
our method does not take reflections, refractions and interference with other sound
waves into account.
Publications
For more information and experimental results, take a look at my Master's Thesis
and the VRIC2007 paper in the publications section.