by Łukasz Januszkiewicz, Jun 30, 2022


How Spatial Audio Meets Customer Needs and Boosts Customer Satisfaction in High-Tech


When it comes to sound reproduction, gone are the days when standard stereo was sufficient for every customer experience. In 2022, more and more businesses—from audio content producers to hardware manufacturers—are striving to provide upgraded listening solutions.

Spatial audio, a technology developed as a result of these endeavors, appears headed for a bright future. Recently, it has been credited with a 19% increase in customer engagement and a 90% spike in customer conversions.

In other words, high-tech businesses that could use spatial audio but don’t are missing out on growth opportunities. This suite of technologies not only satisfies consumers’ desire to recreate real-world listening; it can also genuinely spark awe, boosting customer satisfaction and loyalty.

In this article, we’ll explore the ins and outs of spatial audio, its most popular use cases, and the best ways of offering immersive audio to high-tech customers.

What is Spatial Audio?

Spatial audio (also called 3D audio or immersive audio) is a commonly used term for audio systems made up of equipment, protocols, and algorithms. These systems capture, process, and play back sound so that listeners perceive it as coming from all around them.

Because spatial audio systems aim to deliver a fully immersive experience, they don’t focus only on faithfully reproducing sound intensity and transparency. They also provide extra cues that let listeners perceive other sound properties, such as:

the direction of sounds emitted by different objects (sound objects),
the trajectory of moving sound objects,
changes in sound intensity caused by changing distance,
and many other effects associated with the acoustic properties of the reproduced audio scene.
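To make the direction and distance cues concrete, here is a minimal Python sketch that computes the azimuth, elevation, and an inverse-distance gain for a single sound object. The coordinate convention (x forward, y left, z up), the function name `spatial_cues`, and the 1 m reference distance are assumptions for illustration; real renderers add filtering, delays, and room effects on top of this.

```python
import math


def spatial_cues(source_xyz, ref_distance=1.0):
    """Return (azimuth_deg, elevation_deg, distance_m, gain) for a sound
    object relative to a listener at the origin facing +x.

    Assumed convention: x forward, y left, z up; positive azimuth = left.
    """
    x, y, z = source_xyz
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    # Inverse-distance law: amplitude falls as 1/r (about -6 dB per doubling).
    # Distances below ref_distance are clamped so the gain never exceeds 1.
    gain = ref_distance / max(distance, ref_distance)
    return azimuth, elevation, distance, gain


# A source 2 m ahead and 2 m to the left, at ear height.
az, el, dist, g = spatial_cues((2.0, 2.0, 0.0))
print(f"az={az:.1f} deg, el={el:.1f} deg, r={dist:.2f} m, "
      f"gain={20 * math.log10(g):.1f} dB")
# az=45.0 deg, el=0.0 deg, r=2.83 m, gain=-9.0 dB
```

For a moving object, calling `spatial_cues` once per audio frame along its trajectory yields the changing direction and level.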

Moreover, 3D audio is not only about listening to the content but also about interacting with it. Depending on the application, 3D audio systems may introduce different types of interaction capabilities.

In 3DoF (Three Degrees of Freedom) systems, users interact with the audio content through rotations about three axes (yaw, pitch, and roll), which rotate the whole audio scene around the listener. 3DoF audio is useful in 360° video productions, where rotation of the sound scene is synchronized with the viewing direction.
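In a 3DoF renderer, head tracking is typically compensated by counter-rotating the scene: each source direction is multiplied by the transpose of the head's rotation matrix, so sources stay fixed in the world while the listener turns. The Python sketch below assumes a yaw-pitch-roll (Z-Y-X) rotation order and x-forward, y-left, z-up axes; other systems use different conventions, and the helper names are hypothetical.

```python
import math


def rotation_matrix(yaw, pitch, roll):
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians (assumed order)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def world_to_head(v, yaw, pitch, roll):
    """Counter-rotate a world-frame direction into the head frame (R^T v)."""
    r = rotation_matrix(yaw, pitch, roll)
    # Element i of R^T v is sum over j of R[j][i] * v[j].
    return [sum(r[j][i] * v[j] for j in range(3)) for i in range(3)]


def azimuth_deg(v):
    """Azimuth in degrees, positive to the left of the listener."""
    return math.degrees(math.atan2(v[1], v[0]))


# The listener turns 90 degrees to the left; a source straight ahead in the
# world should now be heard on the listener's right (azimuth -90).
head = world_to_head([1.0, 0.0, 0.0], math.radians(90), 0.0, 0.0)
print(round(azimuth_deg(head), 1))  # -90.0
```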

In 6DoF (Six Degrees of Freedom) systems, users gain three additional degrees of freedom for translational movement (left-right, up-down, front-back). This kind of interaction is already well known from video games, where users can move freely among sound objects and immerse themselves in the scene. Another type of interaction lets users control specific sound objects, for example, by moving them or adjusting their sound level.
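6DoF adds translation: before rotating, the renderer subtracts the listener's position from each source position, so walking toward an object changes both its direction and its distance. This hedged Python sketch handles position plus yaw only (pitch and roll are left out for brevity); the `ListenerPose` class and the `relative_source` helper are illustrative, not part of any specific engine.

```python
import math
from dataclasses import dataclass


@dataclass
class ListenerPose:
    """Hypothetical listener pose: position in metres, yaw in radians
    (counter-clockwise seen from above; x forward, y left, z up)."""
    x: float
    y: float
    z: float
    yaw: float


def relative_source(source, pose):
    """Return (azimuth_deg, distance_m) of a world-frame source as heard
    by the listener: undo the translation first, then the yaw rotation."""
    dx, dy, dz = (source[0] - pose.x, source[1] - pose.y, source[2] - pose.z)
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    fx = c * dx + s * dy    # inverse (transposed) rotation about z
    fy = -s * dx + c * dy
    distance = math.sqrt(fx * fx + fy * fy + dz * dz)
    return math.degrees(math.atan2(fy, fx)), distance


guitar = (4.0, 0.0, 0.0)
print(relative_source(guitar, ListenerPose(0.0, 0.0, 0.0, 0.0)))  # (0.0, 4.0)
# After walking to (2, 2) without turning, the guitar is 45 degrees to the
# right (negative azimuth) and about 2.83 m away.
print(relative_source(guitar, ListenerPose(2.0, 2.0, 0.0, 0.0)))
```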

How in Demand is Spatial Audio?

Demand for spatial audio is growing globally as more and more businesses use it in their technology solutions. According to Transparency Market Research, the global 3D audio market was valued at $4.6 billion in 2020 and is expected to reach $22 billion by the end of 2031.
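As a quick sanity check, the two Transparency Market Research figures above imply a compound annual growth rate of roughly 15%. This is our own arithmetic, assuming 11 annual compounding periods between 2020 and 2031; the report's own CAGR may be computed over a different base period.

```python
# Implied compound annual growth rate (CAGR) from the two quoted figures.
start_value = 4.6    # USD billion, 2020
end_value = 22.0     # USD billion, 2031 (forecast)
years = 2031 - 2020  # 11 compounding periods (assumed: year-end to year-end)

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # Implied CAGR: 15.3%
```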

The gaming industry is driving the biggest boom in 3D audio systems, which are integrated into a wide range of games and other interactive software. The rapid development of the metaverse, virtual reality (VR), and augmented reality (AR) has also driven strong 3D audio market revenue over the last few years.

According to a Fortune Business Insights industry overview, the key regional 3D audio markets are Asia Pacific, North America, the Middle East, Africa, Latin America, and Europe. Asia Pacific is likely to see the highest CAGR through 2031, with China and India expected to drive the market.

At the same time, relatively few leading companies drive 3D audio development and implementation. Below are the most prominent players in the 3D audio market. Together, they hold approximately 62% to 66% of the market thanks to their strong product portfolios and substantial user bases. These companies have also pursued strategies such as product launches, partnerships, collaborations, mergers and acquisitions, and joint ventures to strengthen their foothold in the global 3D audio market.

Spatial Audio Use Cases

Let’s dive into the current and potential applications of spatial audio technology.

1. VR/AR and Metaverse Audio

Recently, more and more AR/VR technology development has focused on metaverse-oriented applications. In a metaverse, multiple users can navigate a shared AR/VR environment simultaneously and interact in real time with virtual objects and other users. Spatial audio technology is essential for delivering a fully immersive experience in such virtual spaces.

2. Virtual Conference Room

Spatial audio is a key component of a virtual conference room for holding meetings in a virtual space. In such an application, all the sound objects, participants, and acoustic attributes present in the real space need to be transferred to the virtual environment. Intelligent recording devices and advanced audio processing algorithms are crucial for recording speech effectively, capturing the associated metadata, encoding the audio data, and, finally, reproducing the sound in the new virtual space. The following image (Figure 1) shows how spatial audio could be captured in a teleconferencing room.

Figure 1. Visualization of spatial audio in a conferencing room.
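One example of the metadata mentioned above is the direction each talker speaks from. A common way to estimate it with a pair of microphones is GCC-PHAT: cross-correlate the two signals with phase-only weighting, find the time difference of arrival, and convert it to an angle under a far-field assumption. The Python (NumPy) sketch below illustrates that technique on synthetic noise. It is not a description of SoftServe's capture pipeline, and the 10 cm spacing, 16 kHz sample rate, and helper names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly at 20 degrees C


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate how many seconds `sig` lags `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so the correlation is not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so the lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs


def doa_degrees(tau, mic_spacing):
    """Far-field direction of arrival (0 = broadside) for a two-mic pair."""
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


fs, spacing = 16_000, 0.10
rng = np.random.default_rng(0)
ref = rng.standard_normal(fs)  # 1 s of noise standing in for speech
delay = 3                      # synthetic ground truth, in samples
sig = np.concatenate((np.zeros(delay), ref[:-delay]))
tau = gcc_phat_delay(sig, ref, fs, max_tau=spacing / SPEED_OF_SOUND)
print(f"{doa_degrees(tau, spacing):.1f} deg")  # 40.0 deg
```

Limiting the search to `max_tau` (the largest physically possible delay for the given spacing) is a cheap way to reject spurious correlation peaks.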

3. Healthcare—Sound Therapy

Sound therapy technology is often used in the therapy and rehabilitation of children and adults with disabilities such as hearing impairments or blindness. 3D audio technology makes it possible to build solutions for sound exercises and hearing training.

4. E-learning—Interactive Learning

3D audio technology is used to develop e-learning tools, such as sound training for musicians and people learning music. It is also used in interactive learning exercises that need to distinguish between speakers (a teacher vs. a student) to evaluate a student's performance.

5. Media & Entertainment—Immersive Experiences

Spatial audio provides tools for producing unique sound experiences that imitate real-life natural environments for entertainment and media production. Immersive audio in theme parks, exhibitions and museums, cinemas, and media production (music production, movies, animation, game sound design, and audiobooks) enhances the way people consume content and enjoy entertainment. Spatial sound effects can also be paired with real-time visualizations that deepen a listener’s sense of immersion.

Spatial audio: How is it made?

Let’s consider the process of creating spatial audio as a combination of three activity areas (Figure 2):

Audio production—covers the processes of design, synthesis, and recording of spatial audio content
Audio processing—includes the application of different algorithm types and methods for audio coding/decoding, mixing, mastering, metadata processing, audio rendering, and interaction control
Audio presentation—takes care of reproduction and playing audio to the listener through the target playback system.

Figure 2. General spatial audio creation process

Depending on the use case, the desired effect, and the selected technologies, each of these three activity areas plays a larger or smaller role in the creation process.
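The three areas can be illustrated with a toy end-to-end chain in Python (NumPy): production synthesizes a mono tone with a direction tag, processing encodes it into horizontal first-order Ambisonics (ACN channel order, SN3D normalization), and presentation decodes it to stereo with two virtual cardioid microphones. This is a deliberately simple sketch; the function names, the ±90° virtual microphone angles, and the choice of Ambisonics as the intermediate format are assumptions, and production systems would use far more capable binaural or loudspeaker renderers.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed)


def produce(freq_hz=440.0, seconds=0.5, azimuth_deg=45.0):
    """Production: synthesize a mono test tone plus object metadata."""
    t = np.arange(int(FS * seconds)) / FS
    return 0.5 * np.sin(2 * np.pi * freq_hz * t), {"azimuth_deg": azimuth_deg}


def process(mono, meta):
    """Processing: encode the object into horizontal first-order Ambisonics,
    ACN channel order (W, Y, Z, X) with SN3D normalization, elevation 0."""
    az = np.radians(meta["azimuth_deg"])
    gains = np.array([1.0, np.sin(az), 0.0, np.cos(az)])
    return gains[:, None] * mono[None, :]  # shape (4, samples)


def present(bformat, mic_deg=90.0):
    """Presentation: decode to stereo with two virtual cardioid microphones
    aimed at +mic_deg (left) and -mic_deg (right)."""
    w, y, _, x = bformat
    channels = []
    for a in np.radians([mic_deg, -mic_deg]):
        # Cardioid = 0.5 * (omnidirectional + figure-of-eight aimed at a).
        channels.append(0.5 * (w + np.cos(a) * x + np.sin(a) * y))
    return np.stack(channels)  # shape (2, samples)


mono, meta = produce()
stereo = present(process(mono, meta))
left_rms, right_rms = np.sqrt(np.mean(stereo ** 2, axis=1))
print(f"L/R level difference: {20 * np.log10(left_rms / right_rms):.1f} dB")
# L/R level difference: 15.3 dB
```

With the source 45° to the left, the left channel comes out about 15 dB louder than the right, which is what a cardioid pair predicts. Keeping an intermediate format between processing and presentation is what lets the same content be rendered to different playback systems.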

Audio R&D: SoftServe’s Framework

At SoftServe, we are developing a set of advanced tools, both algorithms and devices, that can be used efficiently for spatial audio capture, processing, and rendering. Some of these tools also need to extract metadata associated with spatial audio objects and the overall acoustic environment.

Building spatial audio systems or their components involves various challenges. One of particular importance to us is ensuring conformance with existing systems and environments used to further process or reproduce the audio content in virtual spaces.

Our typical framework consists of four main areas/components:

Simulation Platform
We use it to simulate sound wave propagation and generate synthetic audio content in a fully controlled, configurable environment. This helps us mimic real environments and produce ground truth datasets crucial for the efficient development and evaluation of the system.
Hardware & Algorithms
We focus on the design and development of devices and algorithms used for capturing, processing, and rendering audio in spatial audio systems.
Evaluation Platform
This platform is used to reproduce and evaluate processed audio content. It allows us to assess the efficiency and accuracy of the developed hardware and algorithms, draw conclusions, and readjust the design.
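A minimal Python (NumPy) sketch of how a simulation platform and an evaluation platform can fit together: the simulation renders a known source into a microphone array under free-field conditions (propagation delay plus 1/r spreading, no reflections), so the ground-truth position is known exactly, and the evaluation step scores direction estimates with a wrap-around angular error. Integer-sample delays, the two-microphone geometry, and the function names are simplifying assumptions; a full platform would typically also model room acoustics, for example with the image-source method.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed constant)


def simulate_free_field(src_signal, src_pos, mic_positions, fs):
    """Render a source into each microphone with a propagation delay
    (rounded to whole samples) and 1/r spherical spreading loss.
    Free field only: no reflections, no air absorption."""
    dists = np.linalg.norm(mic_positions - src_pos, axis=1)
    delays = np.round(dists / C * fs).astype(int)
    out = np.zeros((len(mic_positions), len(src_signal) + delays.max()))
    for m, (d, r) in enumerate(zip(delays, dists)):
        out[m, d:d + len(src_signal)] = src_signal / max(r, 1e-3)
    return out


def angular_error_deg(est_deg, true_deg):
    """Smallest absolute difference between azimuths, wrapping at 360."""
    diff = (np.asarray(est_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0
    return np.abs(diff)


fs = 16_000
rng = np.random.default_rng(1)
mics = np.array([[0.0, 0.05, 0.0], [0.0, -0.05, 0.0]])  # 10 cm pair on y axis
src = np.array([2.0, 0.0, 0.0])  # 2 m straight ahead: ground truth azimuth 0
mixture = simulate_free_field(rng.standard_normal(fs), src, mics, fs)
print(mixture.shape)  # (2, 16093)
print(angular_error_deg([358.0, 5.0], [2.0, 0.0]))  # [4. 5.]
```

The estimates in the last line are hypothetical numbers chosen only to show the wrap-around at 0°/360°.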

This approach helps us follow the design-build-test-learn cycle (Figure 3).

Figure 3. Design-Build-Test-Learn development cycle.

Latest trends in standardization processes for spatial audio

Researchers and developers have proposed a variety of standards for spatial audio coding and rendering. Let’s take a look at some of them.

MPEG-H 3D Audio

MPEG-H 3D Audio (universal immersive audio coding) was standardized in 2015 and revised in 2019. It’s a versatile standard that supports multiple immersive sound signal formats, including higher-order ambisonics (HOA), and it has been adopted in broadcast and streaming applications. MPEG-H 3D Audio features advanced encoding technology based on MPEG Unified Speech and Audio Coding (USAC) and MPEG Spatial Audio Object Coding (SAOC), as well as multiple rendering technologies. The base rendering method is Vector Base Amplitude Panning (VBAP), which is combined with further technology to play back the rendered signals over headphones or other loudspeaker arrangements.
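VBAP itself is compact enough to sketch. In the two-dimensional case, the source direction is written as a weighted sum of the unit vectors of the two loudspeakers around it; solving that 2×2 system gives the gains, which are then normalized to constant power. The Python (NumPy) sketch below shows only this textbook pair-wise form. The MPEG-H renderer, which works with 3D loudspeaker triplets and much more, is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def vbap_2d_pair(source_az_deg, spk1_az_deg, spk2_az_deg):
    """2D VBAP for one loudspeaker pair: solve p = g1*l1 + g2*l2 for the
    gains, then normalize to constant power (g1^2 + g2^2 = 1)."""
    def unit(az_deg):
        a = np.radians(az_deg)
        return np.array([np.cos(a), np.sin(a)])

    base = np.column_stack((unit(spk1_az_deg), unit(spk2_az_deg)))  # l1, l2
    gains = np.linalg.solve(base, unit(source_az_deg))
    if np.any(gains < -1e-9):
        raise ValueError("source lies outside this loudspeaker pair")
    gains = np.clip(gains, 0.0, None)  # remove tiny negative rounding noise
    return gains / np.linalg.norm(gains)


# Standard stereo loudspeakers at +30 (left) and -30 (right) degrees.
print(np.round(vbap_2d_pair(0.0, 30.0, -30.0), 4))   # [0.7071 0.7071]
print(np.round(vbap_2d_pair(30.0, 30.0, -30.0), 4))  # [1. 0.]
```

A source between the loudspeakers gets both gains; a source exactly at one loudspeaker is played by that loudspeaker alone, and a source outside the pair signals that a different pair should be chosen.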

MPEG-I Immersive Audio

MPEG-I Immersive Audio (compressed representation for virtual and augmented reality), an extension of MPEG-H 3D Audio, is currently under development. It is expected to be completed in 2023 and targets virtual and augmented reality applications. The goal of MPEG-I is to give users the sense that they are present in the virtual world. MPEG-I Immersive Audio will mainly focus on the representation and compression of the metadata needed to create a virtual audio scene, and on rendering that scene for the user’s position and orientation. To that end, it will use MPEG-H 3D Audio as its audio compression engine and add further metadata to the compressed data stream.

OpenXR

OpenXR is a cross-platform API for extended reality (XR): computer-generated environments that combine the real and the virtual through human-machine interaction, covering virtual reality (VR), augmented reality (AR), and mixed reality (MR). It is the interface between an application and an in-process or out-of-process XR runtime, which can handle frame composition, peripheral management, and more. As an open standard, it provides high-performance access to XR platforms and devices. The OpenXR 1.0 specification, released in 2019, was a significant step toward enabling XR content creation across multiple platforms from the same code base, allowing developers to focus more on their creative vision.
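OpenXR supplies the listener's head pose (for example, `xrLocateSpace` reports an `XrPosef` made of an `XrQuaternionf` orientation and an `XrVector3f` position). To our knowledge, the core specification does not spatialize audio itself, so an audio renderer has to convert world-space source positions into head space. The Python sketch below mirrors those structures with a hypothetical `Quat` class and uses OpenXR's axis convention (+X right, +Y up, -Z forward); it is an illustration, not OpenXR API code.

```python
import math
from dataclasses import dataclass


@dataclass
class Quat:
    """Hypothetical mirror of OpenXR's XrQuaternionf field layout."""
    x: float
    y: float
    z: float
    w: float


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q (computes q * v * conj(q))."""
    qx, qy, qz = q.x, q.y, q.z
    # t = 2 * cross(q.xyz, v);  v' = v + w * t + cross(q.xyz, t)
    tx = 2.0 * (qy * v[2] - qz * v[1])
    ty = 2.0 * (qz * v[0] - qx * v[2])
    tz = 2.0 * (qx * v[1] - qy * v[0])
    return (v[0] + q.w * tx + (qy * tz - qz * ty),
            v[1] + q.w * ty + (qz * tx - qx * tz),
            v[2] + q.w * tz + (qx * ty - qy * tx))


def world_to_head(source, head_pos, head_rot):
    """Apply the inverse head pose: subtract the head position, then rotate
    by the conjugate (inverse) of the head orientation."""
    d = tuple(s - p for s, p in zip(source, head_pos))
    return rotate(Quat(-head_rot.x, -head_rot.y, -head_rot.z, head_rot.w), d)


def azimuth_deg(v):
    """Azimuth in OpenXR axes (+X right, +Y up, -Z forward); positive = left."""
    return math.degrees(math.atan2(-v[0], -v[2]))


# Head at the origin, turned 90 degrees to the left (rotation about +Y).
h = math.sqrt(0.5)
head_rot = Quat(0.0, h, 0.0, h)
source = (0.0, 0.0, -2.0)  # 2 m straight ahead in world space
print(round(azimuth_deg(world_to_head(source, (0.0, 0.0, 0.0), head_rot)), 1))
# -90.0: the source is now on the listener's right
```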

Why SoftServe?

Deep knowledge of human auditory perception and digital audio signal processing helps SoftServe stand out. Having served as a digital provider at the cutting edge of technology for over two decades, we understand the current market needs. In fact, over the last several years, we’ve developed leading-edge spatial audio solutions for various high-tech industry clients.

Our recent project involved creating a binaural audio renderer for a new generation of AR devices. The solution was based on Higher-Order Ambisonics (HOA) and demonstrated low processing complexity and enhanced flexibility. HOA enabled interaction with the content through three-degrees-of-freedom (3DoF) manipulation, namely yaw, pitch, and roll rotations. The final solution allowed users to personalize their listening experience and interact with the content.
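Part of what makes HOA attractive for 3DoF interaction is that rotating the entire sound field is a cheap linear mix of channels. At first order, a yaw rotation leaves W and Z untouched and mixes X and Y like a 2D rotation; higher orders need larger per-order rotation matrices. The sketch below (Python/NumPy, ACN channel order, SN3D normalization) illustrates head-tracking compensation for yaw only. It is not SoftServe's renderer, and the function names are assumptions.

```python
import numpy as np


def encode_foa(az_deg, el_deg=0.0):
    """First-order Ambisonics gains in ACN order (W, Y, Z, X), SN3D."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([1.0,
                     np.sin(az) * np.cos(el),
                     np.sin(el),
                     np.cos(az) * np.cos(el)])


def rotate_yaw_foa(bformat, yaw_deg):
    """Rotate the whole first-order sound field about the vertical axis.
    Works on a single gain vector or on a (4, samples) signal array."""
    a = np.radians(yaw_deg)
    w, y, z, x = bformat
    x_rot = np.cos(a) * x - np.sin(a) * y
    y_rot = np.sin(a) * x + np.cos(a) * y
    return np.array([w, y_rot, z, x_rot])  # W and Z are unchanged by yaw


# Head tracking: the listener turns 30 degrees to the left, so the scene is
# rotated by -30 degrees; a source encoded at 30 degrees (left) ends up
# straight ahead (Y close to 0, X close to 1).
rotated = rotate_yaw_foa(encode_foa(30.0), -30.0)
print(rotated)
```

Because the rotation is applied to the encoded channels rather than to individual sources, its cost does not grow with the number of sound objects in the scene.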

Meet our team

Our R&D team consists of researchers, engineers, and product managers. On average, team members have more than five years of experience in their fields.

Here are some of our scholars and experts.

Łukasz Januszkiewicz—Project Lead, Senior Audio R&D Engineer

Łukasz is a senior R&D engineer. He has more than 10 years of experience in the research and development of digital audio algorithms. His main professional domain is spatial audio, including microphone array design, sound source separation, beamforming algorithms, binauralization, and Higher Order Ambisonics. For years, Łukasz has participated in ISO/IEC MPEG standardization meetings, where he has actively contributed to spatial audio coding standards such as MPEG-H 3D Audio and MPEG-I Immersive Audio.

Jakub Wasilewski—Senior Audio R&D Engineer

A graduate in Electronics and Telecommunications and a huge audio enthusiast, Jakub has experience working on all layers of the sound process: from the mathematical and physical principles of sound, through electronic circuit and firmware design, to DSP algorithms, including spatial sound, direction of arrival (DOA) estimation, beamforming, Ambisonics, and measurements. He is also interested in machine learning, audiophile systems, and planar-magnetic headphone driver design.

Aleksander Kaczmarek—Audio R&D Trainee

Aleksander is a junior R&D engineer pursuing a master's degree in Big Data Analytics at Wroclaw University of Science and Technology. He specializes in physics and machine learning, having earned a bachelor's degree in Quantum Engineering at the same university.

Tomasz Woźniak—Senior Audio R&D Engineer

Tomasz is a software engineer with over seven years of professional experience and a university degree in Acoustic Engineering. Throughout his academic and professional career, he has worked with audio and DSP, focusing on 3D audio. He has helped develop audio engines for games and audio processing software for the music industry.

Maria Peńsko—Senior Audio R&D Engineer

Maria is an engineer with over 10 years of experience in radio communications and acoustics. As an audio engineer, she has worked on spatial audio, speech synthesis, audio signal analysis and processing, and audio evaluation.

Daria Hemmerling—Senior Audio R&D Engineer

Daria is an expert with a PhD in voice quality quantification. She has in-depth knowledge of audio signals such as music, voice, and speech, and of the signal processing methods used to analyze them. She is skilled in programming, problem-solving, and analytical thinking.

Consumers will hear the difference

Spatial audio is a key component of modern applications aiming to deliver fully immersive user experiences. An increasing number of companies are making efforts to develop new algorithms and devices for spatial audio production, processing, and rendering.

Ongoing standardization activities will result in several new spatial audio standards that should ease technology integration in the near future, making spatial audio an even more in-demand technology.

Here at SoftServe, our Audio R&D team works on building ever more realistic and immersive audio experiences. Curious to learn more? Let’s talk.
