Opus Codec: The Audio Format Explained
Why Opus Owns VoIP Audio on the Internet
Audio codecs operate silently in the background, compressing and decompressing information. While it is not as common a term as MP3, Opus is one of the most popular codecs for audio on the internet. Opus is used regularly by billions of users. The codec has native support in Windows 10, macOS, iOS, Android, and is part of the major Unix multimedia frameworks. In addition, since Opus is a mandatory part of the WebRTC standard for real-time communications by browsers and beyond, every modern web browser has support for Opus. Major communications systems like WhatsApp, Facebook Messenger, and most of the world’s video conferencing apps use Opus — among many, many others.
Let’s take a look at the Opus codec: where it came from, what’s so special about it, how it is commonly used, and what we can expect from audio codecs in the future.
How We Got Opus
Opus grew out of several earlier codec projects and an effort by the Internet Engineering Task Force (IETF) to create a universal audio codec for the internet. Nearly a decade ago, the IETF recognized that the internet needed a single, robust audio codec that could serve all the various use cases of audio transmission over the internet, from voice calls to live music distribution.
Some of the precursor projects to Opus included the CELT codec and Speex, both from Xiph.org, and SILK from Skype. Opus directly incorporates major pieces of two of these, SILK and CELT, which we will get into in a moment.
The Opus spec was submitted in 2010 and standardized as RFC 6716 in 2012. As an internet standard, it was then adopted widely by browsers, operating systems, and popular audio/video software. Opus’ place as the de facto codec for real-time communications was later cemented when its use was mandated as part of WebRTC.
Why Opus
Opus is an uncontentious jack of all trades among audio codecs, bringing together a modern set of features designed to work across a range of environments. Being an internet standard certainly helps its popularity, but it became a standard in the first place because of its performance, unique features, lack of royalties, and freely available open-source implementations.
Quality
A codec that doesn’t perform won’t last long. Internet codecs need to provide compression to minimize bandwidth, but that compression ratio has to be balanced against the output audio quality and the processing needed to encode and decode the audio. One of the main reasons Opus has been so successful is that it performs excellently in a variety of environments.
The following chart from opus-codec.org clearly illustrates Opus’ performance advantage:
Figure 1. How Opus compares to other popular codecs across a range of bitrates and qualities
As you can see, Opus provides better quality than other popular internet audio codecs at the lowest bitrates.
Comprehensive Combination of Performance Features
How does it achieve such strong performance? A robust combination of features makes it possible.
Low Latency
The lag between when a user speaks and when someone on the other side of the internet hears it is known as audio latency. Lower latency is almost always better, and it is especially critical when parties need to interact in real time. Physics and the nature of the internet introduce some inherent latency, so ideally the codec should not add much more on top of that when encoding and decoding.
With its default settings (20 ms frames), Opus adds an algorithmic delay of only 26.5 ms, making it highly suitable for Voice over IP (VoIP) communications.
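The 26.5 ms figure is the codec's algorithmic delay: the 20 ms frame plus the encoder's look-ahead. A minimal sketch of that arithmetic follows, assuming the look-ahead libopus reports at 48 kHz through `OPUS_GET_LOOKAHEAD` (312 samples, or 6.5 ms, in its VoIP and audio modes; 120 samples, or 2.5 ms, in its restricted low-delay mode). The function name is hypothetical, and network, jitter-buffer, and device delays are not modeled.

```python
# Algorithmic delay of an Opus encoder: frame duration + encoder look-ahead.
# Assumption: look-ahead values mirror what libopus reports at 48 kHz via
# OPUS_GET_LOOKAHEAD (312 samples normally, 120 in restricted low-delay mode).
SAMPLE_RATE_HZ = 48_000
LOOKAHEAD_SAMPLES = {"voip": 312, "audio": 312, "restricted_lowdelay": 120}

# Frame durations defined by RFC 6716 (newer libopus also allows 80-120 ms).
VALID_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def algorithmic_delay_ms(frame_ms: float, application: str = "voip") -> float:
    """Return frame duration plus encoder look-ahead, in milliseconds."""
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported Opus frame size: {frame_ms} ms")
    lookahead_ms = LOOKAHEAD_SAMPLES[application] * 1000 / SAMPLE_RATE_HZ
    return frame_ms + lookahead_ms


if __name__ == "__main__":
    for frame in VALID_FRAME_MS:
        print(f"{frame:>4} ms frame -> {algorithmic_delay_ms(frame)} ms delay")
    print(algorithmic_delay_ms(20))  # 26.5, the default case
    print(algorithmic_delay_ms(2.5, "restricted_lowdelay"))  # 5.0
```

Shorter frames lower the delay but raise the packet rate, which is why 20 ms is a common compromise for VoIP.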
Narrowband to Fullband
One distinctive feature of Opus is that it works well for both human speech and music. Most of the time the human voice uses a relatively limited range of frequencies, so it can be compressed more easily into what is referred to as narrowband. Narrowband voice has a subtle but noticeable distortion when you hear it. Most speech-oriented codecs strive for wideband audio, which captures the entire human speech range. Music, on the other hand, encompasses the full spectral range of what our ears can hear. Coding these extra frequencies takes more bits, and audio that covers the full range is typically referred to as fullband.
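For reference, RFC 6716 defines five Opus audio bandwidths, each tied to an effective sample rate. The sketch below encodes that table and picks the narrowest bandwidth that still covers a given highest frequency. The helper name is hypothetical; in practice the encoder chooses the bandwidth itself unless an application caps it (for example with libopus's `OPUS_SET_MAX_BANDWIDTH`).

```python
# Opus audio bandwidths as listed in RFC 6716, Table 1:
# (name, audio bandwidth in Hz, effective sample rate in Hz).
OPUS_BANDWIDTHS = [
    ("NB", 4_000, 8_000),     # narrowband
    ("MB", 6_000, 12_000),    # medium-band
    ("WB", 8_000, 16_000),    # wideband
    ("SWB", 12_000, 24_000),  # super-wideband
    ("FB", 20_000, 48_000),   # fullband (coded up to 20 kHz)
]


def narrowest_bandwidth(highest_freq_hz: float) -> str:
    """Return the narrowest Opus bandwidth that still covers a frequency."""
    for name, audio_bw_hz, _sample_rate_hz in OPUS_BANDWIDTHS:
        if highest_freq_hz <= audio_bw_hz:
            return name
    raise ValueError("Opus does not code content above 20 kHz")
```

For example, `narrowest_bandwidth(3400)` returns `"NB"` (the classic telephone voice band tops out near 3.4 kHz), while `narrowest_bandwidth(7000)` returns `"WB"`.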
Codecs that offer high compression at low bitrates are by nature optimized for a limited frequency range, while other codecs address the full band. Opus can support both low-bandwidth voice communications and the full spectrum of what we hear in music because it combines parts of two different codecs: SILK, a speech codec used for narrowband through wideband audio, and CELT, a transform codec suited to music and other fullband audio. A hybrid mode uses SILK for the lower frequencies and CELT for the higher ones. Since version 1.3, the reference encoder uses a recurrent neural network to estimate whether the input is speech or music and pick a suitable mode, so the switch happens automatically, even when speech and music are both present.
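The division of labor between SILK, CELT, and the hybrid mode can be illustrated with a deliberately simplified rule. This is a toy sketch, not libopus's decision logic, which also weighs bitrate, frame size, and the neural network's speech/music estimate; the function and enum names are hypothetical. The mode-to-bandwidth pairings follow RFC 6716: SILK-only covers NB through WB, hybrid covers SWB and FB, and CELT-only covers every bandwidth except MB.

```python
from enum import Enum


class Mode(Enum):
    SILK = "SILK-only"   # linear-prediction speech coder
    HYBRID = "Hybrid"    # SILK below 8 kHz, CELT above 8 kHz
    CELT = "CELT-only"   # MDCT-based transform coder


BANDWIDTHS = ("NB", "MB", "WB", "SWB", "FB")


def choose_mode(bandwidth: str, is_speech: bool) -> Mode:
    """Toy rule for illustration only; libopus also weighs bitrate and frame size."""
    if bandwidth not in BANDWIDTHS:
        raise ValueError(f"unknown bandwidth: {bandwidth}")
    if not is_speech:
        if bandwidth == "MB":
            raise ValueError("CELT has no medium-band (MB) mode")
        return Mode.CELT
    # Speech: SILK alone up to wideband, hybrid above that.
    if bandwidth in ("NB", "MB", "WB"):
        return Mode.SILK
    return Mode.HYBRID
```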
Controlling the Flow
The internet is packet-based and is designed on the assumption that some packets will get lost and need to be retransmitted. In real-time communications, however, there is little time for retransmission, so a different approach is required. Flow and congestion control mechanisms keep a sender from sending more data than the receiver or the network can handle. These mechanisms are typically handled outside the codec, but Opus is built to cooperate with them: it can change its bitrate, audio bandwidth, and frame size on the fly without renegotiating the session.
Opus is tuned to adapt quickly, balancing fidelity against the available bandwidth. It supports bitrates from 6 kbit/s to 510 kbit/s and offers a variable bitrate (VBR) mode that minimizes bandwidth while keeping quality consistent for applications that require it.
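To make the bitrate range concrete, the sketch below converts a target bitrate and frame size into average payload bytes per packet, and into the total rate once packet headers are counted. It assumes one Opus frame per RTP packet over IPv4 with 40 bytes of IP/UDP/RTP headers, and it ignores SRTP, RTP header extensions, and IPv6; the helper names are hypothetical.

```python
# Per-packet payload and on-the-wire bitrate for an Opus stream over RTP.
# Assumption: one Opus frame per packet, IPv4 (20 B) + UDP (8 B) + RTP (12 B)
# headers = 40 bytes; SRTP auth tags, header extensions, and IPv6 are ignored.
OPUS_MIN_BPS, OPUS_MAX_BPS = 6_000, 510_000
HEADER_BYTES = 20 + 8 + 12


def payload_bytes(bitrate_bps: int, frame_ms: float) -> float:
    """Average Opus payload per packet for a target bitrate and frame size."""
    if not OPUS_MIN_BPS <= bitrate_bps <= OPUS_MAX_BPS:
        raise ValueError("Opus bitrates range from 6 to 510 kbit/s")
    return bitrate_bps * frame_ms / 1000 / 8


def wire_bitrate_bps(bitrate_bps: int, frame_ms: float) -> float:
    """Codec bitrate plus per-packet header overhead, in bits per second."""
    packets_per_second = 1000 / frame_ms
    return bitrate_bps + HEADER_BYTES * 8 * packets_per_second
```

For example, 32 kbit/s with 20 ms frames gives 80-byte payloads, but headers at 50 packets per second add another 16 kbit/s, so the stream uses 48 kbit/s on the wire. Longer frames cut that overhead at the cost of latency. At the 510 kbit/s ceiling, a 20 ms frame works out to 1275 bytes, which matches the maximum frame size in RFC 6716.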
In addition, Opus includes mechanisms for concealing errors and packet loss, such as in-band Forward Error Correction (FEC). FEC sends redundant information when you expect some packets to be lost, spending extra bandwidth to preserve quality.
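Opus's in-band FEC works by embedding a lower-bitrate redundant copy of the previous frame in the current packet, so a single lost packet can be rebuilt from the one that follows it; anything that cannot be rebuilt is concealed. The simulation below sketches only that recovery logic. It is a conceptual model with a hypothetical function name, not the libopus API (there, the encoder enables FEC with `OPUS_SET_INBAND_FEC`, and the decoder recovers a lost frame by calling `opus_decode` on the next packet with `decode_fec` set to 1). It also assumes every packet carries a redundant copy, whereas the real encoder includes one only in SILK or hybrid mode and when its bitrate and expected loss rate allow.

```python
def recover(received: list[bool]) -> list[str]:
    """For each packet index, report how the decoder obtains that frame.

    'decoded'   -> the packet arrived
    'fec'       -> it was lost, but the next packet (carrying a redundant
                   copy of this frame) arrived
    'concealed' -> no copy available, so packet loss concealment fills in
    """
    outcome = []
    for i, arrived in enumerate(received):
        if arrived:
            outcome.append("decoded")
        elif i + 1 < len(received) and received[i + 1]:
            outcome.append("fec")
        else:
            outcome.append("concealed")
    return outcome
```

For example, `recover([True, False, True, False, False, True])` returns `["decoded", "fec", "decoded", "concealed", "fec", "decoded"]`: an isolated loss is recovered, but in a burst of two only the second lost frame can be rebuilt. Note that using FEC also means the receiver must wait for the following packet, which adds one frame of delay.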
Free as in Freedom + Free as in Beer
Developing a codec takes significant engineering investment, so it is not surprising that most codecs have historically carried licensing charges. The presence of a royalty charge, or even the threat of a patent claim, is a major inhibitor to adoption. No one wants to risk suddenly facing millions of dollars in charges or more. As a result, the market has been shifting to royalty-free codec options, and Opus is a good poster child of this movement.
The patent system is never entirely without risk, but the Opus specification is freely available as an internet standard (RFC 6716). The spec includes a reference implementation that is also freely available under popular license terms (the three-clause BSD license), with the relevant copyright and patent licenses automatically granted by their holders.
This means anyone can use Opus with little fear of being sued. After nearly a decade in use, Opus has withstood the test of time on this matter, with no one pursuing legal action over it.
Implementers Love Open Source
The last reason Opus is so popular is that it is relatively easy to implement and use. An open-source reference implementation is available, Opus is built into popular open-source media software such as the Chromium browser engine (used in Chrome, Edge, and others), and projects like FFmpeg also include their own independent implementations.
Using Opus
Now let’s take a look at how Opus is commonly used on the internet today.
Typical Topologies
Opus is commonly used in real-time communications, live streaming, and live viewing applications. It is often accompanied by video streams, though it certainly does not have to be. For simplicity, I will just review the audio part of Opus architectures, but remember video typically follows a similar, parallel path supported by video codecs.
Calling
As an internet standard in its own right, Opus can work without WebRTC, but as a mandatory codec in WebRTC, WebRTC cannot work without Opus.[1] As a result, Opus is used all the time for transmitting the audio portion of internet-based calls.
Figure 2. Peer-to-peer WebRTC calling using Opus
A media server device may be used to facilitate the mixing or retransmission of Opus in multi-party conferencing use cases.
Figure 3. Multi-party WebRTC conference facilitated by a media server using Opus
WebRTC was originally designed around a browser framework, but WebRTC with Opus is also used in native iOS, Android, and desktop applications.
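In WebRTC, the two sides agree on Opus through the SDP offer/answer exchange. Per RFC 7587, the `a=rtpmap` line for Opus always reads `opus/48000/2`, whatever sample rate or channel count is actually used, and codec preferences such as `useinbandfec` travel in the matching `a=fmtp` line. The parser below is a minimal sketch for inspecting an SDP blob; the function name is hypothetical, real applications normally let the WebRTC stack handle negotiation, and payload type 111 in the sample is a common browser choice rather than a fixed value.

```python
import re

# RFC 7587: Opus is always advertised as opus/48000/2 in the rtpmap line.
RTPMAP = re.compile(r"^a=rtpmap:(\d+) opus/48000/2$", re.IGNORECASE)
FMTP = re.compile(r"^a=fmtp:(\d+) (.+)$")

SAMPLE_SDP = """\
m=audio 9 UDP/TLS/RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:0 PCMU/8000
"""


def opus_params(sdp: str) -> dict:
    """Find the Opus payload type in an SDP blob and return its fmtp parameters."""
    lines = [line.strip() for line in sdp.splitlines()]
    payload_type = None
    for line in lines:
        match = RTPMAP.match(line)
        if match:
            payload_type = match.group(1)
            break
    if payload_type is None:
        raise ValueError("no Opus rtpmap line found")

    params = {}
    for line in lines:
        match = FMTP.match(line)
        if match and match.group(1) == payload_type:
            for pair in match.group(2).split(";"):
                if pair.strip():
                    key, _, value = pair.strip().partition("=")
                    params[key] = value
    return {"payload_type": int(payload_type), "fmtp": params}


if __name__ == "__main__":
    print(opus_params(SAMPLE_SDP))
    # {'payload_type': 111, 'fmtp': {'minptime': '10', 'useinbandfec': '1'}}
```

The `PCMU/8000` line in the sample is G.711 (μ-law), the other codec WebRTC mandates.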
Live Streaming
Opus, with the help of WebRTC, has also proven to be a very useful tool for live streaming. Instead of using specialized hardware, installed software, or soon-to-be-deprecated Flash, broadcasters can send their audio (and video) directly from the browser with just a microphone (and/or webcam). In this architecture, a live streaming media server converts the incoming Opus audio into typical live stream formats suitable for distribution through a CDN to end viewers. Often, a protocol like HLS is used with audio codecs like AAC or MP3.
Figure 4. Typical live streaming architecture using WebRTC from the broadcaster
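On the server side, the conversion step in this architecture can be as simple as an FFmpeg invocation. The sketch below only builds the command line. It assumes the broadcaster's Opus audio has already been depacketized into something FFmpeg can read, such as a WebM recording; the file names, helper name, bitrate, and segment length are illustrative. A production pipeline would typically also carry video and run continuously rather than on a file.

```python
import shlex


def hls_transcode_cmd(src: str, playlist: str,
                      aac_kbps: int = 128, segment_s: int = 4) -> list[str]:
    """Build an FFmpeg command that re-encodes Opus audio as AAC in HLS segments."""
    return [
        "ffmpeg",
        "-i", src,                    # input containing the Opus audio track
        "-vn",                        # drop video to keep this sketch audio-only
        "-c:a", "aac",                # FFmpeg's built-in AAC encoder
        "-b:a", f"{aac_kbps}k",       # AAC target bitrate
        "-f", "hls",                  # HLS muxer: .m3u8 playlist + media segments
        "-hls_time", str(segment_s),  # target segment duration in seconds
        playlist,
    ]


if __name__ == "__main__":
    cmd = hls_transcode_cmd("broadcast.webm", "live.m3u8")
    print(shlex.join(cmd))
    # To actually run it (requires FFmpeg on PATH):
    # subprocess.run(cmd, check=True)
```

Building the argument list rather than a shell string avoids quoting problems when file names contain spaces.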
Live Viewing
Opus can also be used for a similar workflow running in the opposite direction. Existing camera systems often use RTMP, RTSP, or SRT to transmit video with associated audio. With a media server converting that audio to Opus where needed, WebRTC can deliver the audio and video to a mobile app or browser without the need for any specialized viewing software. The low-latency nature of Opus means viewers can see the stream in near-real-time with minimal extra delay.
Figure 5. Typical live viewing architecture using WebRTC to a viewer
Opus Options
As mentioned, Opus is a versatile codec with flexibility in how much bandwidth it consumes. In most cases, where users are on the internet and optimal performance is possible, implementers should keep the default full 48 kHz sampling and let the codec auto-tune to the audio input and network conditions.
Although Opus is generally very processing-efficient, in some scenarios, such as supporting voice-only conversations on a constrained device, implementers may choose to limit Opus’s sampling rate and maximum bitrate to save CPU, conserve bandwidth, or to allow more bandwidth for video.
Media type | Typical use case | Recommended bitrate range
Narrowband speech (NB) | Speech-only on low-bandwidth networks | 8 to 12 kbps
Wideband speech (WB) | Speech on a typical network | 16 to 20 kbps
Fullband speech (FB) | Speech on a good network | 28 to 40 kbps
Fullband monaural music (FB mono) | Music streaming with a single (mono) microphone | 48 to 64 kbps
Fullband stereo music (FB stereo) | Music streaming | 64 to 128 kbps
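The recommended bitrate table can also drive encoder settings. The sketch below maps each row to options for FFmpeg's `libopus` encoder wrapper, taking the top of each bitrate range (an arbitrary choice) and using `-cutoff` to cap the coded bandwidth for narrowband and wideband speech, which is one way to apply the limits described earlier. The dictionary keys and helper name are hypothetical; `-application`, `-cutoff`, and `-frame_duration` are options of that wrapper.

```python
# Rows of the recommended bitrate table:
# (low kbit/s, high kbit/s, channels, cutoff in Hz or None for no cap).
# libopus cutoff values: 4000 = NB, 8000 = WB; FB rows stay uncapped.
RECOMMENDED = {
    "NB": (8, 12, 1, 4000),
    "WB": (16, 20, 1, 8000),
    "FB": (28, 40, 1, None),
    "FB mono": (48, 64, 1, None),
    "FB stereo": (64, 128, 2, None),
}
SPEECH_TYPES = {"NB", "WB", "FB"}


def libopus_args(media_type: str) -> list[str]:
    """Return FFmpeg audio-encoder arguments for one row of the table."""
    _low, high, channels, cutoff = RECOMMENDED[media_type]
    args = [
        "-c:a", "libopus",
        "-b:a", f"{high}k",           # top of the recommended range
        "-ac", str(channels),         # 1 = mono, 2 = stereo
        # "voip" favors speech intelligibility; "audio" favors fidelity.
        "-application", "voip" if media_type in SPEECH_TYPES else "audio",
        "-frame_duration", "20",      # 20 ms frames, the usual default
    ]
    if cutoff is not None:
        args += ["-cutoff", str(cutoff)]  # cap the coded audio bandwidth
    return args
```

A full command could then be assembled as `["ffmpeg", "-i", "talk.wav", *libopus_args("WB"), "talk.opus"]`, with the input and output names as placeholders.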
What’s After Opus?
Opus is great, but it is 10+ years old, so what comes next? The answer is… Opus. Opus continues to improve: its latest major release at the time of writing (1.3) added low-bitrate quality improvements, better machine learning approaches, and even support for ambisonics, a form of 3D surround sound used in applications like virtual reality (VR). For now, no breakthrough has emerged that provides significantly better performance. With continuing improvements, near-universal support on any device that runs a browser, and growing adoption, expect Opus to remain the audio codec of choice for internet communications for some time to come.
[1] Technically, G.711 is also mandated in WebRTC, but this codec is generally used only to communicate with the phone network.
About Chad Hart
Chad Hart is an analyst and consultant at cwh.consulting, a product management, marketing, and strategy advisory specializing in WebRTC and AI in RTC. Chad’s recent projects include authoring an extensive report on the applications of artificial intelligence in real-time communications,…