Opus Audio Format: The Complete Technical Guide
Opus is an open, royalty-free audio codec standardized by the IETF as RFC 6716 (2012), developed with contributions from Xiph.Org, Mozilla, and Skype (Microsoft), among others. It combines two very different coding technologies, SILK (from Skype, optimized for speech) and CELT (from Xiph.Org, optimized for music and general audio), into a single codec that is competitive with or better than most speech and music codecs across its bitrate range of 6 kbps to 510 kbps. Opus is a mandatory-to-implement codec for WebRTC, is widely used in modern VoIP, and is supported by all major browsers without a plugin.
Design Goals and Standardization
Opus was designed to solve a problem no previous codec addressed: providing excellent quality for both speech and music across the complete range of bitrates needed for internet communications. Prior to Opus:
- Narrowband speech codecs (G.711, G.729) were excellent for voice but useless for music
- Wideband speech codecs (G.722, AMR-WB) improved voice quality but still couldn't handle music
- Music codecs (MP3, AAC, Vorbis) required 64+ kbps for acceptable quality and added latency
Opus merges SILK (used in Skype since 2009) and CELT (a low-latency music codec) into an adaptive system that selects the appropriate technology frame by frame (frames are 20 ms by default).
SILK and CELT: The Two Engines
SILK — Speech Coding Engine
SILK is a speech codec built on linear prediction:
- LPC Analysis: Computes a Linear Predictive Coding model of the speech signal, separating the vocal tract filter from the excitation signal
- Long-Term Prediction (LTP): Models the pitch period. Speech is quasi-periodic; predicting from one pitch period back dramatically reduces residual energy.
- NLSF (Normalized Line Spectral Frequencies): The LPC filter is parameterized as NLSFs for quantization efficiency
- Noise shaping: Perceptually shapes the quantization noise
- Entropy coding: Range coding of the parameters and the quantized residual, using the same range coder as CELT
SILK operates at 8 kHz (NB), 12 kHz (MB), or 16 kHz (WB) internal sample rates, handling 6–40 kbps.
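To make the LPC step concrete, here is a minimal sketch of the textbook autocorrelation method with the Levinson-Durbin recursion. It is not SILK's implementation (the RFC 6716 reference code is fixed-point C with its own analysis and NLSF conversion); the function names `autocorrelation` and `levinson_durbin` are chosen here for illustration only.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[lag] = sum of x[i] * x[i - lag]."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a predictor of the given order.

    Returns (a, err) such that x[n] is predicted by
    sum(a[k] * x[n - 1 - k] for k in range(order)), with err the
    remaining prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # signal is perfectly predictable or silent; stop early
        # Reflection coefficient for stepping from order i to order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


# A decaying exponential is predicted almost perfectly from one past sample.
signal = [0.9 ** n for n in range(200)]
r = autocorrelation(signal, 2)
coeffs, residual_energy = levinson_durbin(r, 2)
print(coeffs, residual_energy / r[0])
```

For this test signal the first coefficient comes out close to 0.9 and the second close to 0, and the residual energy is a small fraction of the input energy, which is the reduction the LPC and LTP stages exploit.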
CELT — Music Coding Engine
CELT (Constrained Energy Lapped Transform) is an MDCT-based codec:
- MDCT: Frames of 120, 240, 480, or 960 samples at 48 kHz (2.5, 5, 10, or 20 ms). These are much shorter than AAC's 1024-sample frames, giving lower latency
- Band energy quantization: Energy for each of 21 frequency bands (covering 0–20 kHz) is quantized first
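- Pyramid Vector Quantization (PVQ): The spectral coefficients within each band, normalized to unit energy, are quantized with a codebook of integer vectors whose absolute values sum to a fixed pulse count K. Coding the shape separately from the band energy improves efficiency and preserves each band's energy.
- Entropy modeling: Coarse band energies are coded with a Laplace probability model; PVQ codeword indices are coded as uniformly distributed integers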
- Range coder: Arithmetic-like entropy coding for the entire frame
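CELT supports narrowband, wideband, super-wideband, and fullband audio (effective sample rates of 8, 16, 24, or 48 kHz; there is no medium-band CELT mode), handling roughly 32–510 kbps.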
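The gain/shape idea behind PVQ can be sketched in a few lines. The code below counts the codewords available for a band of dimension `n` with `k` pulses and runs a simple greedy pulse search. It is an illustrative sketch, not CELT's actual search, bit allocation, or index coding (those are defined in RFC 6716); the names `pvq_codebook_size`, `pvq_quantize`, and `pvq_decode` are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Count integer vectors of dimension n whose absolute values sum to k."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))


def pvq_quantize(x, k):
    """Greedily place k unit pulses to maximize correlation with x."""
    mags = [abs(v) for v in x]
    pulses = [0] * len(x)
    xy = 0.0  # running sum of |x[i]| * pulses[i]
    yy = 0.0  # running sum of pulses[i] ** 2
    for _ in range(k):
        best_j, best_score = 0, -1.0
        for j, m in enumerate(mags):
            # Squared normalized correlation if one more pulse lands at j.
            score = (xy + m) ** 2 / (yy + 2 * pulses[j] + 1)
            if score > best_score:
                best_j, best_score = j, score
        xy += mags[best_j]
        yy += 2 * pulses[best_j] + 1
        pulses[best_j] += 1
    # Restore the signs of the input coefficients.
    return [int(math.copysign(p, v)) if p else 0 for p, v in zip(pulses, x)]


def pvq_decode(y, gain):
    """Scale the pulse vector to the separately coded band gain."""
    norm = math.sqrt(sum(v * v for v in y))
    return [gain * v / norm for v in y] if norm else [0.0] * len(y)


band = [0.8, -0.6, 0.0]
gain = math.sqrt(sum(v * v for v in band))
codeword = pvq_quantize(band, 2)
print(pvq_codebook_size(len(band), 2), codeword, pvq_decode(codeword, gain))
```

Because the decoder rescales the codeword to the transmitted gain, the band's energy is reproduced regardless of how coarse the shape quantization is.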
Hybrid Mode
At medium bitrates for speech content, Opus can use a hybrid mode: SILK codes the band below 8 kHz while CELT codes the band above it (8–12 kHz for super-wideband, 8–20 kHz for fullband). This delivers super-wideband or fullband speech at bitrates where SILK alone would be limited to wideband and CELT alone would code speech less efficiently.
Opus Frame Structure
An Opus packet consists of:
Table of Contents (TOC) byte (1 byte):
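- Configuration bits [7:3]: 5 bits selecting one of 32 combinations of mode (SILK/Hybrid/CELT), bandwidth, and frame size
- Stereo bit [2]: mono (0) or stereo (1)
- Frame count code [1:0]: 0 = one frame, 1 = two frames of equal size, 2 = two frames of different sizes, 3 = an arbitrary number of frames (the count is signaled in the following byte)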
Frame payloads: One or more encoded frames. Multiple frames per packet (up to 120 ms total) reduce network overhead at the cost of latency.
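Padding (optional): Code 3 packets can carry padding bytes that the decoder ignores, which lets a sender reach a target packet size (for example, to resist traffic analysis). In libopus, opus_packet_pad() pads an existing packet to a given length.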
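As an illustration of the TOC byte, the sketch below decodes it following the RFC 6716 layout (5 configuration bits, 1 stereo bit, 2 frame-count bits) and its configuration table. The function name `parse_toc` and the dictionary keys are assumptions made for this example, not part of any library API.

```python
def parse_toc(toc):
    """Decode an Opus TOC byte into mode, bandwidth, frame size, and flags."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3          # top 5 bits: configuration number 0..31
    stereo = bool(toc & 0x04)  # bit 2: 0 = mono, 1 = stereo
    code = toc & 0x03          # low 2 bits: frame count code

    if config < 12:
        # Configurations 0..11: SILK-only, NB/MB/WB, 10/20/40/60 ms.
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        # Configurations 12..15: Hybrid, SWB/FB, 10/20 ms.
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:
        # Configurations 16..31: CELT-only, NB/WB/SWB/FB, 2.5/5/10/20 ms.
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    return {
        "config": config,
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": stereo,
        "frame_count_code": code,
    }


# 0xFC = 0b11111_1_00: configuration 31, stereo, one frame per packet.
print(parse_toc(0xFC))
```

In a real receiver the first byte of each Opus packet would be passed to a function like this; for code 3 packets the next byte then carries the frame count and the padding flag.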
Bandwidth Modes
Opus automatically selects bandwidth based on bitrate and content, or can be forced:
| Mode | Bandwidth | Sample Rate | Bitrate Range |
|---|---|---|---|
| NB (Narrowband) | 0–4 kHz | 8 kHz | 6–20 kbps |
| MB (Medium Band) | 0–6 kHz | 12 kHz | 7–25 kbps |
| WB (Wideband) | 0–8 kHz | 16 kHz | 8–40 kbps |
| SWB (Super-Wideband) | 0–12 kHz | 24 kHz | 12–64 kbps |
| FB (Fullband) | 0–20 kHz | 48 kHz | 16–510 kbps |
Bitrate Reference
| Bitrate | Mode | Quality | Use Case |
|---|---|---|---|
| 6 kbps | SILK NB | Telephony minimum | Extremely constrained links |
| 8–12 kbps | SILK NB/MB | Intelligible voice | Low-bandwidth VoIP |
| 16–24 kbps | SILK WB | Good voice | WebRTC voice calls |
| 32–48 kbps | Hybrid | Excellent voice + acceptable music | Conference calls |
| 64 kbps | CELT FB | Good music | Podcast streaming |
| 96 kbps | CELT FB | High-quality music | Web audio streaming |
| 128 kbps | CELT FB | Transparent for most listeners | Discord, music streaming |
| 192–256 kbps | CELT FB | Transparent for nearly all listeners | High-fidelity streaming |
| 320+ kbps | CELT FB | Transparent | Near-lossless applications |
Container: Ogg Opus
Opus is stored in an Ogg container as Ogg Opus (RFC 7845):
- OpusHead logical stream header (first page): version, channel count, pre-skip, input sample rate, output gain, channel mapping family
- OpusTags (second page): same key=value format as Vorbis Comments (TITLE, ARTIST, ALBUM, etc.)
- Audio pages: Ogg pages with Opus packets, granule positions in 48 kHz samples
The file extension is .opus (not .ogg). MIME type is audio/ogg; codecs=opus for Ogg-wrapped Opus.
Opus can also be stored in Matroska (.mka, .webm) and MP4 containers.
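The fixed part of the OpusHead packet is 19 bytes and can be read with Python's `struct` module. This is a minimal sketch based on the RFC 7845 field layout; it parses an already extracted packet rather than a whole Ogg file, ignores the optional channel mapping table, and uses the hypothetical names `parse_opus_head` and `playable_seconds`.

```python
import struct


def parse_opus_head(packet):
    """Parse the fixed 19-byte portion of an OpusHead packet."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # Little-endian: version, channels, pre-skip, input rate, gain, mapping.
    version, channels, pre_skip, input_rate, gain_q8, mapping_family = (
        struct.unpack_from("<BBHIhB", packet, 8)
    )
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,               # in 48 kHz samples
        "input_sample_rate": input_rate,    # informational only
        "output_gain_db": gain_q8 / 256.0,  # stored as Q7.8 fixed point
        "mapping_family": mapping_family,
    }


def playable_seconds(final_granule, pre_skip):
    """Duration from the last page's granule position (48 kHz samples)."""
    return (final_granule - pre_skip) / 48000.0


# Build a synthetic header for demonstration: stereo, pre-skip of 312 samples.
header = b"OpusHead" + struct.pack("<BBHIhB", 1, 2, 312, 44100, 0, 0)
info = parse_opus_head(header)
print(info, playable_seconds(48312, info["pre_skip"]))
```

Note that granule positions and pre-skip are always counted at 48 kHz, regardless of the input sample rate recorded in the header.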
Latency
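With its default 20 ms frames, Opus has an algorithmic latency of 26.5 ms (a 20 ms frame plus 6.5 ms of look-ahead). In CELT-only operation with 2.5 ms frames and the restricted low-delay setting, this drops to 5 ms. Both figures are well below the values commonly quoted for AAC-LC (~60 ms) and HE-AAC (~120 ms). Practical WebRTC end-to-end latency, including network and jitter buffers, is typically 100–200 ms, compared to 200–400 ms or more for MP3 streaming.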
Frame size options:
- 2.5, 5 ms: Minimum latency (CELT only)
- 10, 20 ms: Typical VoIP/WebRTC (20 ms is the default)
- 40, 60 ms: Lowest per-packet overhead from a single frame (SILK only)
- 120 ms: Maximum packet duration, reached by packing multiple frames into one packet
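The trade-off between packet duration and network overhead is simple arithmetic. The sketch below assumes 40 bytes of headers per packet (IPv4 + UDP + RTP with no extensions), which is an assumption of this example rather than something fixed by Opus; the function names are illustrative.

```python
def frame_samples(frame_ms, sample_rate=48000):
    """Samples per channel in one frame of the given duration."""
    return round(sample_rate * frame_ms / 1000)


def payload_bytes(bitrate_bps, packet_ms):
    """Average Opus payload size for one packet at the given bitrate."""
    return bitrate_bps * packet_ms / 8000


def overhead_fraction(bitrate_bps, packet_ms, header_bytes=40):
    """Share of each packet taken by headers (assumed IPv4 + UDP + RTP)."""
    payload = payload_bytes(bitrate_bps, packet_ms)
    return header_bytes / (header_bytes + payload)


for ms in (2.5, 5, 10, 20, 40, 60):
    print(ms, frame_samples(ms), payload_bytes(32000, ms),
          round(overhead_fraction(32000, ms), 3))
```

At 32 kbps, a 20 ms packet carries about 80 payload bytes, so the assumed 40 header bytes make up a third of every packet; at 60 ms the payload grows to 240 bytes and the header share falls to roughly 14%, at the cost of 40 ms more buffering delay.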
Encoding Commands
# Encode to Opus at 128 kbps (ffmpeg)
ffmpeg -i input.wav -c:a libopus -b:a 128k output.opus
# Encode with opusenc (reference encoder)
opusenc --bitrate 128 input.wav output.opus
# VBR mode (default, recommended)
opusenc --vbr --bitrate 128 input.wav output.opus
# CVBR (Constrained VBR) for streaming
ffmpeg -i input.wav -c:a libopus -b:a 96k -vbr constrained output.opus
# Hard CBR for strict network requirements
opusenc --hard-cbr --bitrate 64 input.wav output.opus
# Low-latency encoding (10 ms frames)
ffmpeg -i input.wav -c:a libopus -b:a 32k -frame_duration 10 output.opus
# Tune for speech at low bitrate (a hint that overrides signal detection; it does not strictly force SILK mode)
opusenc --speech --bitrate 16 input.wav output.opus
# Opus in WebM container (for web video)
ffmpeg -i input.mp4 -c:v libvpx-vp9 -c:a libopus -b:a 128k output.webm
# Batch FLAC to Opus
for f in *.flac; do opusenc --bitrate 192 "$f" "${f%.flac}.opus"; done
Browser and Platform Support
| Platform | Support | Notes |
|---|---|---|
| Chrome | ✓ Full | Since Chrome 25 (2013) |
| Firefox | ✓ Full | Since Firefox 15 (2012) |
| Edge | ✓ Full | Since Edge 14 (2016) |
| Safari | ✓ Since 15.4 | macOS 12.3, iOS 15.4 (2022) |
| Android | ✓ Full | MediaCodec since Android 5.0 |
| iOS | ✓ Since iOS 15.4 | AVFoundation support |
| Windows 10/11 | ✓ | Media Foundation since 2021 |
Opus vs Competitors
| Codec | Quality at 64 kbps | Quality at 128 kbps | Latency | Royalties |
|---|---|---|---|---|
| Opus | Excellent | Transparent | 5–26.5 ms | Free |
| AAC-LC | Good | High | ~60 ms | Patent-encumbered |
| HE-AAC v1 | Excellent | Good | ~120 ms | Patent-encumbered |
| MP3 | Fair | Good | ~26 ms | Free (patents expired 2017) |
| Vorbis | Good | High | ~60 ms | Free |
At 128 kbps and below, Opus generally matches or beats other open codecs in public listening tests. It is a strong default for web audio applications, and for storage where file size matters more than compatibility with legacy devices.