The latest on audio compression standards

Article By : Brian Dipert

In my recent coverage of high-res audio players (and the services that feed them content) and external DACs and amplifiers, I realized I’ve been tossing a lot of new and potentially unfamiliar terms at my readers. It’s been a while (20 years, to be exact!) since I last covered lossless and lossy audio compression standards in depth, and the world’s moved on since then, so an abbreviated update is probably overdue.

First off, to state the perhaps obvious: two decades ago, the bulk of digital music was either downloaded from a server or ripped from a CD, and then stored and played back locally. Nowadays, the ascendance of various subscription services has complicated the situation; a given service needs to support diverse playback clients with varying processing capabilities, network-connected at a range of bitrates and latencies (both of which also vary over time). Adaptivity in the cloud is therefore critical.

Take Tidal, for example, a music service I’ve regularly mentioned of late by virtue of its audiophile-targeting “HiFi” service tier. Looking first at the baseline “premium” offering, Tidal streams (and, on some platforms, also offers DRM-inclusive downloads) at two bitrates: 96 kbps (which Tidal calls “normal,” and which is generally viewed as being equivalent in quality to legacy 128 kbps MP3) and 320 kbps (“high,” a higher bitrate than that used by competitors such as Amazon Music Unlimited and Apple iTunes). In both cases, it leverages the AAC (advanced audio coding) lossy compression standard. AAC, the audio codec at the core of MPEG-4, wasn’t even included in my test suite back in 2001; only MP3 (MPEG-1 and MPEG-2 Audio Layer III), RealAudio, and WMA (Windows Media Audio) were. Three-plus years later, I was still writing words such as “AAC so far lacks widespread industry support, thereby limiting the types of equipment on which consumers can play their audio.” Thanks in no small part to Apple’s embrace of AAC, however, particularly after the company dropped its FairPlay DRM, the heir apparent has pretty much taken over the throne.

Now for Tidal’s “HiFi” offering. At minimum, even if the content is “only” Red Book Audio CD-equivalent quality (16-bit per-channel samples, two channels, 44.1 kHz sampling rate, for a ~1.411 Mbps uncompressed bitstream), Tidal still delivers it in one of two lossless-compressed formats to reduce bitrate (and, for downloads, file size): FLAC (the free lossless audio codec) for most clients, and ALAC (the Apple lossless audio codec) for iOS devices. Quoting from Wikipedia, “Digital audio compressed by FLAC’s algorithm can typically be reduced to between 50 and 70 percent of its original size.”
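To make that arithmetic concrete, here’s a minimal Python sketch of the Red Book bitrate calculation, along with the bitrate range implied by the 50-to-70-percent figure in the Wikipedia quote. The function names are my own illustrative choices, and the FLAC range is only a rough estimate; actual results depend on the source material.

```python
def pcm_bitrate_bps(bits_per_sample: int, channels: int, sample_rate_hz: int) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return bits_per_sample * channels * sample_rate_hz


def flac_bitrate_range_bps(pcm_bps: int, low_ratio: float = 0.50, high_ratio: float = 0.70):
    """Rough FLAC bitrate range, assuming the 50-70% size figure quoted above."""
    return pcm_bps * low_ratio, pcm_bps * high_ratio


# Red Book Audio CD: 16-bit samples, two channels, 44.1 kHz sampling rate
red_book = pcm_bitrate_bps(16, 2, 44_100)
low, high = flac_bitrate_range_bps(red_book)

print(f"Red Book PCM: {red_book / 1e6:.3f} Mbps")
print(f"FLAC estimate: {low / 1e3:.0f} to {high / 1e3:.0f} kbps")
```

The same helper works for higher-resolution material; just pass in a different sample size and sample rate.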

And if larger sample sizes and higher sample rates are more your thing, there’s MQA (master quality authenticated), which Tidal supports for its HiFi tier subscribers and with a subset of its content library. MQA, like AAC, is a lossy compression algorithm, which may seem ironic at first glance given that we’re talking about a supposed highest-quality offering (unsurprisingly, the audiophile community’s response to the format has been mixed). But unlike AAC (and its MP3 and other perceptual encoding peers), MQA does not harness various psychoacoustic modeling techniques to reduce the bitrate, such as those I mentioned 20 years ago:

  • lowpass filtering, or removal of all audio information above a certain frequency;
  • stereo-to-mono conversion of the original two audio channels, completely or above a certain frequency;
  • phase collapse, or elimination of phase differences between the two channels, completely or above a certain frequency;
  • frequency masking, in which a loud tone masks lower-volume information in nearby frequencies; and
  • temporal masking, in which a loud tone masks lower-volume information that both precedes and follows the masking tone in time.

Instead, MQA leverages time-domain ADPCM (adaptive differential pulse-code modulation) for bitrate reduction. Here’s more on MQA from Wikipedia:

[MQA] hierarchically compresses the relatively little energy in the higher frequency bands into data streams that are embedded in the lower frequency bands using proprietary dithering techniques but after the decoding the result would be the lossless archive. After a series of such manipulations, the resulting 44.1 kHz data, the layered data streams, and a final “touchup” stream (compressed difference between the lossy signal from unpacking all layers and the original) are provided to the playback device. Given the low amount of energy expected in higher frequencies, and using only one extra frequency band layer (upper 44.1 kHz band of 96/24 packed into dither of 48/16) and one touchup stream (compressed difference between original 96/24 and 48/16) are together distributed as a 48/24 stream, of which 48/16 bit-decimated part can be played by normal 48/16 playback equipment. One more difference to standard formats is the sampling process. The audio stream is sampled and convolved with a triangle function, and interpolated later during playback.

So, let’s think about this for a minute. In order for Tidal to successfully stream or download a music track to a client, Tidal’s server must first figure out which service tier (premium or HiFi) the account linked to the client is associated with. Then, it needs to determine whether or not downloads are even supported by the client. Next, it must ascertain which quality level options (normal, high, HiFi and/or master) the client supports (and, for HiFi, whether or not the client is an iOS device), as well as what quality option is preferred by the client for both streaming and downloads; below are example screenshots from my Android smartphone’s Tidal app:

screenshot of Android Tidal app streaming options

screenshot of Android Tidal app download options
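The server-side decision sequence described above can be sketched in a few lines of Python. To be clear, this is my own hypothetical illustration, not Tidal’s actual logic: the `Client` class, the `select_stream` and `codec_for` functions, and the tier-to-quality mapping are all assumptions derived only from the tiers and formats mentioned in this article.

```python
from dataclasses import dataclass

# Quality levels from lowest to highest, as named in the Tidal app
QUALITY_ORDER = ["normal", "high", "hifi", "master"]

# Assumed ceiling per service tier: premium tops out at "high"
TIER_CEILING = {"premium": "high", "hifi": "master"}


@dataclass
class Client:
    tier: str          # "premium" or "hifi"
    supported: set     # quality levels the client can play
    preferred: str     # quality level the user selected
    is_ios: bool = False


def codec_for(quality: str, is_ios: bool) -> str:
    """Map a quality level to the delivery format described in the article."""
    if quality in ("normal", "high"):
        return "AAC"
    if quality == "hifi":
        return "ALAC" if is_ios else "FLAC"
    return "MQA"


def select_stream(client: Client):
    """Pick the best quality allowed by tier, preference, and client support."""
    ceiling = QUALITY_ORDER.index(TIER_CEILING[client.tier])
    wanted = QUALITY_ORDER.index(client.preferred)
    # Start at the lower of (tier ceiling, user preference) and step down
    for idx in range(min(ceiling, wanted), -1, -1):
        quality = QUALITY_ORDER[idx]
        if quality in client.supported:
            return quality, codec_for(quality, client.is_ios)
    raise ValueError("client supports no quality level allowed by its tier")


iphone = Client("hifi", {"normal", "high", "hifi"}, "master", is_ios=True)
print(select_stream(iphone))
```

A real service would also need to track whether downloads are permitted on the client, which this sketch leaves out for brevity.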

But desiring a given quality level is a different thing than actually being able to obtain that quality level. If the client is on a poor cellular data connection (either consistently or perhaps only on a transient basis), for example, a 320 kbps “high” stream may not be achievable, thereby requiring a transitory down-throttle to “normal” mode. And after all this work, you still “only” have a music track resident in the client’s volatile or nonvolatile memory; you then still need to get it from there to the Bluetooth headphones, over a wireless link that’s even more bitrate-constrained (not to mention one that demands even lower power consumption).
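Here’s a minimal sketch of that down-throttle decision for the two lossy bitrates. The `pick_quality` function and its `headroom` safety margin are hypothetical, and I’m assuming the client somehow has a recent throughput measurement to work with; real adaptive streaming implementations are considerably more sophisticated.

```python
# Lossy stream bitrates mentioned earlier, in kbps
LOSSY_BITRATES_KBPS = {"normal": 96, "high": 320}


def pick_quality(preferred: str, measured_kbps: float, headroom: float = 0.8) -> str:
    """Return the best quality at or below the preference that fits the link.

    Only a fraction (headroom) of the measured throughput is budgeted,
    leaving margin for short-term variation.
    """
    budget = measured_kbps * headroom
    ceiling = LOSSY_BITRATES_KBPS[preferred]
    fits = [
        (rate, name)
        for name, rate in LOSSY_BITRATES_KBPS.items()
        if rate <= ceiling and rate <= budget
    ]
    if fits:
        return max(fits)[1]
    # Nothing fits: fall back to the lowest bitrate and hope for the best
    return min(LOSSY_BITRATES_KBPS, key=LOSSY_BITRATES_KBPS.get)


for throughput in (1000, 300, 50):
    print(throughput, "kbps link ->", pick_quality("high", throughput))
```

Calling the function again whenever the throughput estimate changes gives you the “transitory” behavior: the stream drops to “normal” when the link degrades and returns to “high” when it recovers.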

The standard audio codec currently specified by the Bluetooth SIG (special interest group) as requiring support in any A2DP (advanced audio distribution profile)-based Bluetooth device (transmitter and/or receiver) is SBC (the low-complexity subband codec). SBC is roughly as old as MP3, and as the relevant Wikipedia entry notes, “it was designed to obtain a reasonably good audio quality at medium bit rates while keeping low computational complexity, having Bluetooth bandwidth limitations and processing power in mind.”

SBC does a passable job, especially for voice, but it’s only in the early stages of being replaced by LC3 (the low complexity communication codec). And to call SBC “high fidelity” is, in a word, delusional. That shortcoming has opened the door to a flock of optional other codecs, whose use (or not) is determined by interrogating what the Bluetooth transmitter and receiver each support during the initial-connection handshaking process. They include, for example, AAC (yep, the same codec mentioned earlier), which Apple leverages exclusively in its products (presumably to avoid having to pay Qualcomm license fees…keep reading).

Qualcomm is the current owner of the aptX series of codecs, which CSR acquired in 2010, with CSR in turn acquired by Qualcomm five years later. (Apple’s Macs originally supported aptX, for example, but that support was later rescinded.) The foundation aptX (originally known as apt-X) codec, the one you’re probably most familiar with, harnesses ADPCM-based lossy compression akin to the earlier-mentioned MQA. It’s subsequently been joined in the portfolio by the enhanced, live, and voice variants and, for Bluetooth music purposes, the LL (low latency, which is particularly useful when attempting to maintain lip sync between a movie’s video and its soundtrack and dialogue, for example), HD (high definition), and adaptive tiers.
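For readers unfamiliar with ADPCM, here’s a deliberately simplified Python sketch of the general idea: code the difference between each sample and a prediction using only a few bits, and adapt the quantizer step size as you go. This is not the aptX (or MQA) algorithm; the 4-bit code range, the previous-sample predictor, and the step-adaptation rule are my own assumptions, chosen purely for illustration.

```python
import math


def _step_update(step: int, code: int, min_step: int = 1, max_step: int = 2048) -> int:
    """Grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(min_step, min(max_step, step))


def adpcm_encode(samples, step: int = 16):
    """Encode integer samples as 4-bit signed codes (-8..7)."""
    predicted = 0
    codes = []
    for sample in samples:
        diff = sample - predicted
        code = max(-8, min(7, round(diff / step)))
        codes.append(code)
        # Track exactly what the decoder will reconstruct
        predicted += code * step
        step = _step_update(step, code)
    return codes


def adpcm_decode(codes, step: int = 16):
    """Rebuild samples by accumulating the dequantized differences."""
    predicted = 0
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _step_update(step, code)
    return out


# A 440 Hz tone at 44.1 kHz, 16-bit scale
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(200)]
decoded = adpcm_decode(adpcm_encode(tone))
worst = max(abs(a - b) for a, b in zip(tone, decoded))
print("worst-case reconstruction error:", worst)
```

Note that the encoder updates its prediction from the quantized value rather than the original sample, so the encoder and decoder stay in lockstep. The output isn’t bit-identical to the input, which is what makes the scheme lossy.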

In recently exploring my new portable Bluetooth receiver “toy,” I discovered another codec, LDAC. And thanks to LDAC’s Wikipedia definition, I learned about another: LHDC (the low latency high-definition audio codec). LDAC, developed by Sony, was added to the Android Open Source Project repository beginning with Android 8. The encoder (i.e., Bluetooth transmitter) code is free and open source, as is its use, although the decoder is proprietary and requires that users obtain a license. My Google Pixel 3a handsets include support for LDAC:

screenshot of the Google Pixel 3A Bluetooth developer options

screenshot of the Google Pixel 3A Bluetooth audio codec options

screenshot of the Google Pixel 3A Bluetooth playback options

And Radsone apparently took a license, because LDAC is the preferred codec in use whenever I connect one of the smartphones to the Earstudio ES100 MK2:

screenshot of Earstudio device details
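The codec selection that happens during the connection handshake can be approximated as follows. This is a hedged sketch rather than the actual A2DP signaling: the `negotiate_codec` function and the preference ordering are my own assumptions, and the only rule taken from the specification discussion above is that SBC is the mandatory fallback.

```python
# Hypothetical preference order, most-preferred first
PREFERENCE = ["LDAC", "aptX HD", "aptX", "AAC", "SBC"]

# Every A2DP device must support SBC
MANDATORY = "SBC"


def negotiate_codec(transmitter, receiver, preference=PREFERENCE) -> str:
    """Return the most-preferred codec that both ends support."""
    common = (set(transmitter) | {MANDATORY}) & (set(receiver) | {MANDATORY})
    for codec in preference:
        if codec in common:
            return codec
    return MANDATORY


phone = {"SBC", "AAC", "aptX", "aptX HD", "LDAC"}
receiver = {"SBC", "AAC", "aptX", "aptX HD", "LDAC"}
basic_headphones = {"SBC"}

print(negotiate_codec(phone, receiver))
print(negotiate_codec(phone, basic_headphones))
```

In practice the preference order is a policy decision on the transmitter side, which is why the Android developer options shown earlier let you override it.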

With that, I’m closing in on 1,500 words, so I’m going to wrap up for now. “Sound” off with your thoughts in the comments!

This article was originally published on EDN.

Brian Dipert is Editor-in-Chief of the Embedded Vision Alliance, and a Senior Analyst at BDTI and Editor-in-Chief of InsideDSP, the company’s online newsletter.
