Expanding options bring surround sound to the forefront…and the back…and the sides…and the ceiling…and the floor

( 01 May 2002 )
Brian Dipert, Technical Editor


DVD-Audio and SACD (Super Audio CD) players and their corresponding content are finally appearing in stores, after years of delay and SACD’s underwhelming initial two-channel experiment. And, predictably, each vendor is attempting to convince consumers that its preferred technology represents the superior replacement for audio CDs. Industry pundits are debating not only which format—if any—will ultimately thrive in the marketplace, but also which feature will be the “hook” that convinces consumers to upgrade.

Will success come from the new formats’ sample rates, which are higher than today’s 44.1- and 48-kHz rates? Will it come from their sample sizes, which are larger than today’s 16 and 20 bits? Or will it come from the lossless compression used to store their audio data? Success, in my opinion, won’t come from any of these sales pitches but instead will be found in the new media’s surround-sound characteristics. DVD-Audio discs and SACDs give consumers a fresh presentation of their music-library favorites and an enhanced alternative to new releases of two-channel audio CDs. If this theory holds, the content providers—not the hardware manufacturers—will be the big winners, and DVD-Audio will prove more popular than SACD. The common reason for these predictions is the optional, backward-compatible video partition on DVD-Audio discs, which enables surround-sound playback in Dolby Digital and DTS formats on any of the millions of DVD-Video players now in consumers’ homes.

What drives the sales of all those DVD-Video players? Certainly, the fact that it’s easier to navigate a DVD than a videotape comes into play, as does the perceived durability of optical media. Also, DVDs’ additional storage space enables director’s commentaries, deleted scenes, and other extra features. But an equally or even more important factor is that DVDs allow people to enjoy at home, either through six discrete speakers or through a virtualization scheme using fewer speakers, a credible reproduction of the rich surround-sound experience previously available only in movie theaters. Now that consumers have gotten a taste of surround sound, they want all of their listening experiences (one- and two-channel audio playback, computer gaming, and others) to have the same immersive quality. In response, developers of sound-field-expansion, additional-channel-synthesis, and speaker-virtualization algorithms are working with DSP suppliers to deliver ever-improving audio realism.


WHERE IT ALL STARTED: CINEMA SURROUND
It took both the technical ingenuity and the marketing muscle of Dolby Labs to consolidate, in the 1980s, what was for prior decades an incompatible plethora of film-, studio-, and theater-proprietary approaches to surround sound in movies. Dolby Stereo, renamed Dolby Surround for home-theater applications, matrix-encodes four audio channels (left, right, center, and monophonic surround) into a two-channel (Lt (left total) and Rt (right total)) format (Figure 1a). The center and surround channels undergo a 3-dB level reduction before combining with the left and right channels, to maintain constant acoustic power in the overall mix. The Dolby MP matrix encoder also bandpass-filters the surround channel, applies Dolby B-type noise reduction to it, and finally phase-shifts it ±90°, before combining it with the remainder of Lt and Rt.

The matrix-encoding and -decoding process maintains independence between the left and right channels. Because Lt and Rt jointly carry the center-channel information, the passive Dolby Surround decoder creates a phantom center-channel image between the left and right speakers (Figure 1b). Lt and Rt also carry surround information but do so out of phase in each channel with respect to the other channel, thereby diffusing the surround image coming out of the right and left speakers. (This setup is acceptable, but not ideal, with surround effects that are intended to be immersive, not location-specific.) And because the surround channel derives from the difference between Lt and Rt, this subtractive process theoretically cancels common center-channel data.
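The encode and passive-decode relationships described so far can be sketched with single-frequency phasors, in which multiplying by 1j stands for a +90° phase shift. This is a minimal illustration rather than Dolby's implementation: the function names are hypothetical, the choice of which total receives the +90° shift is an assumption, and the bandpass filter, B-type noise reduction, and surround delay are left out.

```python
# Simplified phasor model of the 4:2 matrix and a passive decoder.
# Each channel is one complex amplitude; 1j represents a +90-degree shift.

G = 10 ** (-3 / 20)  # 3-dB level reduction for center and surround


def matrix_encode(left, right, center, surround):
    """Fold four channels into Lt/Rt (sign convention is an assumption)."""
    lt = left + G * center + 1j * G * surround
    rt = right + G * center - 1j * G * surround
    return lt, rt


def passive_decode(lt, rt):
    """Passive decode: Lt and Rt feed the front speakers directly and the
    surround output is their difference. There is no center output; the
    center is a phantom image formed by equal content in L and R."""
    return {"L": lt, "R": rt, "S": lt - rt}


# Center-only program: identical in both fronts, cancelled in the surround.
center_only = passive_decode(*matrix_encode(0, 0, 1, 0))

# Surround-only program: the fronts carry it in opposite phase.
surround_only = passive_decode(*matrix_encode(0, 0, 0, 1))
```

In this model a center-only input produces no surround output at all, which is the theoretical cancellation the text describes; real decoders fall short of it whenever Lt and Rt are not perfectly matched.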

Figure 1
Dolby’s MP matrix process encodes four audio channels into two (a). Dolby Surround (b), Pro Logic (c), and Pro Logic II (d) are successive iterations of the corresponding decoder.

The decoded surround channel also carries left- and right-channel-differential information. By time-delaying the surround output, the decoder ensures that the left and right speakers are first to project this information, employing first-to-arrive precedence and temporal masking effects to direct listeners’ attention toward the front of the room, where it belongs. Maintaining a high degree of separation between center and surround channels requires that the amplitude and phase characteristics of Lt and Rt be as similar as possible. Otherwise, for example, if a mismatch in balance results in center-channel discrepancies between Lt and Rt, not only will the phantom center image shift to one side, but center information will also leak into the surround channel as crosstalk. The 7-kHz lowpass filter in the surround decoder has two purposes: to eliminate frequencies at which phase-created crosstalk is most noticeable and to reduce the directionality of surround speakers. The noise-reduction decoder also reduces front-channel-signal leakage.
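A short calculation gives a feel for how sensitive the center-to-surround cancellation is to a balance mismatch. The sketch below assumes the simplest possible model: a center-only program, a passive decoder that forms the surround as Lt minus Rt, and a pure gain error on Rt. The function name and the 1-dB example value are illustrative choices, not figures from Dolby.

```python
import math

G = 10 ** (-3 / 20)  # center is mixed 3 dB down into both Lt and Rt


def center_leak_db(mismatch_db):
    """Center crosstalk in the surround output, in dB relative to the
    summed center level, when Rt is attenuated by mismatch_db."""
    k = 10 ** (-mismatch_db / 20)  # gain error applied to Rt only
    lt, rt = G, G * k              # center-only program
    surround = lt - rt             # cancels only when k == 1
    if surround == 0:
        return float("-inf")       # perfect match: no leakage
    return 20 * math.log10(abs(surround) / (lt + rt))


leak_1db = center_leak_db(1.0)     # roughly -25 dB
```

Under these assumptions, a 1-dB imbalance already limits center-to-surround separation to roughly 25 dB, which helps explain why the decoder adds filtering and noise reduction on the surround path.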

To further minimize the crosstalk effects of Dolby Surround decoders, the subsequent-generation Pro Logic decoder employs logarithmic differential-level circuits with slow- and fast-response modes. These modes are intended to balance the need for a stable sound field against the need for a quick response to rapid channel-level changes (Figure 1c). After detecting sound-field dominance, the decoder applies enhancement in the same direction and in proportion to that dominance. Dolby Pro Logic reached an important milestone in 1988 when, for the first time, semiconductor vendors integrated all the required processing circuits into a single IC package. Today, ironically, thanks to Moore’s Law trends, most Dolby Surround algorithms run in software on high-speed DSPs.
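One way to picture logarithmic differential-level detection is as two dB ratios computed over a short block of samples: left versus right, and sum (center) versus difference (surround). The sketch below is an assumption about how such a measure could be computed in software, not a description of the licensed Pro Logic circuit; it omits the slow- and fast-response smoothing entirely, and the function names and the small `floor` constant are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominance_db(lt, rt, floor=1e-9):
    """Return (left/right, center/surround) dominance in dB for one block.
    `floor` only guards against taking the log of zero."""
    total = [a + b for a, b in zip(lt, rt)]
    diff = [a - b for a, b in zip(lt, rt)]
    lr = 20 * math.log10((rms(lt) + floor) / (rms(rt) + floor))
    cs = 20 * math.log10((rms(total) + floor) / (rms(diff) + floor))
    return lr, cs


# Dialogue-like program: identical in Lt and Rt, so the center dominates
# strongly while the left/right axis stays balanced.
tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
lr_dom, cs_dom = dominance_db(tone, tone)
```

A steering decoder would then smooth these two values over time and use them to boost the dominant direction while attenuating the others.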

With many lossy-audio-compression algorithms, both the compressed bit stream and the decoder are standardized, and all of the innovation occurs with the encoder. Conversely, with Dolby MP Matrix, innovation in both the encoder and decoder is possible; however, it must remain within the boundary condition of a standardized matrix-encoded two-channel format. As a result, a number of enhancements have emerged over the past 10 or more years, including virtualizers that generate a surround effect using only two speakers. One of these Pro Logic enhancements, created by audio pioneer Jim Fosgate of Fosgate Audionics, was licensed by Dolby and became Pro Logic II (Figure 1d). Front-versus-back and right-versus-left signal steering now occur independently, not under control of the same slow-versus-fast-response circuits. Response time with Pro Logic II is also continuously variable and not restricted to only two response modes. Feedback around the steering process delivers benefits in accuracy and dynamic behavior.

Pro Logic II separates the decoded surround signal into right- and left-rear channels and gives both surround channels full frequency bandwidth instead of employing a 7-kHz cutoff. The Movie mode incorporates preset decoding characteristics. The Music mode allows customization of dimension (overall sound-field shift to front or rear); center-channel width (restricted to the center speaker or extended to a wider “phantom” image using the left- and right-front speakers); and panorama (an extension of the left- and right-front images to the sides of a listener, using the rear surround speakers).

A mild high-frequency shelf filter in Music mode mimics the roll-off created by room reflections and absorption, and signal delay compensates for the surround speakers’ closer proximity to the listener than the front and center speakers. Bass management routes low-frequency portions of front, center, and surround speakers to a dedicated subwoofer, if one exists. Pro Logic II also is the first matrix decoder that Dolby confidently advertises as capable of extracting spatial cues in two-channel music recordings to create a realistic 3-D presentation.
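The surround-delay compensation mentioned above comes down to simple path-length arithmetic. The following sketch assumes sound travels at about 343 m/s in room-temperature air; the function name and the example distances are illustrative and do not come from any Dolby specification.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def surround_delay_ms(front_distance_m, surround_distance_m):
    """Delay to add to the surround feed so that its sound arrives no
    earlier than sound from the more distant front speakers."""
    extra_path_m = front_distance_m - surround_distance_m
    return max(0.0, extra_path_m / SPEED_OF_SOUND_M_S * 1000.0)


# Fronts 3.5 m from the listener, surrounds 1.5 m away: about 5.8 ms.
delay = surround_delay_ms(3.5, 1.5)
```

A decoder typically adds a further fixed delay on top of this geometric term to exploit the precedence effect, which this sketch does not model.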


DISCRETE ISN'T DISCREET
Had Pro Logic II appeared in the late 1980s, there might not have been much need for an encoding system that outputs more than two discrete-audio channels. However, several pressing needs of the time (such as DVD-Video, DSS, and DTV) begged for a format that delivered full frequency range to distinct right- and left-rear surround speakers. Such a format also needed to provide a low-frequency-effect “.1” channel with bass-management capabilities, to support delay adjustment that accounted for different speaker placements, and to simultaneously detect and process multiple dominant signals.


Figure 2
The SDDS system supplements the traditional left-, right-, and center-front speakers with left-center and right-center channels to reinforce dialogue and side-to-side front sound transitions (courtesy Surround Associates).

The initial answer to these requests appeared in 1992 in the form of Dolby Digital, which employed AC-3 perceptual compression to squeeze as many as five full-range audio channels and a separate 20- to 120-Hz low-frequency-effects channel into a 384- or 448-kbps digital bit stream. Many people incorrectly interchange the terms “Dolby Digital” and “AC-3.” AC-3 lossy compression supports a range of bit-stream sizes and a diversity of audio-channel characteristics and numbers. Dolby Digital is but one implementation of AC-3. Also, a Dolby Digital label on a DVD doesn’t always mean that it offers six discrete-audio channels. Older movies might provide only a monophonic, two-channel or Dolby Surround-derived four-channel soundtrack. All Dolby Digital decoders downmix to Dolby Surround matrix two-channel outputs to enable, for example, the use of a DVD player or DTV receiver with an older Dolby Pro Logic receiver.

In 1993, the year after Dolby Digital’s major-motion-picture debut with Batman Returns, competitor Digital Theater Systems made a highly visible first appearance with Jurassic Park. Whereas DTS for movie theaters uses the apt-X100 perceptual-compression scheme, DTS on DVDs and CDs employs Coherent Acoustics lossy compression. DTS began with two strikes against it, making its current success all the more impressive. DTS wasn’t a DVD-consortium-approved audio format, like PCM or Dolby Digital. Therefore, DTS took up incremental DVD storage space that might otherwise have found use for other multimedia features in a PCM- or Dolby Digital-only release, and adding DTS support required equipment vendors and content providers to pay an incremental licensing fee. DTS’s initial bit rate of 1509 kbps is also three to four times larger than the bit rate of a Dolby Digital file, meaning that DTS takes up even more room on a DVD for the same playback time.
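The storage penalty is easy to quantify from the bit rates quoted above. The sketch below uses those rates as given; the 120-minute running time is an assumed example, and the helper name is hypothetical.

```python
def megabytes(bit_rate_kbps, minutes):
    """Storage consumed by a constant-bit-rate stream, in decimal MB."""
    return bit_rate_kbps * 1000 * minutes * 60 / 8 / 1e6


DOLBY_DIGITAL_KBPS = (384, 448)
DTS_KBPS = 1509

# DTS versus each Dolby Digital rate: about 3.9x and 3.4x.
ratios = [DTS_KBPS / rate for rate in DOLBY_DIGITAL_KBPS]

# Space used by one soundtrack on an assumed 120-minute film.
dts_mb = megabytes(DTS_KBPS, 120)   # about 1358 MB
dd_mb = megabytes(448, 120)         # about 403 MB
```

On a disc that must also hold the video stream, that difference of nearly 1 Gbyte is the space that extras or alternate soundtracks would otherwise occupy.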

DTS cleverly attempted to turn this second potential negative into a positive with marketing messages that equated more bits with better sound. In reality, most double-blind listening tests comparing the two formats on identically mixed material and at equivalent listening levels reveal little or no difference between them. The “equivalent-listening-levels” qualifier is important, because many DTS soundtracks are fractions of a decibel to several decibels “louder” than their Dolby Digital counterparts—a fact that you can verify at home using an inexpensive Radio Shack audio-level meter. Unless you equalize the two presentations’ levels, the human auditory system will likely equate a louder presentation with one that sounds better.
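Level matching before a comparison can be reduced to measuring each presentation's RMS level and scaling one of them. The sketch below is a minimal illustration on synthetic sample lists; the function names are hypothetical, and a real comparison would measure acoustic level at the listening position rather than digital samples.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (the input must not be silent)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def matching_gain(reference, louder):
    """Linear gain that brings `louder` to the RMS level of `reference`."""
    difference_db = rms_db(louder) - rms_db(reference)
    return 10 ** (-difference_db / 20)


# Same program material, with one version 2 dB hotter than the other.
ref = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
hot = [s * 10 ** (2 / 20) for s in ref]

gain = matching_gain(ref, hot)       # about 0.794, that is, -2 dB
matched = [s * gain for s in hot]
```

Only after this kind of equalization does a listening comparison say anything about the codecs rather than about loudness.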


Figure 3
PDM employs single-bit coding and delta-sigma DACs along with a very high sampling rate to reconstruct the original signal (courtesy Sony).

Dolby and THX’s Surround EX 6.1 format, along with the conceptually similar DTS-ES Matrix, employ matrix-encoding techniques to add a center-back channel intended to smooth out left-to-right and right-to-left surround-sound transitions. (If the center-back channel is output over two speakers, it is called “7.1.”) Digital Theater Systems designed the DTS bit-stream format to be forward-extensible to a range of bit-stream sizes, additional audio channels, larger sample sizes, higher sampling rates, and other enhancements, while remaining backward-compatible with prior-generation decoders. DTS-ES Discrete 6.1, the first example of this flexibility, also supports a discrete (not matrix-encoded) center-back channel. THX Ultra2 uses seven channels of amplification to play back any multichannel-encoded program through a seven-speaker and single-subwoofer layout.

THX Ultra2’s Cinema mode and Music mode automatically detect program material with 5.1 or more channels and apply proprietary processing that blends the directional and ambient surround information prior to playback through four surround speakers, two at the sides and two at the back. Ultra2 receivers and controllers also feature switchable boundary gain compensation to alleviate “boomy” bass performance that can occur with listening positions near a wall. Sony advocates SDDS (Sony Dynamic Digital Sound), a format that employs ATRAC audio compression and, to date, has appeared only in movie theaters. Whereas Dolby Digital and DTS focus on increasing the number of rear channels, SDDS allocates two discrete channels for left- and right-center speakers to reinforce all-important dialogue (Figure 2).


THE SURROUND SOUND OF MUSIC
Recently, Digital Theater Systems developed a half-bit-rate 754-kbps format to address storage-space issues and more easily enable dual-audio-format DVDs. Understandably, the company has been reluctant to publicize this shift. Digital Theater Systems has also embraced the concept of DTS-encoded audio CDs. Played directly on a conventional CD player, these discs sound like random noise. However, they deliver a rich six-channel audio presentation on conventional CD players with digital outputs (paired with separate DTS decoders) or in DTS-aware CD and DVD players.

Dolby Labs, unlike DTS, advocates backward-compatible Dolby Surround encoding as its preferred surround-sound format for Red Book audio CDs. Neither Dolby nor Digital Theater Systems has been enthusiastic about audio-only DVD-Video discs, although this stance may change with the unveiling of DTS’s 24-bit/96-kHz-sampling-rate format. However, both companies advocate their audio formats’ use in music videos, live concert broadcasts, and other high-fidelity audio-plus-video presentations.

Both companies have instead thrown varying degrees of support behind DVD-Audio. In many respects, DTS CDs are the forerunners of DVD-Audio discs and SACDs; therefore, it’s no surprise that Digital Theater Systems has begun rereleasing its music library in DVD-Audio form, with each disc containing two-channel Dolby Digital, six-channel DTS (both of which are in the DVD-Video partition), and six-channel MLP (Meridian Lossless Packing) formats. Meridian Audio developed MLP, and Dolby Labs handles its licensing. MLP is a flexible lossless-compression scheme that supports different sample rates and sizes in various channels of the same audio track. Its use is optional on DVD-Audio media, but all DVD-Audio decoders must support it. MLP decoders can also automatically downmix a more-than-two-channel source file to a two-channel output stream if a producer doesn’t want to use up DVD storage space with, for example, both 5.1-channel and two-channel partitions.

MLP’s intent is twofold: to keep the peak bit rate at less than 9.6 Mbps, the speed at which players stream data off the disc, and to enable playing times that meet or exceed an audio CD’s 74 minutes, to which consumers have already grown accustomed (Table 1 and Table 2). Lossless-compression efficiency is a function of the degree of randomness in the source material, so the MLP encoder enables producers to easily iterate the compression process until they achieve the desired balance of playback time and per-channel fidelity. The most baffling aspect of DVD-Audio is the optional presence of watermarking developed by Verance. It employs perceptual coding schemes similar to the ones at the heart of MP3 and other lossy-compression algorithms. Degrading otherwise-pristine audio data with watermarking bits seems counterintuitive.
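A quick calculation shows why lossless compression is a necessity rather than a nicety at the highest resolutions. The sketch below takes the 9.6-Mbps limit and 74-minute target from the text; the 4.7-Gbyte single-layer disc capacity is an assumption, the six-channel 96-kHz/24-bit program is just one example configuration, and the whole disc is treated as available for audio, which ignores any video partition and formatting overhead.

```python
PEAK_LIMIT_MBPS = 9.6   # disc streaming rate quoted in the text
DISC_BYTES = 4.7e9      # assumption: single-layer DVD capacity
CD_MINUTES = 74


def pcm_mbps(channels, sample_rate_hz, bits_per_sample):
    """Raw (uncompressed) PCM bit rate in Mbps."""
    return channels * sample_rate_hz * bits_per_sample / 1e6


def minutes_on_disc(mbps):
    """Playing time if the entire disc holds one stream at this rate."""
    return DISC_BYTES * 8 / (mbps * 1e6) / 60


raw = pcm_mbps(6, 96_000, 24)            # 13.824 Mbps, above the 9.6 limit
raw_minutes = minutes_on_disc(raw)       # about 45 minutes

# Average compression ratio needed to reach CD-length playing time.
needed_ratio = CD_MINUTES / raw_minutes  # about 1.6:1
```

Under these assumptions, uncompressed six-channel 96-kHz/24-bit audio both exceeds the streaming limit and falls well short of 74 minutes, so the encoder has to deliver roughly 1.6:1 on average while keeping peaks under 9.6 Mbps.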

Sony and Philips have for years enjoyed the hefty royalties they obtain from CD patent-licensing arrangements, and the two companies are unenthusiastic about DVD-Audio’s potential to dry up that stream. Therefore, the partners developed the SACD. Akin to DVD-Audio’s video partition, which ensures media backward-compatibility with DVD-Video players, SACDs can employ a multilayer hybrid scheme. Hybrid discs are more expensive to produce than audio CDs but enable SACD playback in CD players. And because Sony is both an audio-equipment manufacturer and a record label, it’s unsurprising that at this early stage, the SACD library dwarfs the number of available DVD-Audio titles. Universal Music Group’s plans to support SACD should widen that lead further.

Whereas CD and DVD players read serial bit streams off of discs and assemble them into multibit PCM samples, SACD’s DSD (direct-stream digital) technology directly decodes single-bit data on discs at a 2.8224-MHz sampling rate (Figure 3). This approach, called PDM (pulse-density modulation), employs delta-sigma DACs to reconstruct original waveforms. SACDs also incorporate watermarking for media security, but unlike the bit-altering scheme that DVD-Audio uses, Sony and Philips selected an identification scheme called PSP, which embeds an array of microscopic indentations on the disc. The resulting text or graphical image can be visible or invisible. SACD, like DVD-Audio, also takes advantage of lossless compression with a 50% average efficiency to boost playback time and minimize peak bit rates.
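The idea behind pulse-density modulation can be seen in a first-order delta-sigma loop, in which the density of +1 bits tracks the input level. This is a teaching sketch only: production DSD encoders use higher-order noise-shaping modulators, and the function names and the crude moving-average "DAC" are assumptions made for illustration.

```python
import math

DSD_RATE_HZ = 2_822_400  # 64 x 44.1 kHz, the SACD bit rate per channel


def delta_sigma_1bit(samples):
    """First-order modulator: input in [-1, 1] becomes a +1/-1 bit stream."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback          # accumulate the running error
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits


def moving_average(bits, window):
    """Crude lowpass reconstruction: average each run of `window` bits."""
    return [sum(bits[i:i + window]) / window
            for i in range(len(bits) - window + 1)]


# A constant input of 0.25 yields a bit stream whose average is about 0.25.
dc_bits = delta_sigma_1bit([0.25] * 6400)
dc_mean = sum(dc_bits) / len(dc_bits)

# One cycle of a 1-kHz tone, modulated and then coarsely reconstructed.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / DSD_RATE_HZ)
        for n in range(DSD_RATE_HZ // 1000)]
recovered = moving_average(delta_sigma_1bit(tone), 64)
```

The very high sampling rate is what makes this workable: the 1-bit quantization error is large, but it is spread over a wide band and the modulator pushes most of it above the audio range, where the reconstruction filter removes it.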

PCM’s oversampling versus SACD’s delta-sigma conversion has been the topic of numerous debates and AES papers over the past year. The following points summarize the arguments against single-bit processing:


  • Quantizers, such as ADCs, can be made linear with the addition of dither (random noise). Without dither, they’re subject to various signal distortions.


  • The optimal form of dither has a triangular probability density function (TPDF) at the LSB.


  • In a 1-bit system, such as SACD, the LSB is the only bit. Therefore, complete dither addition is impractical.
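The linearizing effect of dither in a multibit quantizer can be demonstrated numerically. The sketch below quantizes a constant input of 0.3 LSB with and without triangular dither spanning ±1 LSB; the step size of one unit, the seed, the sample count, and the function names are all illustrative assumptions.

```python
import math
import random


def quantize(x):
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)


def tpdf_dither(rng):
    """Difference of two uniform values: triangular PDF over (-1, +1) LSB."""
    return rng.random() - rng.random()


rng = random.Random(1)
level = 0.3          # a signal level that sits between two quantizer steps
count = 100_000

# Without dither, the output is stuck at 0 and the 0.3-LSB level is lost.
plain_mean = sum(quantize(level) for _ in range(count)) / count

# With TPDF dither, the output toggles between steps and its average
# converges on the true input level.
dithered_mean = sum(quantize(level + tpdf_dither(rng))
                    for _ in range(count)) / count
```

The dither used here has a peak-to-peak span of two quantizer steps. The critics' argument is that a 1-bit converter has no headroom for a signal of that size on top of the audio, which is why they consider complete dithering impractical in that case.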

SACD advocates predictably dispute these claims and the theory behind them and argue that any such distortions are far less egregious than those that PCM’s quantization creates during encoding and that oversampling creates during decoding. Ironically, however, SACD employs PCM-like techniques in the recording studio, most likely to keep the data in a DSP-friendly multibit format, and transforms it to DSD before creating an SACD master. A “pure” DSD process seems feasible only when an audio engineer directly transfers a recording (such as a live concert) to an SACD with no mixing, level adjustment, equalization, compression, or other modification.


SYNTHESIS, VIRTUALIZATION, EVALUATION
In addition to surround-sound-format decoding, today’s DSPs must handle a number of audio-post-processing operations: HDCD (high-definition compatible digital) decoding, THX processing, bass management, room-acoustics adjustment, sound-field expansion (the so-called “stereo-wide” function), additional channel synthesis via ambiance extraction or artificial ambiance insertion, speaker virtualization, and others. Each incremental function requires additional memory and additional processing MIPS. And, at some point, you must upgrade to a more costly and power-hungry DSP than you originally planned.

Confusingly, products from different vendors that, on paper, appear to implement similar functions, vary widely in memory- and processing-horsepower budgets (see Table 3). Keep in mind that the values in Table 3 are only estimates; by working with vendors you can often increase these values (for improved audio quality) or decrease them (to reduce the required memory and performance budget) as needed. For example, a processing- or memory-deficient system might employ approximation filters with altered cutoff frequencies and gentler roll-offs, use fewer and shorter delay lines for generating reverberation, or carry less precise interim values through a series of calculations. The specifications also depend greatly on the bit precision and presence of floating-point capability in the DSP, as well as any available opportunities for parallelism.

In evaluating algorithm alternatives, you not only must calculate chip costs, you must also compare licensing fees. How else can you decide whether you need a more system-resource-intensive algorithm, should turn to a comparatively svelte competitor, or should develop your own hardware and software to perform the function? Using the following analysis criteria, along with your ears, may help you make your decision:

  • How large is the perceived sound stage that the algorithm can create, beyond the physical speaker boundaries?


  • Does the soundstage expand and contract, or is it stable?


  • How big is the listening “sweet spot”? Do its size and location vary with time?


  • How precisely can you locate a specific sound source? Does the algorithm give good height-location accuracy? Can you perceive sounds a full 360° around the listener, and do they smoothly move from one location to another?


  • Do instruments, vocals, and other audio sources shift in an unintended fashion between various speakers over time?


  • Does crosstalk cause sound to leak from one channel into another? For example, does center-channel dialogue appear in the surround channels?


  • How clearly can you differentiate all-important center-channel information (such as movie dialogue) in an environment of coincident front-right and -left, surround, and subwoofer audio?


  • Is there a perceptible delay between when a sound source appears on the screen (such as when an actor’s lips move) and when you hear the sound, indicating excessive required audio processing?


  • Are the HRTFs (head-related transfer functions) customizable to individual listeners’ head and ear specifications, and are listening-environment parameters adjustable to calibrate to actual room acoustics?

Keep in mind that, without a direct A-versus-B comparison conducted by a skilled salesperson in an acoustically ideal listening setting, most listeners will judge nearly any enhanced system to sound “good enough” compared with an unenhanced alternative.


ACKNOWLEDGMENTS

Kudos to the vendors courageous enough to let me publish their algorithms’ processing and memory-resource approximations in Table 3.





You can contact Technical Editor Brian Dipert at
(1) 916-454-5242, Fax (1) 916-454-5101
E-mail bdipert@pacbell.net

 