TABLE OF CONTENTS


Summary

1. Introduction

2. Project Management

3. Implementation

4. Conclusion & Recommendations

Appendix A

Appendix B


SUMMARY

Music is currently best stored in digital form. The main problem with this format is the amount of information required to retain 'CD-quality' sound: a 5-minute song requires approximately 50 megabytes of storage space. One way to overcome this problem is to compress the audio signal. The international MPEG standard compresses audio by removing the irrelevant and the redundant parts of the signal; the parts of the sound that humans cannot hear can be thrown away. Using this method, the 50 MB can be reduced to approximately 5 MB. However, decoding in real time requires about 12 million instructions per second. On today's computers this takes up nearly all of the processor's capacity, leaving the computer practically unusable while decoding. Our project is to design a module that decodes an MPEG-compressed bitstream in real time while keeping the computer's CPU use to a minimum.

This document is a progress report for the MPEG decoder module design project by Kire Filipovski and Mike Schwankl. The decoder module is a stand-alone device capable of decompressing an MPEG-encoded bitstream. The decoder will interface externally with a computer via its serial port and provide an analog output rivaling a CD-quality signal.

Since our proposal, the work has consisted of researching the commercially available single-chip MPEG decoders. After several months of research, the Texas Instruments TMS320AV120 DSP chip was chosen. The project is currently in the phase of designing the draft version of the decoder module.


1. INTRODUCTION

The most used form of digital audio is the compact disc. The analog signal is sampled at 44.1 kHz, and each sample is represented with 16-bit precision. A 5-minute song uses approximately 50 MB (megabytes) of data. This amount of data is expensive to store and impractical to send across current networks. The development of powerful DSP chips has made it possible to implement encoding and decoding with complex compression algorithms that yield substantial compression rates. The algorithms currently used are based on psychoacoustic models, which remove information describing sounds that the human ear cannot perceive. The most effective algorithms can reduce the amount of data by 92% with little or no audible loss of sound quality. The purpose of our design is to build a decoder module using a DSP chip that will decode a compressed bitstream and convert it to an analog CD-quality signal. Since this is a relatively new research area, there are a number of different algorithms. After researching the availability and trends of a few compression standards, we chose to implement an MPEG-1 layer 2 decoder, which gives CD-quality music at a compression ratio of 6:1. A short introduction to this standard is presented in Appendix A.
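The storage figures above follow from simple arithmetic. The short Python sketch below reproduces them. It assumes a 128 kbit/s compressed rate, which is the module's target input rate from Section 3.1, and counts megabytes as 10^6 bytes.

```python
# Storage for a 5-minute song, using the figures quoted in the text.
SAMPLE_RATE_HZ = 44_100      # CD sampling rate
BITS_PER_SAMPLE = 16         # CD sample precision
CHANNELS = 2                 # stereo
DURATION_S = 5 * 60          # a 5-minute song
MPEG_BITRATE_BPS = 128_000   # compressed rate targeted by the module


def pcm_bytes(seconds: float) -> float:
    """Size of uncompressed CD-quality PCM audio, in bytes."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS * seconds / 8


def mpeg_bytes(seconds: float, bitrate_bps: int = MPEG_BITRATE_BPS) -> float:
    """Size of a constant-bit-rate compressed stream, in bytes."""
    return bitrate_bps * seconds / 8


raw = pcm_bytes(DURATION_S)
packed = mpeg_bytes(DURATION_S)
print(f"uncompressed:  {raw / 1e6:.2f} MB")               # 52.92 MB
print(f"at 128 kbit/s: {packed / 1e6:.2f} MB")            # 4.80 MB
print(f"reduction:     {100 * (1 - packed / raw):.1f}%")  # 90.9%
```

At 128 kbit/s the reduction works out to about 91%, in line with the figures quoted in the Summary.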

There have been very few commercial products that do what our design will accomplish. The currently available products cost upward of $1500. The parts for our product are estimated to cost less than $100.

Our decoder module was initially to be interfaced with a computer via an ISA bus. After months of research on writing device drivers and on MPEG decoder chips, we decided to interface the module through the PC's serial port instead. We chose this because the MPEG audio bitstream sent from the computer to the module will only require between 64 kbit/s and 128 kbit/s, while the average bandwidth of a serial port is 150 kbit/s. In addition, almost every computer has a serial port, while the ISA bus is being replaced by the newer PCI bus.
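Whether the serial link can carry the stream depends on the line rate actually achieved and on the framing overhead. The sketch below assumes asynchronous 8N1 framing (one start bit, eight data bits, one stop bit), so only 8 of every 10 bits on the wire carry data. It checks the 150 kbit/s figure quoted above. For comparison, it also checks 115,200 bit/s, which is the highest standard rate of a PC UART driven by the usual 1.8432 MHz clock. The function names are illustrative.

```python
# Can an asynchronous serial link carry the compressed bitstream?
# Assumption: 8N1 framing (1 start bit, 8 data bits, 1 stop bit), so only
# 8 of every 10 bits on the wire carry payload.
BITS_ON_WIRE_PER_BYTE = 10
PAYLOAD_BITS_PER_BYTE = 8


def payload_rate_bps(line_rate_bps: float) -> float:
    """Usable data rate of a serial link after start/stop-bit overhead."""
    return line_rate_bps * PAYLOAD_BITS_PER_BYTE / BITS_ON_WIRE_PER_BYTE


def fits(stream_bps: float, line_rate_bps: float) -> bool:
    """True if the link's payload rate is at least the stream's bit rate."""
    return stream_bps <= payload_rate_bps(line_rate_bps)


for line_rate in (115_200, 150_000):    # standard PC UART maximum; figure quoted in the text
    for stream in (64_000, 128_000):    # MPEG bit-rate range quoted in the text
        print(f"{line_rate} bit/s line, {stream} bit/s stream: "
              f"payload {payload_rate_bps(line_rate):.0f} bit/s, fits={fits(stream, line_rate)}")
```

Under these assumptions, a 64 kbit/s stream fits at either line rate. A 128 kbit/s stream needs a line rate of at least 160 kbit/s.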

We chose to implement an MPEG-1 layer 2 decoder because of its commercial availability and technical support. We have been contacting various companies for samples but have yet to receive any. The companies that offer single-chip implementations of the MPEG decoding algorithm are listed in Table 1, along with their current status. We implement only a decoder because current real-time encoders are complex and expensive.

Company                  Part No.      Status
Crystal Semiconductor    CS4920        3-5 samples ordered
Texas Instruments        TMS320AV120   Reviewing data sheet specifications
Philips Semiconductors   SAA2500       Reviewing data sheet specifications
Table 1. MPEG chip availability

Each of these chips is introduced in this progress report. Their features and operation are then compared to determine which is the most suitable solution for our design.


2. PROJECT MANAGEMENT

The design team consists of two members: Kire Filipovski (E.E.) and Mike Schwankl (C.P.E.). Most of the design will be a joint effort, but some tasks will be assigned separately, based on previous experience as stated in our resumes in Appendix B. The majority of the computer interfacing is done by Mike Schwankl, and the breadboarding and testing is done by Kire Filipovski. The complete project schedule is summarized in the Gantt chart shown in Figure 1.


3. IMPLEMENTATION

3.1 SPECIFICATIONS

This decoder module shall decode a compressed MPEG-1 layer 2 bitstream as defined in the "ISO/IEC 11172-3 MPEG Standard." Our design must meet the following specifications in accordance with the international MPEG standard:

The module shall be a separate device interfaced with a PC via the serial port. The input to the module will be a compressed 128 kbit/s bitstream, and the output will be an analog signal with a line level of 2 volts peak-to-peak nominal into a 1 kΩ impedance.
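The output specification can be restated in the units usually found on audio data sheets. This sketch assumes a sine-wave signal, so that V_rms = V_peak / √2. It uses the standard references of 1 V rms for dBV and 0.775 V rms (1 mW into 600 Ω) for dBu.

```python
import math

V_PP = 2.0                   # nominal output swing from the specification, volts peak-to-peak
R_LOAD = 1_000.0             # load impedance from the specification, ohms
DBU_REF_V = math.sqrt(0.6)   # 0.7746 V rms: 1 mW into 600 ohms

v_peak = V_PP / 2                                # 1.0 V
v_rms = v_peak / math.sqrt(2)                    # sine wave assumed: about 0.707 V
power_mw = v_rms ** 2 / R_LOAD * 1e3             # power delivered to the load: 0.5 mW
level_dbv = 20 * math.log10(v_rms / 1.0)         # about -3.0 dBV
level_dbu = 20 * math.log10(v_rms / DBU_REF_V)   # about -0.8 dBu

print(f"{v_rms:.3f} V rms, {power_mw:.2f} mW, {level_dbv:.1f} dBV, {level_dbu:.1f} dBu")
```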

3.2 THEORY OF OPERATION

The basic operation of our design is shown in the block diagram of Figure 2.

Figure 2. Functional Block Diagram

The main part of this design will be a DSP capable of decoding an MPEG-compressed bitstream in real time. The interfacing logic and the D/A converter depend heavily on the choice of decoder; some chips even have a built-in D/A converter. The interfacing logic bridges the serial port and the decoder chip: it supplies the serial bitstream to the decoder and controls the data flow from the computer. The MPEG chip then decodes the compressed bitstream, producing either a PCM or an analog signal depending on the chip used. In the case of a PCM output, the signal is then delivered to a D/A converter. The analog output can then be supplied to an external audio amplifier.

Next, the decoder chips we researched for this design are presented and compared. The three most feasible processors are discussed separately, and their basic operation is explained using functional block diagrams.

3.2.1 Philips SAA2500

Features

The SAA2500 supports all audio modes (joint stereo, stereo, single channel and dual channel), bit rates and sample frequencies of MPEG-1 layers I and II. The chip has the option of two input bitstreams: master and slave. It offers automatic sample frequency and bit-rate switching, and automatic synchronization of the input and output interface clocks in master input mode. The output is a serial PCM bitstream with a selectable precision of 16, 18, 20 or 22 bits. It has low power consumption and comes in a 44-pin plastic quad flat package.

Operation


Figure 3. Functional Block Diagram of Philips SAA2500

As mentioned above, this chip has two serial inputs, master and slave, which feed the input processor. The clock generator block uses output circuitry to generate the frequencies needed for operation. The actual decompression is performed in the dequantization and scaling processor, using the signals from the decoding control block. The output is a PCM bitstream on pin SD, which is suitable for direct input to most commercially available one-bit D/A converters.

Comment

Considering the purpose of our application, a multimedia PC, the option of two different inputs is of no use, since we will only be using one of them. One possible problem with this chip is interfacing its decoding control block: the chip requires an outside host (a microcontroller) for decoding control, which would make this design more complex. However, the chip has a very extensive data sheet and good technical support.

3.2.2 Texas Instruments TMS320AV120

Features

The TMS320AV120 is a simplified version of its predecessor, the TMS320AV110. It is intentionally designed as a simple plug-and-play decoder that does not require a host microprocessor. It supports all the MPEG-1 layer II sampling rates, bit rates and channel modes, with an output resolution of 16- or 18-bit PCM data. It is manufactured in low-power submicron CMOS technology in a 44-pin package.

Operation


Figure 4. Functional Block Diagram of TMS320AV120

The compressed input bitstream is brought to the SIN (serial input) pin and read by the host interface block, which is controlled by the request signal SREQ and the input clock ICLK. The bitstream is then buffered, allowing the decoder to tolerate small variations in the input data rate. The decoding takes place in the audio decoder and the arithmetic unit. Several control pins govern the decoding process. For example, the PCMSEL pins determine the PCMOUT resolution and the OMODE pins define the channel mode. The chip also contains an output buffer before the signal proceeds to the PCMOUT pin. The signal can then be fed into a one-bit D/A converter.
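The request/clock handshake described above can be illustrated with a toy software model. This is not the TMS320AV120's documented timing. The class, its names, and the rule that SREQ stays asserted while the buffer has room are all assumptions made for illustration. In the model, one input-clock (ICLK) tick shifts one bit in from the host, but only while SREQ is high. The decoder removes a frame's worth of bits at fixed intervals.

```python
from collections import deque


class InputBuffer:
    """Toy model of a decoder input FIFO with a request line (hypothetical)."""

    def __init__(self, capacity_bits: int):
        self.capacity = capacity_bits
        self.fifo = deque()
        self.overflows = 0
        self.underflows = 0

    @property
    def sreq(self) -> bool:
        # Assumption: the request line is high whenever there is room for a bit.
        return len(self.fifo) < self.capacity

    def clock_in(self, bit: int) -> None:
        """One ICLK edge: latch a bit from the serial input."""
        if len(self.fifo) >= self.capacity:
            self.overflows += 1          # host ignored SREQ; the bit is lost
        else:
            self.fifo.append(bit)

    def consume(self, n_bits: int) -> None:
        """The decoder removes n_bits for the frame it is working on."""
        if len(self.fifo) < n_bits:
            self.underflows += 1         # decoder starved for input
        for _ in range(min(n_bits, len(self.fifo))):
            self.fifo.popleft()


def simulate(ticks: int, capacity: int, consume_every: int, chunk: int) -> InputBuffer:
    """Host sends one bit per tick while SREQ is high; decoder reads chunk bits."""
    buf = InputBuffer(capacity)
    bit = 0
    for t in range(1, ticks + 1):
        if buf.sreq:                     # host honours the request line
            buf.clock_in(bit)
            bit ^= 1                     # dummy alternating data
        if t % consume_every == 0:
            buf.consume(chunk)
    return buf


ok = simulate(ticks=10_000, capacity=512, consume_every=100, chunk=64)
print(ok.overflows, ok.underflows)       # 0 0
starved = simulate(ticks=1_000, capacity=512, consume_every=10, chunk=20)
print(starved.underflows > 0)            # True: decoder outruns the input clock
```

When the host obeys SREQ, the buffer never overflows. The decoder is never starved as long as the input clock delivers bits faster, on average, than the decoder consumes them. A consumer that outruns the input clock underflows immediately.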

Comment

This chip is a simple stand-alone MPEG audio decoder. Its main advantage is simplicity, which makes it fairly easy to use in design projects. Another useful option is the chip's bypass mode: if the input bitstream is not compressed, driving BYPASS high passes it straight to the output. This would allow the module to be used as a regular DAC sound card as well. The TMS320AV120 offers some additional features that are irrelevant for our design. TI has excellent technical support and many available DACs that interface with the TMS320AV120, such as the TMS57014 Dual Audio Digital-to-Analog Converter.

3.2.3 Crystal Semiconductor CS4920

Features

Unlike the previous two chips, this is a general-purpose 24-bit DSP optimized for audio processing. It can support a wide variety of standard and proprietary decompression algorithms. Its main feature is the on-chip CD-quality D/A converter. MPEG decoding is provided by loading the algorithm from a ROM. The chip has a serial input and a serial control port. Its output can be either analog or a standard S/PDIF-compatible digital output. It is manufactured in a standard 44-pin package.

Operation

Input serial data is brought to the Serial Audio Port. Decompression takes place in the DSP block, using the Serial Control Port signals. The signal then splits in two directions. One path goes to the Stereo DAC, where the digital signal is converted to an analog stereo signal. The other goes to the AES/EBU - S/PDIF Transmitter, which produces the digital output.


Figure 5. Functional Block Diagram of CS4920

Comment

The MPEG decompression algorithm is supplied on a ROM chip and has to be loaded into the CS4920. This makes it easier to implement other compression algorithms, such as Dolby AC-3 or MPEG layer 3, when they become available, but it increases the complexity and price of our design. The on-chip DAC would simplify our design, but the complexity of implementing the microcontroller and ROM is not known. The evaluation board (CDB4920), which contains the CS4920, a microcontroller and the other ICs necessary for operation, is available for $495 from Insite Electronics.

3.2.4 Other decoder chips

During our research we came across several other MPEG decoder chips. In our proposal we had stated that we would implement an MPEG layer 3 decoder. There was only one commercially available MPEG layer 3 chip, the MASC3500 from ITT Intermetall Semiconductor. We attempted to obtain information on the chip and its availability, but the company is located in Germany and gave very little feedback. There are various other layer 2 chips from LSI Logic, Zoran Semiconductor, GC Technologies and Sanyo Electric Co. These had very little technical support or were extremely costly, so our choice was narrowed to the chips presented in depth above.

3.3 RESULTS

This section compares the processors, showing their advantages and disadvantages. The comparisons are summarized in Table 2.

Part No.      Advantages                         Disadvantages
CS4920        built-in D/A converter             outside host required
              exchangeable decoding algorithms   price
TMS320AV120   simplicity                         none
              stand-alone device
              uncompressed input (bypass)
              low cost
SAA2500       two inputs                         outside host required
Table 2. MPEG chip comparison

The first major division between these devices is based on how the MPEG decoding algorithm is implemented. Most of the chips have the decoder mask-programmed, meaning that the decoding algorithm is already built into the DSP when it is purchased. The other case is a general-purpose DSP on which the algorithm is implemented by the user, i.e. a programmable DSP chip (such as the CS4920 from Crystal Semiconductor). Normally it is more convenient to use an "MPEG-ready" chip, but the CS4920 is also taken into consideration because of its on-chip DAC.

All of the chips come in 44-pin packages and are very similar in design. However, the TMS320AV120 from TI is the only stand-alone chip. The others require an outside host such as a microcontroller, which increases the complexity of the design.

Another advantage of the TI chip is the amount of technical support and online information that Texas Instruments provides. Philips and Crystal Semiconductor also have sufficient technical support, but the educational services from TI are especially helpful.

As far as cost is concerned, a rough estimate puts the TI and Philips chips in approximately the same price range ($20-40). The Crystal Semiconductor chip is slightly more expensive because of its on-chip DAC.


4. CONCLUSION & RECOMMENDATIONS

After several months of research, we have determined that the Texas Instruments TMS320AV120 is the best MPEG decoder for this design because of its simplicity, low cost, and excellent technical support. The remaining design work includes properly interfacing the chip with the computer and with the D/A converter.

The end product will be a stand-alone module easily interfaced with any PC through its serial port. The main application of this device is music archiving on computer storage media. The module will reduce memory requirements by approximately 90% while maintaining CD-quality playback with minimal use of the computer's CPU. Because of its low bandwidth requirements, the device will also make CD-quality playback of remote music archives possible over computer networks. It is a contribution to the multimedia revolution in today's communication networks.


APPENDIX A

Introduction to the MPEG standard

What is MPEG?

MPEG stands for Moving Picture Experts Group. MPEG is a group of people who meet under ISO (the International Organization for Standardization) to generate standards for digital video (sequences of images in time) and audio compression. In particular, they define a compressed bitstream, which implicitly defines a decompressor. The compression algorithms, however, are up to the individual manufacturers, and that is where proprietary advantage is obtained within the scope of a publicly available international standard. MPEG meets roughly four times a year for roughly a week each time. A great deal of work is done by the members in between meetings, so not everything happens at the meetings; the work is organized and planned there.

How does MPEG-1 AUDIO work?

Well, first you need to know how sound is stored in a computer. Sound is pressure differences in air. When picked up by a microphone and fed through an amplifier, these become varying voltage levels. The voltage is sampled by the computer a number of times per second. For CD-audio quality you need to sample 44,100 times per second, and each sample has a resolution of 16 bits. In stereo this gives you about 1.4 Mbit per second (44,100 × 16 × 2 = 1,411,200 bit/s), and you can probably see the need for compression.

To compress audio MPEG tries to remove the irrelevant parts of the signal and the redundant parts of the signal. Parts of the sound that we do not hear can be thrown away. To do this MPEG Audio uses psycho-acoustic principles.

How good is MPEG-1 AUDIO compression?

MPEG can compress to a bitstream of 32 kbit/s to 384 kbit/s (Layer II). A raw PCM audio bitstream is about 705 kbit/s per channel (44.1 kHz × 16 bits), so this gives a maximum compression ratio of about 22:1. A typical compression ratio is more like 6:1 or 7:1. If you think that is not much, remember that, unlike video, we are talking about no perceivable quality loss here. 96 kbit/s is considered transparent for most practical purposes. This means that you will not notice any difference between the original and the compressed signal for rock'n'roll or popular music. For more demanding material like piano concerts, you will need to go up to 128 kbit/s.

How does MPEG-1 AUDIO achieve this compression ratio?

Well, with audio you basically have two alternatives: either you sample less often, or you sample with less resolution (fewer than 16 bits per sample). If you want quality, you can't do much with the sample frequency. Humans can hear sounds with frequencies from about 20 Hz to 20 kHz.

According to the Nyquist theorem you must sample at least two times the highest frequency you want to reproduce. Allowing for imperfect filters, a 44.1 kHz sampling rate is a fair minimum. So you either set out to prove the Nyquist theorem is wrong or go to work on reducing the resolution. The MPEG committee chose the latter.
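The Nyquist limit can be seen by asking at what frequency a tone appears after sampling. This sketch folds a tone's frequency into the band from 0 to fs/2. At 44.1 kHz, a 25 kHz tone produces exactly the same samples as a 19.1 kHz tone. This is why the analog signal must be low-pass filtered below fs/2 before sampling. The function name is ours, chosen for illustration.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz."""
    f = f_hz % fs_hz                            # samples cannot tell f from f + k*fs
    return fs_hz - f if f > fs_hz / 2 else f   # fold into the range 0 .. fs/2


FS = 44_100
print(alias_frequency(15_000, FS))   # 15000: below fs/2, reproduced correctly
print(alias_frequency(25_000, FS))   # 19100: folds back into the audible band

# The samples of the two tones are indeed identical:
same = all(
    math.isclose(math.cos(2 * math.pi * 25_000 * n / FS),
                 math.cos(2 * math.pi * 19_100 * n / FS), abs_tol=1e-9)
    for n in range(1_000)
)
print(same)   # True
```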

Now, the real reason for using 16 bits is to get a good signal-to-noise (s/n) ratio. The noise we're talking about here is quantization noise from the digitizing process. For each bit you add, you get about 6 dB better s/n. (6 dB corresponds to a doubling of the signal amplitude.) CD audio achieves about 90 dB s/n, which matches the dynamic range of the ear fairly well. That is, you will not hear any noise coming from the system itself (well, there are still some people arguing about that, but let's not worry about them for the moment). So what happens when you sample at 8-bit resolution? You get a very noticeable noise floor in your recording. You can easily hear it in silent moments in the music, or between words or sentences if your recording is a human voice. Wait a minute. You don't notice any noise in loud passages, right? This is the masking effect, and it is the key to MPEG audio coding. The masking effect belongs to a science called psychoacoustics, which deals with the way the human brain perceives sound. MPEG uses psychoacoustic principles when it does its thing.
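The "6 dB per bit" rule can be checked numerically. Each extra bit halves the quantization step, and 20·log10(2) ≈ 6.02 dB. The sketch below quantizes a sine wave to N bits and measures the resulting signal-to-noise ratio. The test tone, its amplitude and the helper names are arbitrary choices for illustration. The measured values should land close to the textbook rule of 6.02N + 1.76 dB for a full-scale sine: roughly 50 dB at 8 bits and 98 dB at 16 bits. These are theoretical ceilings, above the practical CD figure quoted in the text.

```python
import math

AMPLITUDE = 0.98   # slightly below full scale so no sample needs clipping


def quantize(x: float, bits: int) -> float:
    """Round x (in -1..1) to the nearest level of a uniform quantizer."""
    step = 2.0 / 2 ** bits                 # full-scale range of 2 split into 2**bits levels
    q = round(x / step) * step
    return max(-1.0, min(1.0 - step, q))  # keep within the representable levels


def measured_snr_db(bits: int, n: int = 50_000) -> float:
    """SNR of a sine wave after quantization to the given number of bits."""
    signal_power = noise_power = 0.0
    for i in range(n):
        x = AMPLITUDE * math.sin(2 * math.pi * 0.01234567 * i)  # non-harmonic tone
        e = quantize(x, bits) - x                              # quantization error
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 16):
    rule = 6.02 * bits + 1.76   # textbook figure for a full-scale sine
    print(bits, round(measured_snr_db(bits), 1), round(rule, 1))
```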

Explain the masking effect

Say you have a strong tone with a frequency of 1000 Hz. You also have a nearby tone of, say, 1100 Hz that is 18 dB lower. You are not going to hear this second tone; it is completely masked by the 1000 Hz tone. As a matter of fact, any relatively weak sound near a strong sound is masked. If you introduce another tone at 2000 Hz, also 18 dB below the 1000 Hz tone, you will hear it. You will have to turn the 2000 Hz tone down to something like 45 dB below the 1000 Hz tone before it is masked. So the further you get from a sound, the less masking effect it has. The masking effect means that you can raise the noise floor around a strong sound, because the noise will be masked anyway. Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression.

Let's now try to explain how the MPEG audio coder goes about its thing. It divides the frequency spectrum (from 0 Hz up to half the sampling frequency) into 32 sub-bands. Each sub-band holds a little slice of the audio spectrum. Say a tone with a level of 60 dB is present in the upper region of sub-band 8. The coder calculates the masking effect of this sound and finds a masking threshold 35 dB below the tone for the entire 8th sub-band (that is, for all sounds with a frequency in that band). The acceptable s/n ratio is thus 60 - 35 = 25 dB, which corresponds to 4-bit resolution. The tone also has masking effects on bands 9-13 and on bands 5-7, decreasing with the distance from band 8. In a real-life situation you have sounds in most bands, and the masking effects are additive. The coder also considers the sensitivity of the ear at various frequencies. The ear is a lot less sensitive at high and low frequencies; peak sensitivity is around 2-4 kHz, the same region that the human voice occupies.
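The step from an acceptable s/n ratio to a bit count in the example above can be written out. This sketch uses the 6.02N + 1.76 dB rule for uniform quantization. It only reproduces the arithmetic of the example. An actual Layer II encoder chooses among the quantizer classes tabulated in the standard rather than applying this formula directly. The function name is illustrative.

```python
import math


def bits_needed(snr_db: float) -> int:
    """Smallest resolution N whose quantization SNR (6.02*N + 1.76 dB) reaches snr_db."""
    return max(0, math.ceil((snr_db - 1.76) / 6.02))


tone_level_db = 60     # tone in sub-band 8, from the example
mask_offset_db = 35    # masking threshold lies this far below the tone
allowed_snr_db = tone_level_db - mask_offset_db     # 25 dB
print(allowed_snr_db, bits_needed(allowed_snr_db))  # 25 4
print(bits_needed(96))   # 16: roughly what an unmasked, CD-like signal needs
```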

The sub-bands should match the ear; that is, each sub-band should consist of frequencies that have the same psychoacoustic properties. In MPEG layer II, however, the sub-bands all have the same width, one sixty-fourth of the sampling frequency: 750 Hz at 48 kHz and about 689 Hz at 44.1 kHz. It would have been better if the sub-bands were narrower in the low-frequency range and wider in the high-frequency range, but that requires complex filters. To keep the filters simple, the designers chose to run an FFT in parallel with the filtering and to use its spectral components as additional information for the coder. This way you get higher resolution at low frequencies, where the ear is more sensitive.
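The sub-band arithmetic is easy to check. MPEG-1's polyphase filter bank splits the range from 0 Hz to half the sampling frequency into 32 bands of equal width, so each band is fs/64 wide. The helper functions below are illustrative and not part of any library.

```python
NUM_SUBBANDS = 32   # MPEG-1 audio polyphase filter bank


def subband_width_hz(fs_hz: float) -> float:
    """Width of each of the 32 equal sub-bands covering 0 .. fs/2."""
    return fs_hz / 2 / NUM_SUBBANDS


def subband_index(f_hz: float, fs_hz: float) -> int:
    """0-based index of the sub-band that contains frequency f_hz."""
    if not 0 <= f_hz < fs_hz / 2:
        raise ValueError("frequency must lie in [0, fs/2)")
    return int(f_hz // subband_width_hz(fs_hz))


for fs in (32_000, 44_100, 48_000):
    print(fs, subband_width_hz(fs))   # 500.0, 689.0625 and 750.0 Hz
print(subband_index(1_000, 44_100))   # 1, i.e. the second sub-band
```

Because the bands are equally wide, the lowest band alone spans several hundred hertz. The ear resolves that region far more finely, which is the mismatch the parallel FFT compensates for.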

But there is more to it. We have explained concurrent masking, but the masking effect also occurs before and after a strong sound (pre- and post-masking) if there is a significant (30-40 dB) shift in level. The reason is believed to be that the brain needs some processing time. Pre-masking lasts only about 2 to 5 ms, while post-masking can last up to 100 ms. Other bit-reduction techniques involve considering the tonal and non-tonal components of the sound. For a stereo signal there is also a lot of redundancy between channels. In Layer III, the last step before formatting is Huffman coding.

The coder calculates masking effects by an iterative process until it runs out of time; it is up to the implementor to spend bits in the least obtrusive fashion. For layer II the coder works on 1152 samples at a time, which is 24 ms of sound at 48 kHz (about 26 ms at 44.1 kHz). For some material this time window can be a problem. This is normally the case with transients, where there are large differences in sound level within the window. The masking is calculated on the strongest sound, and the weak parts drown in quantization noise. The ear perceives this as a noise echo. Layer III addresses this problem specifically.
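The frame arithmetic behind these figures is short. A Layer II frame always holds 1152 samples per channel, so its duration depends only on the sampling frequency. Its size in bytes is 144 × bit rate / sampling frequency (144 = 1152 / 8), rounded down, plus an optional padding byte. The sketch below tabulates both for a 128 kbit/s stream; the function names are illustrative.

```python
SAMPLES_PER_FRAME = 1152   # Layer II: samples per channel in every frame


def frame_duration_ms(fs_hz: float) -> float:
    """Length of one Layer II frame in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / fs_hz


def frame_bytes(bitrate_bps: int, fs_hz: int, padding: bool = False) -> int:
    """Layer II frame size: floor(144 * bitrate / fs) bytes, plus one if padded."""
    return 144 * bitrate_bps // fs_hz + (1 if padding else 0)


for fs in (48_000, 44_100, 32_000):
    print(fs, round(frame_duration_ms(fs), 1), frame_bytes(128_000, fs))
# 48000 24.0 384
# 44100 26.1 417
# 32000 36.0 576
```

At 44.1 kHz the exact value, 417.96 bytes, is not an integer. The encoder therefore sets the padding bit on some frames (418 bytes) to keep the average bit rate at 128 kbit/s.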

Who is using MPEG-1 AUDIO?

Philips uses MPEG for its new digital Video CDs. The company says it will start shipping movies and music videos on CDs for its CD-I player by the end of this year. MPEG has also been accepted by Eureka-147. This means that when digital radio broadcasts start in Europe a couple of years from now, you will receive MPEG-coded audio.

The IUMA (Internet Underground Music Archive) holds many audio clips in MPEG-compressed format, but you might need to configure your WWW browser to play them. IUMA was founded to provide a worldwide audience for otherwise obscure and unavailable bands and artists.

Which sampling frequencies are used?

You can have 48 kHz (used in professional sound equipment), 44.1 kHz (used in consumer equipment like CD audio) or 32 kHz (used in some communications equipment).

How many audio channels?

MPEG-1 allows for two audio channels. These can be single (mono), dual (two mono channels), stereo or joint stereo (intensity stereo, or m/s stereo in Layer III). In normal (l/r) stereo, one channel carries the left audio signal and the other carries the right audio signal. In m/s stereo, one channel carries the sum signal (l+r) and the other carries the difference signal (l-r). In intensity stereo, the high-frequency part of the signal (above 2 kHz) is combined: the stereo image is preserved, but only the temporal envelope is transmitted. In addition, MPEG allows for pre-emphasis, copyright marks and original/copy marks. MPEG-2 allows for several channels in the same stream.
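The sum/difference idea can be shown on integer PCM samples. This sketch transmits M = L + R and S = L - R and then recovers both channels exactly. It illustrates only the redundancy argument, since a real encoder applies the transform to scaled sub-band or spectral values before quantization. The function names are illustrative.

```python
from typing import List, Tuple


def ms_encode(left: List[int], right: List[int]) -> Tuple[List[int], List[int]]:
    """Return the sum (M = L + R) and difference (S = L - R) channels."""
    mid = [lt + rt for lt, rt in zip(left, right)]
    side = [lt - rt for lt, rt in zip(left, right)]
    return mid, side


def ms_decode(mid: List[int], side: List[int]) -> Tuple[List[int], List[int]]:
    """Invert ms_encode: L = (M + S) / 2 and R = (M - S) / 2."""
    # M + S = 2L and M - S = 2R are always even, so integer division is exact.
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right


left = [1000, -2000, 3000, 1500]
right = [990, -2010, 2995, 1490]     # nearly identical to the left channel
mid, side = ms_encode(left, right)
print(side)                          # [10, 10, 5, 10]: small values need few bits
assert ms_decode(mid, side) == (left, right)
```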

Where can I get more details about MPEG audio?

The specification does not describe the encoder. It describes the bitstream in great detail and suggests psychoacoustic models.

A good summary of MPEG audio compression can be found in the Summer 1995 issue of IEEE Multimedia magazine (Davis Pan, "A Tutorial on MPEG/Audio Compression," pp. 60-74).