Designing and Validating MPEG-4/JPEG Codec IP Using an SoC Platform

As silicon process geometries continue to shrink, chip designers can now integrate all system functions on a single chip. Many chip manufacturers and designers believe that the high integration of an SoC is the answer to customers' demands for versatility, low power consumption, low cost and miniaturization. Unfortunately, design productivity cannot keep pace with Moore's Law.

As shown in Figure 1, for a typical design, the increased complexity caused by the growing use of IP, embedded processors, memory and logic gates leads to an increase in design and verification manpower. As a result, design reuse becomes an effective way to improve design productivity.

Figure 1



Although IP-based reuse methodologies have been in widespread use for nearly a decade, most designs still stack large numbers of modules and require users to integrate them. When using pre-designed modules, engineers must understand how the modules work, how they integrate with other modules, and how they operate correctly within the design. Third-party IP increases the difficulty of integration; even commercially successful, silicon-proven IP often causes problems in use.

The use of pre-designed modules does not by itself guarantee single-chip success. In the past few years, several companies have attempted to change the design approach by fully integrating internal IP or virtual components and software into a common architecture that can realize common functionality, forming a so-called platform-based design.

The platform design approach is a very effective strategy for handling product complexity and time-to-market. Derivative designs can be completed quickly by adding IP. In addition, the integrated architecture reduces the uncertainty of verification and, therefore, the effort and risk required for a design.

Beyond the technical challenges of SoC, there are issues that are rarely mentioned but important for platform-based SoC design. One challenge is coordinating the SoC design team with a number of subgroups such as IP providers, software tool providers, EDA tool providers, verification teams, system design teams and foundries. Unfortunately, most platform SoC providers are familiar with only one or two of these areas. Lack of communication severely limits the smooth execution of the SoC design flow. Ideally, the SoC design team and its subgroups should work closely together so that problems in the design can be resolved quickly.

Faraday Technology is an established design service company that brings many of these subgroups together within one company. It has accumulated a large number of virtual components, including ARM V4 instruction-compatible 32-bit embedded processors and high-speed I/O. Figure 2 shows the A320 platform, which accelerates design time by providing a pre-integrated architecture that allows versatile implementations. How did we use this SoC platform to design our own MPEG4 codec IP?

The A320 platform accelerates design time by providing a pre-integrated architecture that allows versatile implementations


The MPEG-4/JPEG codec IP designed at Faraday Technology must first conform to AHB timing in order to accelerate multimedia video. It is a full hardware accelerator, including motion estimation, discrete cosine transform / inverse discrete cosine transform (DCT/IDCT), quantization / inverse quantization, and motion compensation.

The FA526 from Faraday Technology is a 32-bit embedded CPU, developed legally in-house by Faraday. The CPU uses a Harvard architecture and has a six-stage pipeline compatible with the ARM V4 architecture. The FA526 uses 16KB/16KB I-cache/D-cache and 8KB/8KB instruction/data scratchpads. A JTAG ICE interface makes programming and debugging very convenient. High performance and low power consumption allow this CPU to be used in a wide range of applications.

The FA526 controls the codec through an AHB slave interface. Once the codec's control registers are initialized, motion estimation for entire 16x16 or 8x8 block calculations can be done by the codec itself. Discrete cosine transform/quantization, inverse discrete cosine transform/inverse quantization, AC/DC prediction, zigzag scan and variable-length coding/decoding (VLC/VLD) calculation tasks can also be done by the codec itself.

An internal DMA controller also had to be designed to move data between system memory and the MPEG4/JPEG codec's local memory. The DMA controller includes an AHB master interface and an AHB slave interface. The AHB master interface allows the DMA controller to access data on the AHB bus; the AHB slave interface is used by the AHB bus to program the DMA controller's control registers. We then determined the specifications of the MPEG4/JPEG codec IP as follows:

Full hardware MPEG4/JPEG codec.

Compliant with the MPEG-4 (ISO/IEC 14496-2) Simple Profile L0 ~ L3 standard:
- Supports standard resolutions (sub-QCIF, QCIF, CIF, VGA and 4CIF) and non-standard resolutions in 16-pixel steps
- Supports up to D1 @ 30 fps, XGA @ 15 fps, SXGA @ 10 fps
- Half-duplex operation; half the frame rate for full-duplex operation
- Supports the MPEG4 short header format (H.263 baseline)
- Motion estimation search range: -16 ~ +15.5 (optionally -32 ~ +31.5) at half-pixel precision
- Supports 4MV
- Supports constant bit rate and variable bit rate control
- Supports the two error-resilience tool sets below:
  - Encoding: resynchronization marker and header extension code
  - Decoding: resynchronization marker, header extension code, data partitioning and RVLC

Compliant with the JPEG (ISO/IEC 10918-1) baseline standard:
- 4 user-defined Huffman tables (2 AC and 2 DC)
- 4 programmable quantization tables
- Interlaced and progressive modes
- YCbCr 4:4:4, 4:2:2 and 4:2:0 formats
- Image size up to 64K x 64K
- 60 fps at 640x480 resolution

In MPEG4 image compression, the acquired image first undergoes motion estimation to obtain its motion vector and sum of absolute differences (SAD). When the SAD is too large, the original image block is sent to the discrete cosine transform (DCT); otherwise, the difference between the current image and the previous reconstructed image is sent to the DCT. The data produced by the DCT is then quantized, the quantized data goes through AC/DC prediction, and the predicted data is passed to the variable-length coder (VLC), which compresses the original image data into a base layer bitstream. A client that receives the base layer bitstream can decompress it with a decoder to restore the video. Meanwhile, in addition to feeding the AC/DC predictor, the quantized data is also passed to inverse quantization, the inverse-quantized data is passed to the inverse DCT, and motion compensation is then performed based on the earlier motion estimation results to reconstruct an image that serves as the reference for the next image. After confirming the specifications, we defined the module structure of the codec as shown in Figure 3 below. We will explain some of these modules.

MPEG4/JPEG codec module structure

Motion Estimation (ME): The motion estimation unit performs motion estimation over the entire search window based on a fast search algorithm. The basic idea is to find, for a given N x N macroblock in the current frame, the displacement to the most similar block (the reference block) in the previous frame; this displacement is called the motion vector. The range searched extends a fixed value w around the block and is called the search area.
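As an illustration, the block matching described above can be modeled in software. The sketch below is a simplified full-search Python model, not the codec's hardware fast-search algorithm; the function names and list-of-rows frame layout are assumptions for illustration only.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(c - r) for crow, rrow in zip(cur, ref)
               for c, r in zip(crow, rrow))

def block(frame, y, x, n):
    """Extract an n x n block at (y, x) from a frame (list of rows)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def motion_estimate(cur_frame, ref_frame, y, x, n=8, w=4):
    """Full search over a +/- w search area around (y, x).
    Returns (motion_vector, best_sad)."""
    cur = block(cur_frame, y, x, n)
    best = (None, float('inf'))
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            ry, rx = y + dy, x + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + n > len(ref_frame) or rx + n > len(ref_frame[0]):
                continue
            s = sad(cur, block(ref_frame, ry, rx, n))
            if s < best[1]:
                best = ((dy, dx), s)
    return best
```

The hardware unit uses a fast search algorithm to avoid evaluating every candidate in the search area; the exhaustive loop above only shows what that search approximates.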

DCT/IDCT: The core of the MPEG4 algorithm is an operation called the discrete cosine transform (DCT). The basic principle of the DCT is to operate on square blocks of pixels and remove redundant information that the observer does not notice. To decompress the data, an inverse discrete cosine transform (IDCT) operation is also required. The DCT/IDCT unit carries out both the discrete cosine transform and the inverse discrete cosine transform. The IDCT shares the same hardware resources as the DCT, and its results comply with the IEEE 1180-1990 specification; in the decode state they are sent to the motion compensation unit. The results produced by the DCT are sent to the quantization unit in the encode state.
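For reference, the textbook 8x8 DCT-II and its inverse can be written directly from their definitions. This is an unoptimized software sketch; the actual hardware shares resources between the DCT and IDCT and uses a fast architecture rather than these quadruple loops.

```python
import math

N = 8  # MPEG-4/JPEG both transform 8x8 blocks

def _c(k):
    """DCT normalization factor for index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(block):
    """2-D DCT-II of an 8x8 block (list of 8 lists of 8 numbers)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                    for y in range(N) for x in range(N))
            out[u][v] = _c(u) * _c(v) * s
    return out

def idct2(coef):
    """Inverse 2-D DCT, recovering the pixel block."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = sum(_c(u) * _c(v) * coef[u][v]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                    for u in range(N) for v in range(N))
            out[y][x] = s
    return out
```

A round trip of `idct2(dct2(block))` reproduces the input to floating-point precision, which is the property the IEEE 1180-1990 accuracy tests bound for fixed-point implementations.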

Quantization (Q) / Inverse Quantization (IQ): Quantization is applied to the DCT results to reduce the amount of data and so increase the compression ratio of the subsequent VLC. Inverse quantization plus the IDCT decodes the encoded image data, which, combined with the motion estimation results, is used for motion compensation to reconstruct an image as the reference for the next image. The quantization/inverse quantization unit supports the H.263/MPEG/JPEG quantization methods. The quantization results are sent to the AC/DCP unit in the encode state; the results produced by the IQ are sent to the IDCT unit in the decode state.
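A minimal software model of the quantize/dequantize pair illustrates why this stage is lossy. The single uniform step size here is a simplification; the real unit follows the table-based H.263/MPEG/JPEG methods.

```python
def quantize(coef, q):
    """Uniform quantization: divide each DCT coefficient by step q and round."""
    return [[int(round(c / q)) for c in row] for row in coef]

def dequantize(levels, q):
    """Inverse quantization: multiply each level back by the step size."""
    return [[lv * q for lv in row] for row in levels]
```

The rounding discards fractional information, so the dequantized coefficients only approximate the originals; that approximation error is the source of compression loss, while the many resulting zero levels are what the VLC exploits.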

AC/DCP (AC/DC Prediction): The purpose of AC/DC prediction is to predict a macroblock's coefficients from the quantized values of the surrounding blocks, reducing the amount of data to increase the VLC compression ratio. The AC/DC prediction unit supports the MPEG-4 AC/DC prediction and JPEG DC prediction methods. The results produced by the AC/DC prediction unit are sent to the zigzag scan unit in the encode state; the results produced by the inverse AC/DC prediction unit are sent to the inverse quantization unit in the decode state.
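The JPEG DC prediction mentioned above is differential coding of successive blocks' DC coefficients, which can be sketched in a few lines (MPEG-4 AC/DC prediction, which also chooses between left and top neighbors, is more involved and is not modeled here):

```python
def dc_predict_encode(dc_values):
    """JPEG-style DC prediction: each DC coefficient is coded as the
    difference from the previous block's DC (the predictor starts at 0)."""
    pred, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - pred)
        pred = dc
    return diffs

def dc_predict_decode(diffs):
    """Undo the prediction by accumulating the differences."""
    pred, dcs = 0, []
    for d in diffs:
        pred += d
        dcs.append(pred)
    return dcs
```

Because neighboring blocks usually have similar average brightness, the differences cluster near zero and code into shorter VLC words than the raw DC values would.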

Zigzag Scan: After quantization, almost every coefficient of an 8 x 8 block except the upper-left corner becomes 0. To encode the runs of consecutive zeros efficiently, a zigzag scan is performed on each 8 x 8 block, converting the two-dimensional block into a one-dimensional sequence. The zigzag scan unit supports all of the MPEG-4 and JPEG zigzag scan methods. The results produced by the zigzag scan unit are sent to the VLC unit in the encode state; the results produced by the inverse zigzag scan unit are sent to the AC/DC prediction unit in the decode state.
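The classic zigzag order can be generated programmatically rather than stored as a table; a Python sketch of one of the scan patterns the unit supports:

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of an n x n zigzag scan."""
    order = []
    for s in range(2 * n - 1):          # walk the anti-diagonals
        diag = [(y, s - y) for y in range(n) if 0 <= s - y < n]
        if s % 2 == 0:
            diag.reverse()              # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

def zigzag_scan(block):
    """Flatten a square block into 1-D following the zigzag order."""
    return [block[y][x] for y, x in zigzag_order(len(block))]
```

Scanning from the low-frequency corner outward places the significant coefficients first and groups the trailing zeros together, which is exactly what the run-length stage of the VLC needs.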

Variable Length Coding/Decoding (VLC/VLD): After the data has gone through motion estimation (ME), the DCT, quantization and AC/DC prediction, the VLC encodes it into a bitstream, which is transmitted over the network and decoded back into MPEG4 images by the decoder on the client side. The variable-length coder (VLC) works much like Huffman coding: frequently occurring values use shorter code words and rare values use longer ones to achieve a better compression ratio. A codebook is created first for the table-lookup operation. In addition, since some values may not be found in the codebook, a fixed-length escape coding method handles these cases. We perform a data-scanning operation before the table lookup; the scan can be done in three ways: horizontal scan, vertical scan and zigzag scan. The VLC/VLD unit supports the MPEG-4 fixed VLC tables and JPEG user-defined Huffman coding methods. In the encode state, the result produced by the VLC unit is the final compressed bitstream; in the decode state, the results produced by the VLD unit are sent to the inverse zigzag scan unit.
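Before the table lookup, the zigzag-scanned coefficients are conventionally represented as (zero-run, level) pairs, and it is these pairs that index the VLC codebook. A simplified Python model of that run-length stage (the actual code words and escape codes are defined by the MPEG-4 and JPEG standards and are not reproduced here; the `EOB` marker name is an illustrative choice):

```python
EOB = ('EOB',)  # end-of-block marker covering the all-zero tail

def rle_encode(coeffs):
    """Encode a zigzag-scanned coefficient list as (zero_run, level) pairs."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append(EOB)       # remaining zeros are implied by the marker
    return pairs

def rle_decode(pairs, length=64):
    """Expand (zero_run, level) pairs back into a coefficient list."""
    out = []
    for p in pairs:
        if p == EOB:
            break
        run, level = p
        out.extend([0] * run + [level])
    out.extend([0] * (length - len(out)))
    return out
```

In a real encoder each (run, level) pair is then looked up in the VLC codebook to produce the output bits, with the fixed-length escape format used for pairs not present in the table.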

Motion Compensation (MC): Motion estimation rests on certain assumptions: that objects in the picture do not enlarge or shrink, deform or rotate, and do not suddenly appear or disappear. In the real world these assumptions do not always hold. If only motion estimation were performed, the reconstructed picture would differ from the original, and the error would grow as the number of pictures increases. To compensate for this shortcoming of motion estimation, a motion compensation unit must be designed. In the encode state, the motion compensation unit subtracts the interpolated (predicted) block from the original block and sends the remaining residual block to the DCT unit. In the decode state, the motion compensation unit adds the block produced by the IDCT unit to the interpolated block to generate the reconstructed block.
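The encode-state subtraction and decode-state addition can be modeled directly. This is a minimal sketch; the half-pixel interpolation that produces the predicted block is omitted, and the 8-bit clipping is a common implementation detail assumed here rather than taken from the article.

```python
def clip(v, lo=0, hi=255):
    """Clamp a reconstructed sample to the 8-bit pixel range."""
    return max(lo, min(hi, v))

def mc_encode(original, predicted):
    """Encode state: residual = original block minus predicted block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]

def mc_decode(residual, predicted):
    """Decode state: reconstructed = residual (from IDCT) plus predicted block."""
    return [[clip(r + p) for r, p in zip(rrow, prow)]
            for rrow, prow in zip(residual, predicted)]
```

Because both encoder and decoder add the residual to the same predicted block, the reconstruction error stays bounded instead of accumulating from frame to frame.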

Figure 4 shows the SoC system architecture diagram, which includes the FA526 CPU, the MPEG4/JPEG codec, video input and output interfaces, and the system memory controller required by the MPEG4/JPEG codec. The video capture interface transfers video data into system memory.

SoC system structure diagram


The timing of the MPEG4/JPEG module is then verified using Faraday Technology's virtual platform environment (VPE). As shown in Figure 5, Faraday's VPE is a general-purpose SoC integration verification environment based on the Advanced Microcontroller Bus Architecture (AMBA). Designers can use VPE together with an EDA simulator to verify the functionality of the IP and the integrity of the SoC chip. It integrates:

- Faraday Technology CPU simulation model
- AMBA bus device simulation models (master / slave / arbiter / decoder, etc.)
- Faraday Technology StarCells simulation models (SDMC, GPIO, SMC, etc.)
- Other related device simulation models (SDRAM / ROM / I/O models)

VPE integrated verification environment


Designers can add their own designs as needed, such as the MPEG4 codec attached to the AHB. Each functional module can be simulated independently in the VPE. Faraday Technology provides simulation models of the various AMBA functional modules in VPE, so designers can easily build and test an AMBA-based SoC system. The VPE simulation models include the following:

Behavioral models:
- AHB master
- AHB slave
- AHB monitor
- APB slave

RTL-level models:
- Arbiter
- Decoder
- AHB-to-APB bridge with direct memory access (DMA) channel
- AHB-to-APB bridge without direct memory access (DMA) channel

After functional simulation is complete, we use the A320 SoC design platform shown in Figure 6 for FPGA hardware emulation. The A320 integrates the IP required to complete the design. The logic modules realized in the FPGA connect to the A320 design platform through the AHB/APB bus connector, which makes function verification, debugging and related tasks easy. Because the IP on the A320 chip is silicon-proven, the design is assured while maintaining consistency from design to chip.

A320 - SoC design platform IP resources and structure

We have designed an FPGA development board that can be verified with the A320, as shown in Figure 7 below. The MPEG4/JPEG codec development board includes a Xilinx Virtex-II XC2V4000 BF957 FPGA, video capture and an A320 interface. It provides the on-board FPGA (XC2V4000 BF957), an SAA7113 video capture chip, a CMOS sensor module, 16 LEDs for debug information, 2 expansion buses and an AHB connector for connecting to the A320 development board. We also created a document describing the pin definitions to make it easier for users to understand the design.

MPEG4/JPEG codec FPGA development board


We also provide the schematic to users, as shown in Figure 8 below, so the user can fully understand the signal flow.

Schematic block diagram


Our test environment is very convenient. As shown in Figure 9 below, using ARM's software development environment and the USB-interface EDGE debug development board provided by Faraday Technology or ARM, insert the MPEG4/JPEG codec FPGA development board into the A320's AHB and APB bus connectors, power on the A320, and connect the PC to the A320 board through the USB port. In MPEG4 or JPEG encode mode, the AXD debugger downloads image data and runs the firmware, and the MPEG4/JPEG codec then compresses the image data; the compressed data stream is stored as the final compressed file. In MPEG4 or JPEG decode mode, the AXD debugger downloads the compressed data stream and runs the firmware, and the MPEG4/JPEG codec then decompresses the data.

MPEG4/JPEG codec test environment


The MPEG4/JPEG codec that passes FPGA verification can be combined with the IP on the A320 platform, plus other required IP such as an AHB-PCI bridge, and taped out together as another platform (as shown in Figure 10 below) or as a finished audio/video product. Platform-based design speeds up design time because so much of the architecture is pre-verified; it is also an IP-reuse approach to SoC design. The SoC design development platform is in fact a comprehensive development system: the core chip integrates common IP and provides the basic support for development, the development test board provides a hardware development environment, and VPE provides a software simulation environment. With the VPE platform, SoC system development becomes simple and efficient. A good verification and development platform enables an SoC with the MPEG4/JPEG codec to complete design, verification and tape-out quickly. Faraday Technology's A320 platform is just such an SoC verification and development platform.

MPEG4/JPEG codec platform
