Considering a movie as a sequence of single, independent images leaves us without the opportunity to exploit temporal redundancy: often there are only small changes from frame to frame within a video sequence. The background may be fixed while an object moves in front of it, or the camera may sweep over a scene, shifting the entire view in one direction.
Standardized compression algorithms exist that take advantage of similarities between nearby frames. The algorithms typically divide a frame into blocks of 8×8 pixels, and encode each block using the discrete cosine transform (DCT). To take advantage of the temporal redundancy, the pixel values in a block may be predicted from blocks in nearby frames. When such prediction is used, the block is represented not by the actual pixel values, but rather by the differences from the matching pixel values in the frame used for prediction.
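As a sketch of this scheme (not tied to any particular standard), an 8×8 block can be transformed with a separable two-dimensional DCT; with prediction, the same transform is applied to the difference block instead:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix; row k holds the k-th cosine basis vector.
    m = np.array([[np.cos(np.pi * k * (2 * i + 1) / (2 * n))
                   for i in range(n)] for k in range(n)])
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def dct2(block):
    # Separable 2-D DCT: transform the rows, then the columns.
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T

# With inter coding, the transform is applied to the prediction error:
#   residual = current_block - predicted_block, then dct2(residual).
```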
To improve the prediction, motion compensation is often used: a displacement vector may be associated with a block, describing how the block has moved relative to the frame used for prediction. The vector should point to the block giving optimal prediction. Finding the optimal block during coding is computationally expensive, and is typically left out when using software coders.
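The search for the optimal displacement can be sketched as an exhaustive block matching that minimizes the sum of absolute differences (SAD). The function name and parameters below are illustrative, not taken from any standard; real coders use faster, approximate search strategies:

```python
import numpy as np

def find_motion_vector(ref, cur, bx, by, bs=16, search=15):
    # Full search: try every displacement within +/- search pixels and keep
    # the one whose reference block best matches the current block (lowest SAD).
    block = cur[by:by + bs, bx:bx + bs]
    best, best_sad = (0, 0), np.inf
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bs > w or y + bs > h:
                continue  # keep the candidate block inside the frame
            cand = ref[y:y + bs, x:x + bs]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad
```

The quadratic number of candidate positions, each costing a full block comparison, is what makes this step expensive.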
ITU-T, the Telecommunication Standardization Sector of the International Telecommunication Union (ITU), defines two standards (called ``recommendations'' in ITU terminology) for transferring video and audio over digital lines. H.261 [20], finished in 1990, is designed for ISDN lines or other media with transfer rates that are multiples of 64 kbit per second. H.263 [21], currently a draft standard, is targeted at lines with lower bitrates.
H.261 supports two resolutions: Common Interchange Format (CIF) at 352×288 pixels, and Quarter CIF (QCIF) at 176×144 pixels. The luminance component is coded at these sizes, while the chrominance components are reduced to half the size in both directions.
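The halving of the chrominance planes can be sketched as a 2×2 averaging. The standard does not mandate a particular downsampling filter; this is one plausible choice:

```python
import numpy as np

def subsample_chroma(c):
    # Reduce a chrominance plane to half size in both directions by
    # averaging each non-overlapping 2x2 neighbourhood.
    h, w = c.shape
    c = c.astype(float)
    return (c[0:h:2, 0:w:2] + c[1:h:2, 0:w:2]
            + c[0:h:2, 1:w:2] + c[1:h:2, 1:w:2]) / 4.0
```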
Frames for the three components are partitioned into blocks of 8×8 pixels, each of which is transformed, quantized and Huffman-coded separately. A macroblock is defined as four neighboring luminance blocks, plus one block from each of the chrominance components, making up a 16×16 sub-image.
Two types of frames are defined: intra coded frames and inter coded frames. Intra coded frames are coded as stand-alone frames, while inter coded frames code prediction errors with respect to the previous frame. The coded blocks of inter coded frames may include motion compensation, in which case a motion vector is associated with each macroblock. The motion vector allows specification of a displacement of up to 15 pixels in each direction. The sender may decide not to send blocks that have not changed since the previous frame.
H.263 works much like H.261, but there are several extensions and some modifications. In addition to the two resolutions defined for H.261, H.263 allows the following: 16CIF at 1408×1152, 4CIF at 704×576, and sub-QCIF at 128×96 pixels.
Extensions to H.261 include the ``PB-frames mode'', where two frames are coded as one unit. The latter frame is coded as a P-frame, while the former frame is coded in B mode, possibly using bidirectional prediction between the previously decoded frame and the P-frame of the same unit.
Another extension is the use of unrestricted motion vectors, where motion vectors are allowed to point outside the frame. Edge pixels are used as predictions for the non-existing pixels. In H.263, motion vectors have half-pixel precision, instead of the integer-pixel precision used in H.261.
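Half-pixel prediction can be sketched as a bilinear averaging of the nearest integer-position pixels. This is a simplification; the exact rounding rules of H.263 are omitted:

```python
def half_pel(ref, y2, x2):
    # Sample the reference frame at position (y2/2, x2/2), where y2 and x2
    # are given in half-pixel units. Integer positions fall through unchanged.
    y, x = y2 // 2, x2 // 2
    fy, fx = y2 % 2, x2 % 2
    a = ref[y][x]
    b = ref[y][x + fx]
    c = ref[y + fy][x]
    d = ref[y + fy][x + fx]
    return (a + b + c + d) // 4  # average of the (up to four) neighbours
```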
For the coding step, H.263 allows using arithmetic coding instead of the variable length coding used in H.261.
The MPEG (Moving Picture Experts Group) standards specify the coding of video and audio streams, and how synchronization between them is to be done. At 1.2 Mbit per second, 30 Hz and a resolution of 352×240, the quality of an MPEG stream is comparable to VHS video [22]. The standardization effort was initiated in 1988, run by the ``Joint ISO/IEC Technical Committee (JTC 1) on Information Technology''. The standards are said to be generic, in that they specify the format of the compressed stream, rather than the method by which the data are coded.
MPEG defines three different types of frames [23], as illustrated in figure 2.6. Note that the standard does not specify the frame type sequence, it is left to the encoding application.
Figure 2.6: The relationship between frame types.
Intra frames, or I-frames, define the start of a group of frames. I-frames are coded as stand-alone images, using a method resembling the one described for JPEG in section 2.3.1.
A group of frames may contain predicted frames, called P-frames. These are predicted from the closest previous I- or P-frame, with the help of motion compensation vectors. The motion vectors are associated with macroblocks of 16×16 pixels.
Between the I- and P-frames there may be zero or more bidirectionally interpolated frames, or B-frames. These are interpolated between the nearest I- or P-frames. Since the interpolation is bidirectional, the decoder needs to look into the future. Macroblocks within a B-frame can be coded in several ways [22].
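The interpolative mode can be sketched as the average of a forward-predicted and a backward-predicted block. The function name, parameters and rounding below are illustrative, not taken from the MPEG text:

```python
import numpy as np

def predict_b_block(prev_ref, next_ref, fwd_mv, bwd_mv, bx, by, bs=16):
    # Forward prediction from the previous I/P-frame, backward prediction
    # from the next I/P-frame; the interpolative mode averages the two.
    fx, fy = fwd_mv
    gx, gy = bwd_mv
    fwd = prev_ref[by + fy:by + fy + bs, bx + fx:bx + fx + bs]
    bwd = next_ref[by + gy:by + gy + bs, bx + gx:bx + gx + bs]
    return (fwd.astype(int) + bwd.astype(int) + 1) // 2  # rounded average
```

The dependence on the *next* I- or P-frame is what forces the decoder to receive frames out of display order.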
Originally, three versions of the standard were planned for different bitrates (1.5, 10 and 40 Mbit/s), named MPEG-1, -2 and -3 respectively [24]. Later, development of MPEG-4 was initiated, targeting lower bitrates.
MPEG-1 defines a ``Constrained Parameter Set'', describing the minimal requirements:
Table 2.1: The Constrained Parameter Set of MPEG-1.
The maximum frame size is 4096×4096.
MPEG-2 offers extended audio-capabilities compared to MPEG-1, including more audio channels, and more sample rates.
MPEG-3 no longer exists. It was developed in parallel with MPEG-2 to support High Definition television (HDTV). As MPEG-2 came to cover what MPEG-3 was supposed to cover, further development was shut down in 1992.
MPEG-4 is the ``very low bitrate''-version of MPEG, suitable for bitrates lower than 64 kb/s. It is scheduled to result in a draft specification in 1997 [19].