Considering a movie as a sequence of single, independent images leaves us without the opportunity to exploit temporal redundancy: often there are only small changes from frame to frame within a video sequence. The background may be fixed while an object moves in front of it, or the camera may sweep over a scene, shifting the entire view in one direction.
Standardized compression algorithms exist that take advantage of similarities between nearby frames. The algorithms typically divide a frame into blocks of 8×8 pixels, and encode each block using the discrete cosine transform (DCT). To take advantage of the temporal redundancy, the pixel values in a block may be predicted from blocks in nearby frames. When such prediction is used, the block is represented not by the actual pixel values, but rather by the differences from the matching pixel values in the frame used for prediction.
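As a sketch of this scheme (not tied to any particular standard), an 8×8 block can be transformed with a separable two-dimensional DCT; with prediction, the same transform is applied to the difference block instead:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix; row k holds the k-th cosine basis vector.
    m = np.array([[np.cos(np.pi * k * (2 * i + 1) / (2 * n))
                   for i in range(n)] for k in range(n)])
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def dct2(block):
    # Separable 2-D DCT: transform the rows, then the columns.
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T

# With inter coding, the transform is applied to the prediction error:
#   residual = current_block - predicted_block, then dct2(residual).
```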
To improve the prediction, motion compensation is often used: a displacement vector may be associated with a block, describing how the block has moved relative to the frame used for prediction. The vector should point to the block giving optimal prediction. Finding the optimal block during coding is computationally expensive, and is typically left out when using software coders.
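The search for the optimal displacement can be sketched as an exhaustive block matching that minimizes the sum of absolute differences (SAD). The function name and parameters below are illustrative, not taken from any standard; real coders use faster, approximate search strategies:

```python
import numpy as np

def find_motion_vector(ref, cur, bx, by, bs=16, search=15):
    # Full search: try every displacement within +/- search pixels and keep
    # the one whose reference block best matches the current block (lowest SAD).
    block = cur[by:by + bs, bx:bx + bs]
    best, best_sad = (0, 0), np.inf
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bs > w or y + bs > h:
                continue  # keep the candidate block inside the frame
            cand = ref[y:y + bs, x:x + bs]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad
```

The quadratic number of candidate positions, each costing a full block comparison, is what makes this step expensive.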
ITU-T, the Telecommunication Standardization Sector of the International Telecommunication Union (ITU), defines two standards (called ``recommendations'' in ITU terminology) for transferring video and audio over digital lines. H.261 [20], finished in 1990, is designed for ISDN lines or other media with transfer rates that are multiples of 64 kbit per second. H.263 [21], currently a draft standard, is targeted at lines with lower bitrates.
H.261 supports two resolutions: Common Interchange Format (CIF) at 352×288 pixels, and Quarter CIF (QCIF) at 176×144 pixels. The luminance component is coded at these sizes, while the chrominance components are reduced to half the size in both directions.
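The halving of the chrominance planes can be sketched as a 2×2 averaging. The standard does not mandate a particular downsampling filter; this is one plausible choice:

```python
import numpy as np

def subsample_chroma(c):
    # Reduce a chrominance plane to half size in both directions by
    # averaging each non-overlapping 2x2 neighbourhood.
    h, w = c.shape
    c = c.astype(float)
    return (c[0:h:2, 0:w:2] + c[1:h:2, 0:w:2]
            + c[0:h:2, 1:w:2] + c[1:h:2, 1:w:2]) / 4.0
```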
Frames for the three components are partitioned into blocks of 8×8 pixels, each of which is transformed, quantized and Huffman-coded separately. A macroblock is defined as four neighboring luminance blocks, plus one block from each of the chrominance components, making up a 16×16 sub-image.
Two types of frames are defined: intra coded frames and inter coded frames. Intra coded frames are coded as stand-alone frames, while inter coded frames code prediction errors with respect to the previous frame. The coded blocks of inter coded frames may include motion compensation, in which case a motion vector is associated with each macroblock. The motion vector allows specification of a displacement of up to 15 pixels in each direction. The sender may decide not to send blocks that have not changed since the previous frame.
H.263 works much like H.261, but there are several extensions and some modifications. In addition to the two resolutions defined for H.261, H.263 allows the following: 16CIF at 1408×1152, 4CIF at 704×576, and sub-QCIF at 128×96 pixels.
Extensions to H.261 include the ``PB-frames mode'', where two frames are coded as one unit. The latter frame is coded as a P-frame, while the former frame is coded in B mode, possibly using bidirectional prediction between the previously decoded frame and the P-frame of the same unit.
Another extension is the use of unrestricted motion vectors, where motion vectors are allowed to point outside the frame. Edge pixels are used as predictions for the non-existing pixels. In H.263, motion vectors have half-pixel precision, instead of the integer-pixel precision used in H.261.
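Half-pixel prediction can be sketched as a bilinear averaging of the nearest integer-position pixels. This is a simplification; the exact rounding rules of H.263 are omitted:

```python
def half_pel(ref, y2, x2):
    # Sample the reference frame at position (y2/2, x2/2), where y2 and x2
    # are given in half-pixel units. Integer positions fall through unchanged.
    y, x = y2 // 2, x2 // 2
    fy, fx = y2 % 2, x2 % 2
    a = ref[y][x]
    b = ref[y][x + fx]
    c = ref[y + fy][x]
    d = ref[y + fy][x + fx]
    return (a + b + c + d) // 4  # average of the (up to four) neighbours
```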
For the coding step, H.263 allows using arithmetic coding instead of the variable length coding used in H.261.
The MPEG (Moving Picture Experts Group) standards specify the coding of video and audio streams, and how synchronization between them is to be done. At 1.2 Mbit per second, 30 Hz and a resolution of 352×240, the quality of an MPEG stream is comparable to VHS video [22]. The standardization effort was initiated in 1988, run by the ``Joint ISO/IEC Technical Committee (JTC 1) on Information Technology''. The standards are said to be generic, in that they specify the format of the compressed stream, rather than the method by which the data are coded.
MPEG defines three different types of frames [23], as illustrated in figure 2.6. Note that the standard does not specify the frame type sequence, it is left to the encoding application.
Figure 2.6: The relationship between frame types.
Intra frames, or I-frames, define the start of a group of frames. I-frames are coded as stand-alone images, using a method resembling the one described for JPEG in section 2.3.1.
A group of frames may contain predicted frames, called P-frames. These are predicted from the closest previous I- or P-frame, with the help of motion compensation vectors. The motion vectors are associated with macroblocks of 16×16 pixels.
Between the I- and P-frames there may be zero or more bidirectionally interpolated frames, or B-frames. These are interpolated between the nearest I- or P-frames. Since the interpolation is bidirectional, the decoder needs to look into the future. Macroblocks within a B-frame can be coded in several ways [22].
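The interpolative mode can be sketched as the average of a forward-predicted and a backward-predicted block. The function name, parameters and rounding below are illustrative, not taken from the MPEG text:

```python
import numpy as np

def predict_b_block(prev_ref, next_ref, fwd_mv, bwd_mv, bx, by, bs=16):
    # Forward prediction from the previous I/P-frame, backward prediction
    # from the next I/P-frame; the interpolative mode averages the two.
    fx, fy = fwd_mv
    gx, gy = bwd_mv
    fwd = prev_ref[by + fy:by + fy + bs, bx + fx:bx + fx + bs]
    bwd = next_ref[by + gy:by + gy + bs, bx + gx:bx + gx + bs]
    return (fwd.astype(int) + bwd.astype(int) + 1) // 2  # rounded average
```

The dependence on the *next* I- or P-frame is what forces the decoder to receive frames out of display order.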
Originally, three versions of the standard were planned for different bitrates (1.5, 10 and 40 Mbit/s), named MPEG-1, -2 and -3 respectively [24]. Later, development of MPEG-4 was initiated, targeting lower bitrates.
MPEG-1 defines a ``Constrained Parameter Set'', describing the minimal requirements:
Table 2.1: The Constrained Parameter Set of MPEG-1.
The maximum frame size is 4096×4096.
MPEG-2 offers extended audio-capabilities compared to MPEG-1, including more audio channels, and more sample rates.
MPEG-3 no longer exists. It was developed in parallel with MPEG-2 to support High Definition television (HDTV). As MPEG-2 came to cover what MPEG-3 was supposed to cover, further development was shut down in 1992.
MPEG-4 is the ``very low bitrate''-version of MPEG, suitable for bitrates lower than 64 kb/s. It is scheduled to result in a draft specification in 1997 [19].