Codec and Container Overview
Containers
Video cannot generally be distributed as a raw stream of bytes
compressed with its codec. It needs to be encapsulated in a file or
stream that allows it to be fast-forwarded, rewound, and synchronized
with the audio and/or subtitle track(s). To do all this organization,
several types of "containers" have been developed. Because the
container is the outermost layer of the file that the user interacts
with, containers are sometimes called "formats". That is generally
not accurate: to know the full format you would also need to know the
types of the audio, video, and subtitle codecs contained within the
file, and most containers are capable of supporting more than one
codec of each type.
http://en.wikipedia.org/wiki/Container_format_(digital)
http://en.wikipedia.org/wiki/Comparison_of_container_formats
http://es.wikipedia.org/wiki/Formato_contenedor
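To make the container/codec distinction concrete, here is a minimal sketch of how a container could be modeled as a wrapper around several tracks. The class and field names (Track, ContainerFile, describe) are hypothetical and chosen only for illustration; real container libraries expose far more detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    kind: str    # "video", "audio", or "subtitle"
    codec: str   # e.g. "Xvid", "MP3"


@dataclass
class ContainerFile:
    container: str                                  # e.g. "AVI", "MKV"
    tracks: List[Track] = field(default_factory=list)

    def describe(self) -> str:
        """Return the container name plus the codec of every track."""
        parts = [f"{t.kind}:{t.codec}" for t in self.tracks]
        return f"{self.container} [" + ", ".join(parts) + "]"


# Two files in the same container can hold different codecs,
# so the container name alone does not tell us the full format.
a = ContainerFile("AVI", [Track("video", "Xvid"), Track("audio", "MP3")])
b = ContainerFile("AVI", [Track("video", "MJPEG"), Track("audio", "MP3")])
print(a.describe())
print(b.describe())
```

Both files would carry the same .avi extension, yet a player needs a different video decoder for each one.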
MPEG-PS - Moving Picture Experts Group - Program Stream, is used for
storing video data on DVDs. (Blu-ray discs use a variant of the
Transport Stream instead.) It is compatible with MPEG 1, 2, and 4
video, several types of MPEG audio, and subtitles. http://en.wikipedia.org/wiki/MPEG_program_stream
http://es.wikipedia.org/wiki/Program_Stream
MPEG-TS - Moving Picture Experts Group - Transport Stream, is used for
transmitting video data, especially in cases where some of it may be
lost in transit. In addition to raw streams of data, it can contain
Program Streams. http://en.wikipedia.org/wiki/MPEG_transport_stream
http://es.wikipedia.org/wiki/Transport_Stream
Microsoft Audio Video Interleave (AVI) - Despite some technical
limitations, Microsoft's Audio Video Interleave (AVI) format has been
very common for videos available for download off the internet,
especially those encoded with DivX or Xvid codecs. Its popularity is
starting to fade as use of MKV and MP4 picks up, but it is still
common to find videos in this container. http://en.wikipedia.org/wiki/Audio_Video_Interleave
http://es.wikipedia.org/wiki/Avi
QuickTime (MOV) - This is the container that is used with Apple's
QuickTime movies. It has been superseded in Apple's iTunes and
QuickTime by the MP4 container. http://en.wikipedia.org/wiki/.mov
Windows Media Video (WMV aka ASF) - This is Microsoft's main container
format, capable of supporting all the Microsoft codecs as well as some
others. Generally the .wmv extension implies that one of the WMV
video codecs is being used with the WMA audio codec. If other codecs
are being used, Microsoft's guidance is to use the .asf extension. http://en.wikipedia.org/wiki/Windows_Media_Video
http://es.wikipedia.org/wiki/Wmv
Flash Video (FLV) - Flash Video is common on websites that show video
clips (like YouTube). There are several video codecs that are
compatible with this, with the most common being H.263. The most
recent versions of the Flash player are also able to work with MP4 or
WebM, so this container is no longer required to work with the Flash
player. http://en.wikipedia.org/wiki/Flash_Video
http://es.wikipedia.org/wiki/Flv
Ogg - This was the first open source container to see major use. It
was primarily used with the Theora video codec and Vorbis audio. It
has generally been superseded by the open source MKV container.
However, Ogg is better suited to streaming audio and video than MKV,
so it is still frequently used for that. http://en.wikipedia.org/wiki/Ogg
http://es.wikipedia.org/wiki/Ogg
MPEG-4 (MP4 aka M4V) - This is the standard file format that MPEG came
up with for MPEG 4 video, although it is not limited to MPEG 4 video.
This container is extremely similar to Apple's QuickTime MOV
container, and is used by iTunes for videos. The most typical use of
this container is to hold MPEG 4 AVC/H.264 video and AAC audio. It is
supported by the Flash player in web browsers, and by the Internet
Explorer and Safari web browsers for HTML 5 video, where it uses
progressive downloading so the video can start playing before it is
completely downloaded. http://en.wikipedia.org/wiki/MPEG-4_Part_14
http://es.wikipedia.org/wiki/MPEG-4_Parte_14
Matroska (MKV) - This is an open source, multi-purpose container
format. This format can store almost any type of video, audio, and
subtitles. It can also store multiple different tracks of any of
those, enabling a movie to have multiple different languages of audio
built into it, or something like a director's commentary track. http://en.wikipedia.org/wiki/Matroska
http://es.wikipedia.org/wiki/Mkv
WebM - This is simply a trimmed-down version of MKV that only supports
the VP8 video codec and the Vorbis audio codec. This was created by
Google for royalty-free use on the internet in web video. It is
supported in the HTML 5 video standard by Firefox and Chrome, with
support in Internet Explorer available as a plugin. It is also
supported in web browsers by the Flash player. http://en.wikipedia.org/wiki/WebM
http://es.wikipedia.org/wiki/WebM
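Since the file extension is only a hint, software usually identifies the container from the first few bytes of the file. The sketch below is a rough illustration of that idea, not a complete detector. The function name sniff_container is hypothetical, and the byte signatures are the commonly documented ones for each container rather than anything stated in the text above. Older QuickTime files, for example, may not start with an "ftyp" box, so treat the result as a best guess.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container family from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"OggS":
        return "Ogg"
    if header[:3] == b"FLV":
        return "FLV"
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska or WebM"      # both start with an EBML header
    if header[4:8] == b"ftyp":
        return "MP4 or QuickTime"
    if header[:4] == b"\x00\x00\x01\xba":
        return "MPEG-PS"               # pack start code
    if header[:4] == b"\x30\x26\xb2\x75":
        return "ASF/WMV"               # start of the ASF header GUID
    # Transport Stream packets are 188 bytes long and start with 0x47.
    if header[:1] == b"\x47" and header[188:189] == b"\x47":
        return "MPEG-TS"
    return "unknown"


# Usage: read a few hundred bytes and ask for a guess.
# with open("movie.bin", "rb") as f:   # hypothetical file name
#     print(sniff_container(f.read(256)))
```

Note that even a correct answer here only names the container. Finding out which video and audio codecs are inside requires parsing the track headers as well.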
Video Compression and Codecs
Video is HUGE. If we have a 1080p (1920x1080) stream of video using
YUV 4:2:2 (16 bits/pixel = 2 bytes/pixel) at 30fps (frames per
second), that would be 124MB/sec (1920*1080*2*30), or 7.5GB/min, or
almost 0.5TB/hr! We wouldn't get too many movies if we needed a 1TB
hard disk for every two-hour-long movie that we wanted to watch.
Fortunately, video is highly compressible. Video compression is done
using coders and decoders, or CODECs (COder/DECoder) for short.
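The arithmetic above is easy to reproduce. The short sketch below recomputes the raw data rate, assuming decimal units (1MB = 1,000,000 bytes); the function name is only illustrative.

```python
def raw_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate: every pixel of every frame is stored."""
    return width * height * bytes_per_pixel * fps


rate = raw_bytes_per_second(1920, 1080, 2, 30)
print(f"{rate / 1e6:.1f} MB/sec")        # 124.4 MB/sec
print(f"{rate * 60 / 1e9:.2f} GB/min")   # 7.46 GB/min
print(f"{rate * 3600 / 1e12:.2f} TB/hr") # 0.45 TB/hr
```

At that rate, a two-hour movie needs roughly 0.9TB before any compression is applied.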
There are many different types of video compression codecs. Some of
the simplest take standard image compression (like JPEG) and apply it
to each frame of video. This by itself can drastically cut down on
the amount of storage space that is needed. When we did the section
on photos, we saw that with decent JPEG compression, we could compress
a 30MB image down to about 3MB. If we applied that same 10:1 ratio to
our raw video (1080p, YUV 4:2:2, 30fps), we would get about 12MB/sec,
750MB/min, or 45GB/hr. For comparison, 12MB/sec is well within the
speed that a typical hard disk can sustain, so that would at least be
possible to store on a normal computer. If you're working with video
that is smaller (480p, YUV 4:2:0, 30fps), this would be down around
1.8MB/sec, which is very doable. In fact, because this is relatively
easy to do, many digital cameras that also record video use this
method to compress their video.
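The same calculation can be repeated with a compression ratio applied. This is only a sketch: the 10:1 ratio comes from the photo example above, YUV 4:2:0 is taken as 12 bits (1.5 bytes) per pixel, and the 854x480 frame size for widescreen 480p is an assumption, since the text does not give an exact width.

```python
def compressed_mb_per_sec(width, height, bytes_per_pixel, fps, ratio):
    """Raw data rate divided by a fixed per-frame compression ratio."""
    raw = width * height * bytes_per_pixel * fps
    return raw / ratio / 1e6


hd = compressed_mb_per_sec(1920, 1080, 2.0, 30, 10)  # 1080p, YUV 4:2:2
sd = compressed_mb_per_sec(854, 480, 1.5, 30, 10)    # 480p, YUV 4:2:0 (assumed width)

print(f"1080p: {hd:.1f} MB/sec, {hd * 3600 / 1000:.0f} GB/hr")
print(f"480p:  {sd:.1f} MB/sec")
```

A narrower 480p frame (for example 720 or 640 pixels wide) would give a somewhat lower figure than the 854-pixel assumption used here.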
However, video has another property that allows for vastly higher
compression ratios. The trick is that almost every frame of video is
very similar to the one before it and the one after it. The
exceptions are when the camera is panning very quickly, or when there
is a cut in the video and a different scene is shown. To exploit
this, all modern codecs use interframe compression. For instance, a
frame will tell the decoder that "this group of pixels is the same as
in the previous frame", or "this group is the same, except it is a
little bit darker", or "this group is the same as the group that was
10 pixels above it in the previous frame". Sometimes there are enough
changes to the frame (or a cut in the video) that it isn't efficient
to describe the new frame in terms of the old one. In that case, the
codec forgets about the previous frame altogether and stores a whole
new frame (probably with something similar to JPEG compression) that
has values for all the pixels. The following frames can then start
referencing this new one. This brand-new frame is called a
"keyframe", because it is essential to making the stream of frames
work smoothly.
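The idea of keyframes and "only store what changed" can be shown with a toy example. The sketch below is a deliberately simplified, lossless illustration and not how any real codec works: frames are plain lists of pixel values, there is no motion search or JPEG-style compression, and the function names and the 50% threshold are assumptions made for the example.

```python
def encode(frames, max_changed_fraction=0.5):
    """Encode frames as keyframes ("K") or deltas ("D") against the previous frame."""
    encoded = []
    previous = None
    for frame in frames:
        if previous is None:
            encoded.append(("K", list(frame)))        # first frame: keyframe
        else:
            changes = [(i, v) for i, (p, v) in enumerate(zip(previous, frame)) if p != v]
            if len(changes) > max_changed_fraction * len(frame):
                encoded.append(("K", list(frame)))    # too different: start fresh
            else:
                encoded.append(("D", changes))        # store only what changed
        previous = frame
    return encoded


def decode(encoded):
    """Rebuild every frame by applying deltas on top of the last decoded frame."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "K":
            current = list(payload)
        else:
            current = list(current)
            for i, v in payload:
                current[i] = v
        frames.append(current)
    return frames


frames = [
    [10, 10, 10, 10, 10, 10],   # first frame
    [10, 10, 12, 10, 10, 10],   # one pixel changed
    [10, 10, 12, 10, 10, 11],   # one pixel changed
    [90, 80, 70, 60, 50, 40],   # scene cut: everything changed
]
encoded = encode(frames)
print([kind for kind, _ in encoded])   # ['K', 'D', 'D', 'K']
print(decode(encoded) == frames)       # True
```

The two middle frames are stored as a single (position, value) pair each, while the scene cut forces a new keyframe, which mirrors the behavior described above.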
There are a lot of complicated methods used for doing interframe
compression and using keyframes. We won't go into any of the
specifics here, but it is the details of how a codec does this that
separate each of the modern codecs from its peers, and these details
are very complicated. Just to give one example, the list of patents
that are used in the (fairly old) MPEG-2 standard for interframe
compression is several pages long.
http://en.wikipedia.org/wiki/Video_compression
http://en.wikipedia.org/wiki/Video_compression_picture_types
http://en.wikipedia.org/wiki/Video_codec
http://es.wikipedia.org/wiki/Códec_de_vídeo
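Raw - Not compressed with a codec. There are still various types of
raw formatting, such as different RGB and YUV pixel layouts. DV
video, which often comes from digital camcorders, is frequently
treated like raw footage, although strictly speaking it is lightly
compressed one frame at a time.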
MJPEG - Uses the JPEG format for photographs to compress every frame
of video. This results in a very low compression ratio (large
files), but is relatively easy to do. Many digital cameras that also
capture video use this to encode their video. http://en.wikipedia.org/wiki/Motion_JPEG
http://es.wikipedia.org/wiki/MJPEG
MPEG 2 (aka H.262) - This is the successor to the Moving Picture
Experts Group's (MPEG) first video standard, MPEG 1. When DVDs were
first created, this was one of the most advanced codecs, and it was
selected as the only codec universally supported for DVDs. Since
then newer and better codecs have been created, but the DVD standard
cannot be changed, so DVDs still use this. http://en.wikipedia.org/wiki/MPEG-2
http://es.wikipedia.org/wiki/MPEG-2
VP3 (aka Theora) - This is an older codec, but it is significant in
that it was the first video codec to be released under an open source
license; the open source version was named Theora. This codec is
free for anyone to implement in software or hardware. However, as it
is older, its performance doesn't match some of the modern
equivalents. Theora's performance is substantially better than MPEG
2, but a touch worse than most MPEG 4 Part 2 implementations. http://en.wikipedia.org/wiki/Theora
http://es.wikipedia.org/wiki/Theora
H.263 (aka Flash4) - This was the most widely used codec for Flash
video that can be watched in browsers on websites. It is currently
being superseded by MPEG 4 AVC/H.264 in Flash. Also, HTML 5 is
starting to replace Flash video altogether, and it uses H.264 or
WebM/VP8.
MPEG 4 Part 2 (aka DivX, XVID, MS MPEG4v3) - This was the first major
use of the MPEG 4 standard. One of the original implementations was
done by Microsoft and is called Microsoft MPEG 4 Version 3; however,
it wasn't actually compatible with the baseline MPEG 4 Part 2. That
didn't stop a project called DivX from copying this codec and giving
it away freely. From there a group created an open source version of
the same thing called Xvid. Thus the popular encoders MS MPEG 4v3,
DivX, and Xvid are all MPEG 4 Part 2 (but not necessarily fully
compliant with the standard). Between DivX and Xvid, many of the
videos available for download on the internet were at one time (and a
good number still are) encoded in MPEG 4 Part 2. The reason for its
popularity is that it was a large step up (approximately 40% smaller
files) from MPEG 2. http://en.wikipedia.org/wiki/MPEG-4_Part_2
http://es.wikipedia.org/wiki/MPEG-4_Parte_2
MPEG 4 Part 10 (aka MPEG 4 - AVC, H.264) - This codec is currently
widely considered the highest-performance codec available. It will
generate file sizes that are approximately 50% smaller than MPEG 4
Part 2 codecs. This codec is now widely available, as it is
distributed with iTunes on Mac and comes built in with Windows 7. It
is also available for playing video on the web with the current
version of the Flash plugin. It is also supported with HTML 5 video
in Internet Explorer (version 9) and Safari, and there are plugins to
get it working in Firefox and Chrome. In addition to its consumer
use, there is a lossless option to H.264, which is useful for storing
video during editing or for archival purposes. http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC
http://es.wikipedia.org/wiki/H.264/MPEG-4_AVC
VP8 - VP8 was released by Google (which bought the company that
created it) under an open source license, for use as a royalty-free
codec. It is available by itself for use inside many different
formats/containers, but is also the only video codec supported by the
WebM video format. This codec is almost as powerful as H.264, and as
its software gets improved it is getting closer to matching it. This
codec is available natively for use in HTML 5 video in the Firefox,
Chrome, and Opera web browsers, and can be run in Internet Explorer
with a plugin. It will also be available in the next major release
of the Flash plugin. http://en.wikipedia.org/wiki/VP8
http://es.wikipedia.org/wiki/VP8
VC-1 (aka WMV3) - This is Microsoft's latest codec in the Windows
Media series. Its performance falls somewhere between MPEG 4 Part 2
and MPEG 4 Part 10. It is generally only available on Windows
platforms. http://en.wikipedia.org/wiki/VC-1
http://es.wikipedia.org/wiki/VC-1
Dirac (aka VC-2) - Dirac, whose most popular implementation is
Schroedinger, is a codec developed with a completely different, very
high performance compression technology. Unfortunately this
technology is not completely finished, and it takes a huge amount of
computing power to encode and decode video in it. Because of these
limitations, it is only used in studios with powerful computers, as a
format for storing video during editing and for archival purposes. http://en.wikipedia.org/wiki/Dirac_(video_compression_format)
http://es.wikipedia.org/wiki/Dirac_(códec)
FFV1 - This is a codec that only compresses losslessly, for use during
editing and production. It was created as part of the ffmpeg/libav
project. It is generally faster and has better compression than all
the lossless codecs except x264's lossless encoders.
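The approximate savings quoted in the entries above (MPEG 4 Part 2 about 40% smaller than MPEG 2, and H.264 about 50% smaller than MPEG 4 Part 2) compound when chained. The sketch below works that through for a hypothetical 4.0GB MPEG 2 file; the starting size and the function name are assumptions for illustration, and real results depend heavily on the content and encoder settings.

```python
def chained_sizes(start_gb, savings):
    """Apply each fractional saving to the result of the previous step."""
    sizes = [start_gb]
    for saving in savings:
        sizes.append(sizes[-1] * (1 - saving))
    return sizes


sizes = chained_sizes(4.0, [0.40, 0.50])
for name, gb in zip(["MPEG 2", "MPEG 4 Part 2", "H.264"], sizes):
    print(f"{name}: {gb:.1f} GB")
```

Under those figures, the H.264 version ends up at roughly 30% of the size of the MPEG 2 original.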
Two Pass Encoding
The amount of compression that can be applied to video depends very
much on what is going on in the video. If there is a lot of motion (a
lot of pixels change between frames), like a fight scene in a movie,
then it will be tough for the encoder to store all that data in a
small space. If there isn't a lot of motion, for instance when the
people in the video are talking and only their lips are moving, it is
easy for modern video encoders to update just the few pixels that are
changing and leave everything else alone. All modern video encoders
allow for some amount of Variable Bit-Rate encoding, where, as was
discussed in the audio section, certain parts of a clip can use more
bits and other parts fewer, making it easier for the encoder to create
something that looks good while keeping the total file size the same.
The catch with video, however, is that the encoder can't load large
portions of the video into the computer's memory (as it can with
audio) to figure out where the hard and easy portions are. To get
around this, two pass encoding has been developed.
Two pass encoding runs the encoder over the video clip twice. On the
first pass, no usable file is created, but the encoder records (into a
separate log file) which spots in the video could use more bits and
which could use fewer. Armed with this information, the encoder is
run for a second pass at the clip. This time it knows ahead of time
where it can skimp on the bits and where it needs to use extra.
Obviously, this won't work on live video, as the entire clip (or at
least a large portion of it) isn't available to look through before
the encoder needs to start sending out the final, encoded version.
However, for video files that are created to play back or distribute
later, this is a much more efficient way of working.
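The two passes can be sketched in a few lines. This is a toy model under stated assumptions, not how a real encoder measures or spends bits: "complexity" is simply the number of pixel values that differ from the previous frame, the log is an in-memory list rather than a log file, and the function names are hypothetical.

```python
def first_pass(frames):
    """Pass 1: record a complexity score per frame; no output video is produced."""
    log = []
    previous = None
    for frame in frames:
        if previous is None:
            score = len(frame)   # first frame: every pixel is new
        else:
            score = sum(1 for p, v in zip(previous, frame) if p != v)
        log.append(max(score, 1))  # never give a frame zero weight
        previous = frame
    return log


def second_pass(log, total_bits):
    """Pass 2: split a fixed bit budget in proportion to the logged complexity."""
    total = sum(log)
    return [total_bits * score // total for score in log]


frames = [
    [0, 0, 0, 0],   # opening frame
    [0, 0, 0, 1],   # small change
    [0, 0, 0, 1],   # no change
    [5, 6, 7, 8],   # scene cut
]
log = first_pass(frames)
budget = second_pass(log, total_bits=1000)
print(log)      # [4, 1, 1, 4]
print(budget)   # [400, 100, 100, 400]
```

The total stays within the fixed budget, but the busy frames receive four times as many bits as the quiet ones, which is the effect the second pass is after.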
Audio in Videos
The audio codecs used are many of the same ones
that were covered in the last unit on audio editing, so here we are
only going to list some of their most common uses in video. For
more on these codecs, see the audio section.
AC3 - Most common in DVD/Blu-ray
AAC - Apple online videos and Blu-ray
MP3 - Often mixed with XVID and AVI for amateur video
Vorbis - Often used with Theora or VP8/WebM video
FLAC - The most standard lossless format used in video archiving
WMA - Used in Microsoft video files