Codec and Container Overview


Video generally cannot be distributed as a raw stream of codec-compressed bytes.  It needs to be encapsulated in a file or stream that allows it to be fast-forwarded, rewound, and synchronized with the audio and/or subtitle track(s).  To handle all this organization, several types of "containers" have been developed.  Because the container is the outermost layer of the file that the user interacts with, containers are sometimes called "formats".  This is generally not accurate, however: to know the full format you also need to know the audio, video, and subtitle codecs inside the file, and most containers support more than one of each.

MPEG-PS - Moving Picture Experts Group - Program Stream, is used for storing video data on DVDs.  It is compatible with MPEG 1, 2, and 4 video, several types of MPEG audio, and subtitles.
MPEG-TS - Moving Picture Experts Group - Transport Stream, is used for transmitting video data, especially in cases where some of it may be lost in transit.  It is also the container used on Blu-ray discs.  A single Transport Stream can carry several programs at once.
Microsoft Audio Video Interleave (AVI) - Despite some technical limitations, the AVI format has been very common for videos available for download from the internet, especially those encoded with the DivX or Xvid codecs.  Its popularity is starting to fade as use of MKV and MP4 picks up, but it is still common to find videos in this container.
QuickTime (MOV) - This is the container used with Apple's QuickTime movies.  It has been superseded in Apple's iTunes and QuickTime by the MP4 container.
Windows Media Video (WMV aka ASF) - This is Microsoft's main container format, capable of supporting all the Microsoft codecs as well as some others.  Generally the .wmv extension implies that one of the WMV video codecs is being used with the WMA audio codec.  If other codecs are being used, Microsoft's guidance is to use the .asf extension.
Flash Video (FLV) - Flash Video is common on websites that show video clips (like YouTube).  There are several video codecs that are compatible with this, with the most common being H.263.  The most recent versions of the Flash player are also able to work with MP4 or WebM, so this container is no longer required to work with the Flash player.
Ogg - This was the first open source container to see widespread use.  It is primarily used with the Theora video codec and Vorbis audio.  It has generally been superseded by the open source MKV container.  However, Ogg is better suited to streaming audio and video than MKV, so it is still frequently used for that.
MPEG-4 (MP4 aka M4V) - This is the standard file format that the MPEG group came up with for MPEG 4 video, although it is not limited to MPEG 4 video.  This container is extremely similar to Apple's QuickTime MOV container, and is used by iTunes for videos.  The most typical contents for this container are MPEG4-AVC/H.264 video and AAC audio.  It is supported by the Flash player in web browsers, and by the Internet Explorer and Safari web browsers for HTML 5 video, where it uses progressive downloading so the video can start playing before it is completely downloaded.
Matroska (MKV) - This is an open source, multi-purpose container format.  It can store almost any type of video, audio, and subtitles.  It can also store multiple tracks of any of those, enabling a movie to have several audio languages built into it, or something like a director's commentary track.
WebM - This is simply a trimmed down version of MKV that only supports the VP8 video codec and the Vorbis audio codec.  This was created by Google for royalty-free use on the internet in web video.  It is supported in the HTML 5 video standard by Firefox and Chrome, with support in Internet Explorer available as a plugin.  It is also supported in web browsers by the Flash player.

Video Compression and Codecs

Video is HUGE.  If we have a 1080p (1920x1080) stream of video using YUV 4:2:2 (16 bits/pixel = 2 bytes/pixel) at 30fps (frames per second), that would be 124MB/sec (1920*1080*2*30), or 7.5GB/min, or almost 0.5TB/hr!  We wouldn't get too many movies if we needed a 1TB hard disk for every two-hour movie that we wanted to watch.  Fortunately, video is highly compressible.  Video compression is done using coders and decoders, or CODECs (COder/DECoder) for short.
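
The arithmetic above can be checked with a few lines of Python.  This is only a sketch; the function name raw_rate_bytes_per_sec is made up for this example, and "MB", "GB" and "TB" are taken as decimal units (powers of 1000), which is an assumption about how the figures above were rounded.

```python
def raw_rate_bytes_per_sec(width, height, bits_per_pixel, fps):
    """Bytes per second of completely uncompressed video."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps

# 1080p, YUV 4:2:2 (16 bits/pixel), 30 frames per second
rate = raw_rate_bytes_per_sec(1920, 1080, 16, 30)

print(f"{rate / 1e6:.0f} MB/sec")          # 124 MB/sec
print(f"{rate * 60 / 1e9:.1f} GB/min")     # 7.5 GB/min
print(f"{rate * 3600 / 1e12:.2f} TB/hr")   # 0.45 TB/hr
```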

There are many different types of video compression codecs.  Some of the simplest take standard image compression (like JPEG) and apply it to each frame of video.  This by itself can drastically cut down on the amount of storage space that is needed.  When we did the section on photos, we saw that with decent JPEG compression, we could compress a 30MB image down to about 3MB.  If we applied that same 10:1 ratio to our raw video (1080p, YUV 4:2:2, 30fps), we would get about 12MB/sec, 750MB/min, or about 45GB/hr.  Just for comparison, 12MB/sec is approximately the speed that a typical hard disk can run at, so that would at least be possible to store on a normal computer.  If you're working with video that is smaller (480p, YUV 4:2:0, 30fps), this would be down around 1.8MB/sec, which is very doable.  In fact, because this is relatively easy to do, many digital cameras that also record video use this method to compress their video.
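
As a rough check of these per-frame (JPEG-style) estimates, the sketch below applies the 10:1 ratio from the photo example to both video sizes.  The function name is made up for this example, and the 480p frame is assumed to be 854x480 (widescreen) at 12 bits/pixel for YUV 4:2:0; a different 480p width would change the result somewhat.

```python
JPEG_RATIO = 10  # the 30MB -> 3MB photo example

def intraframe_rate_mb_per_sec(width, height, bits_per_pixel, fps,
                               ratio=JPEG_RATIO):
    """Estimated MB/sec when every frame is compressed on its own."""
    raw_bytes_per_sec = width * height * bits_per_pixel / 8 * fps
    return raw_bytes_per_sec / ratio / 1e6

hd = intraframe_rate_mb_per_sec(1920, 1080, 16, 30)  # 1080p, YUV 4:2:2
sd = intraframe_rate_mb_per_sec(854, 480, 12, 30)    # assumed 854x480, YUV 4:2:0

print(f"1080p: {hd:.1f} MB/sec, {hd * 3600 / 1000:.0f} GB/hr")  # 12.4 MB/sec, 45 GB/hr
print(f"480p:  {sd:.1f} MB/sec")                                # 1.8 MB/sec
```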

However, video has another advantage that allows for vastly higher compression ratios.  The trick is that almost every frame of video is very similar to the one before it and the one after it.  The exceptions are when the camera is panning very quickly, or when there is a cut in the video and a different scene is shown.  To exploit this, all modern codecs use interframe compression.  For instance, a frame will tell the decoder that "this group of pixels is the same as in the previous frame", or "this group is the same, except it is a little bit darker", or "this group is the same as the group that was 10 pixels above it in the previous frame".  Sometimes there are enough changes to the frame (or a cut in the video) that it isn't efficient to try to describe the new frame in terms of the old one.  In that case, the codec forgets about the previous frame altogether, and stores a whole new frame (probably with something similar to JPEG compression) that has values for all the pixels.  The next frame can then start referencing this one.  This brand-new frame is called a "Keyframe", because it is essential to making the stream of frames work smoothly.
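
The idea can be illustrated with a toy encoder.  This is a deliberately simplified sketch and not how any real codec works: frames are flat lists of pixel values of equal length, a "delta" frame stores only the pixels that changed, and a keyframe is emitted whenever more than half of the pixels differ.  The names encode, decode and keyframe_threshold are invented for this example, and real codecs work on blocks with motion compensation and lossy compression.

```python
def encode(frames, keyframe_threshold=0.5):
    """Return a list of ("key", pixels) or ("delta", {index: value}) records."""
    encoded = []
    previous = None
    for frame in frames:
        if previous is None:
            # The first frame has nothing to reference, so it is a keyframe.
            encoded.append(("key", list(frame)))
        else:
            changes = {i: new
                       for i, (new, old) in enumerate(zip(frame, previous))
                       if new != old}
            if len(changes) > keyframe_threshold * len(frame):
                # Too much changed (e.g. a scene cut): store a whole new frame.
                encoded.append(("key", list(frame)))
            else:
                encoded.append(("delta", changes))
        previous = frame
    return encoded

def decode(encoded):
    """Rebuild the full frames from keyframes and deltas."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for index, value in payload.items():
                current[index] = value
        frames.append(current)
    return frames

frames = [
    [10, 10, 10, 10],
    [10, 10, 12, 10],   # one pixel changed
    [90, 80, 70, 60],   # scene cut: everything changed
    [90, 80, 70, 61],   # one pixel changed
]
encoded = encode(frames)
print([kind for kind, _ in encoded])   # ['key', 'delta', 'key', 'delta']
assert decode(encoded) == frames
```

Seeking works the same way in this toy model as in real video: to show frame 4, a player has to start from the keyframe at frame 3 and apply the deltas after it.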

There are a lot of complicated methods used for doing interframe compression and using keyframes.  We won't go into any of the specifics here, but it is the details of how a codec does this that separates each of the modern codecs from its peers, and these details are very complicated.  Just to give one example, the list of patents that are used in the (fairly old) MPEG-2 standard for interframe compression is several pages long.

Raw - Not compressed with a codec.  There are still various types of raw formatting, the most common probably being DV Video, which often comes from digital camcorders.
MJPEG - Uses the JPEG format for photographs to compress every frame of video.  This results in a very low compression ratio (large files), but is relatively easy to do.  Many digital cameras that also capture video use this to encode their video.
MPEG 2 (aka H.262) - This is the successor to the Moving Picture Experts Group's (MPEG) first video standard, MPEG 1.  When DVDs were first created, this was one of the most advanced codecs, and it was selected as the only codec universally supported for DVDs.  Since then newer and better codecs have been created, but the DVD standard cannot be changed, so DVDs still use this.
VP3 (aka Theora) - This is an older codec, but it is significant in that it was the first video codec to be released under an open source license, under the name Theora.  This codec is free for anyone to implement in software or hardware.  However, as it is older, its performance doesn't match some of its modern equivalents.  Theora's performance is substantially better than MPEG 2, but a touch worse than most MPEG 4 Part 2 implementations.
H.263 (aka Flash4) - This was the most widely used codec for Flash video watched in browsers on websites.  It is currently being superseded by MPEG 4 AVC/H.264 in Flash.  Also, HTML 5 is starting to replace Flash video altogether, and it uses H.264 or WebM/VP8.
MPEG 4 Part 2 (aka DivX, XVID, MS MPEG4v3) - This was the first major use of the MPEG 4 standard.  One of the original implementations was done by Microsoft and is called Microsoft MPEG 4 Version 3; however, it wasn't actually compatible with the baseline MPEG 4 Part 2.  That didn't stop a project called DivX from copying this codec and giving it away freely.  From there a group created an open source version of the same thing called Xvid.  Thus the popular encoders MS MPEG 4v3, DivX and Xvid are all MPEG 4 Part 2 (but not necessarily compliant with all of the standard).  Between DivX and Xvid, many of the videos that were available for download on the internet were at one time (and a good number still are) encoded in MPEG 4 Part 2.  The reason for its popularity is that it was a large step up (approximately 40% smaller files) from MPEG 2.
MPEG 4 Part 10 (aka MPEG 4 - AVC, H.264) - This codec is currently widely considered the highest-performance codec available.  It will generate file sizes that are approximately 50% smaller than MPEG 4 Part 2 codecs.  This codec is now widely available as it is distributed with iTunes on Mac and comes built in with Windows 7.  It is also available for playing video on the web with the current version of the Flash plugin.  It is also being supported with HTML 5 video in Internet Explorer (version 9) and Safari.  There are also plugins to get it working in Firefox and Chrome.  In addition to its consumer use, there is a lossless option to H.264, which is useful for storing video during editing or for archival purposes.
VP8 - VP8 was released by Google (which bought the company that created it) under an open source license, for use as a royalty-free codec.  It is available by itself for use inside many different formats/containers, but is also the only video codec supported by the WebM video format.  This codec is almost as powerful as H.264, and as its software improves it is getting closer to matching it.  It is available natively for use in HTML 5 video in the Firefox, Chrome, and Opera web browsers, and can be run in Internet Explorer with a plugin.  It will also be available in the next major release of the Flash plugin.
VC-1 (aka WMV3) - This is Microsoft's latest codec in the Windows Media series.  Its performance falls somewhere between MPEG 4 Part 2 and MPEG 4 Part 10.  It is generally only available on Windows platforms.
Dirac (aka VC-2) - Dirac, whose most popular implementation is Schroedinger, is a codec developed with a completely different, very high performance compression technology.  Unfortunately this technology is not completely finished, and it takes a huge amount of computing power to encode and decode video with it.  Because of these limitations, it is only used for storing video during editing and for archival purposes, in studios with powerful computers.
FFV1 - This is a codec that only compresses losslessly, for use during editing and production.  It was created as part of the ffmpeg/libav project.  It is generally faster and compresses better than other lossless codecs, with the exception of x264's lossless mode.
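
To see how the approximate savings quoted in the list above compound, the short sketch below takes a hypothetical MPEG 2 file of 100 size units and applies the "about 40% smaller" and "about 50% smaller" figures in turn.  The starting size and variable names are made up for illustration, and the percentages are only the rough estimates given above.

```python
mpeg2_size = 100.0                            # hypothetical baseline, arbitrary units
mpeg4_part2_size = mpeg2_size * (1 - 0.40)    # about 40% smaller than MPEG 2
h264_size = mpeg4_part2_size * (1 - 0.50)     # about 50% smaller than MPEG 4 Part 2

print(f"MPEG 2:        {mpeg2_size:.0f}")        # 100
print(f"MPEG 4 Part 2: {mpeg4_part2_size:.0f}")  # 60
print(f"H.264:         {h264_size:.0f}")         # 30
```

By these rough numbers, an H.264 file would be around 30% of the size of the equivalent MPEG 2 file.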

Two Pass Encoding

The amount of compression that can be applied to video depends very much on what is going on in the video.  If there is a lot of motion (a lot of pixels change between frames), like a fight scene in a movie, then it will be tough for the encoder to store all that data in a small space.  If there isn't a lot of motion, for instance when the people in the video are talking and only their lips are moving, it is easy for modern video encoders to just update the few pixels that are changing and leave everything else alone.  All modern video encoders allow for some amount of Variable Bit-Rate encoding, where, as was discussed in the audio section, certain parts of a clip can use more bits and other parts fewer, making it easier for the encoder to create something that looks good while keeping the total file size the same.  The trick with video, however, is that the encoder can't load large portions of the video into the computer's memory (like it can with audio) to figure out where the hard and easy portions are.  To get around this, two pass encoding has been developed.

Two pass encoding runs the encoder over the video clip twice.  On the first pass, no usable file is created, but the encoder records (in a separate log file) which spots in the video could use more bits and which could use fewer.  Armed with this information, the encoder makes a second pass over the clip.  This time it knows ahead of time where it can skimp on bits and where it needs to use extra.  Obviously, this won't work on live video, as the entire clip (or at least a large portion of it) isn't available to look through before the encoder needs to start sending out the final, encoded version.  However, for video files that are created to play back or distribute later, this is a much more efficient way of working.
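
The two passes can be sketched as follows.  This is only an illustration of the idea, not the rate control of any real encoder: the per-scene "complexity" scores are hypothetical numbers, the log is a plain list instead of a file, and the second pass simply divides a fixed bit budget in proportion to those scores.

```python
def first_pass(scene_complexities):
    """Pass 1: produce no video, only record how hard each scene is."""
    return list(scene_complexities)   # stands in for the separate log file

def second_pass(log, total_bits):
    """Pass 2: split the fixed bit budget in proportion to logged complexity."""
    total_complexity = sum(log)
    return [total_bits * c / total_complexity for c in log]

# Hypothetical scores: two talking scenes, a fight scene, a slow pan.
scene_complexities = [1, 1, 6, 2]

log = first_pass(scene_complexities)
budget = second_pass(log, total_bits=1000)

print(budget)        # [100.0, 100.0, 600.0, 200.0]
print(sum(budget))   # 1000.0, the same total size as a constant bit rate
```

A constant bit rate would have given every scene 250 bits; here the fight scene gets 600 while the total file size stays the same.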

Audio in Videos

The audio codecs used are many of the same ones that were covered in the last unit on audio editing, so here we are only going to list some of their most common uses in video.  For more on these codecs, see the audio section.

AC3 - Most common on DVD/Blu-ray
AAC - Apple online videos and Blu-ray
MP3 - Often combined with Xvid video in AVI files for amateur video
Vorbis - Often used with Theora or VP8/WebM video
FLAC - The most standard lossless format used in video archiving
WMA - Used in Microsoft video files