Hi there,
Thanks for checking out an 'Explained' blog! In these posts, my aim is to give you some details on what some common technical terms in digital video and film mean. I hope you find something useful here.
Please note: I do not consider myself an expert on these topics. If you believe I've made a mistake somewhere, please do let me know! I'd love for this to be a learning experience for me as much as anyone else.
In this post we’re going to look at intra-frame and inter-frame compression, and the GOP structure that the latter relies on. Let’s start with the simpler of the two, intra-frame.
Intra-Frame Compression
As we know, digital video is made up of individual frames: 25 per second for PAL, 30 (strictly 29.97) for NTSC and 24 for film (excluding HFR, High Frame Rate). We also have interlaced footage, which comprises two ‘fields’ per frame, giving 50i for PAL and 60i for NTSC, but we’re not covering that for now.
With intra-frame compression, each and every frame is a full, standalone image, completely independent of any other frame. In other words, this is probably what your average individual would expect a frame to be. However, this is rarely the case. Intra-frame compression is only really found in production; for most consumer video content we want to find as many ways of compressing data as possible without noticeably reducing quality, and one way to do this is by using inter-frame compression.
Inter-Frame Compression
Where intra-frame is made up purely of full frames, inter-frame uses multiple frame types in order to reduce file size and bandwidth requirements. It does this using a technique called ‘motion compensation’.
Typically there are three types of frames used in inter-frame compression:
- Intra frames
- P Frames
- B Frames
Intra frames
These are the same full frames used exclusively in intra-frame compression, hence the name. They are usually shortened to ‘I frames’.
P Frames
P frame stands for ‘predicted frame’. These frames look at previous frames to determine which pixels of the frame need refreshing and which can be recycled from the previous frame. P frames are typically around 50% smaller in file size than intra frames.
B Frames
B frame stands for ‘bidirectionally predicted frame’. These look at both previous and subsequent frames to calculate which pixels need to be refreshed. They are typically around 80% smaller in file size than intra frames.
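To get a rough feel for what those percentages add up to, here is a minimal Python sketch that estimates how large a run of frames is compared with an all-intra encode. It assumes the ballpark figures above (a P frame at 50% and a B frame at 20% of an intra frame). The `RELATIVE_SIZE` table, the `relative_size` function and the 12-frame sequence are my own illustrative choices, and real-world savings depend heavily on the codec and the content.

```python
# Rough frame sizes relative to a full intra frame, using the ballpark
# figures quoted in this post (assumptions, not measured values).
RELATIVE_SIZE = {"I": 1.0, "P": 0.5, "B": 0.2}


def relative_size(pattern: str) -> float:
    """Return the size of a frame sequence relative to an all-intra encode."""
    total = sum(RELATIVE_SIZE[frame] for frame in pattern)
    return total / len(pattern)


# An illustrative 12-frame sequence: 1 I frame, 3 P frames, 8 B frames.
ratio = relative_size("IBBPBBPBBPBB")
print(f"About {ratio:.0%} of the all-intra size")
```

Under these assumptions the sequence comes out at roughly a third of the size of the same twelve frames stored as full intra frames.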
![Picture](/uploads/2/4/5/7/24574909/4211429.jpg?388)
So how does motion compensation work? This technique involves having sequences of I frames, P frames and B frames (known as a GOP structure, see below) that allow aspects of a frame to be recycled, reducing the file size.
The technique is particularly effective on static shots that have non-moving areas in the frame. A good example of this is an interview. On the right is a still from an interview I shot a while back. It’s a static angle with the talent positioned on the right of frame, taking up no more than 40% of the screen. This means the entire left half of the screen is essentially unchanging (minus some potential minor changes in light and the minimal grain, dependent on ISO level and camera). Considering this, along with the fact that our eyes will be drawn to the talent anyway, it makes no sense for our codec to refresh all those pixels every frame.
Typically there will be somewhere between 1 and 4 full intra frames per second, with all the frames in between having some level of motion compensation applied. Of course, after each ‘cycle’ (the span between two intra frames) the process resets as a new full frame is produced. You may notice this whilst watching highly compressed footage (perhaps a low-quality YouTube video, for example). The non-moving parts of the image (particularly black or dark areas) may appear to become ‘blocky’ and pixelated due to lots of predictions having taken place without a full frame reference. The blockiness then disappears as the next intra frame appears in the sequence.
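The ‘only refresh what changed’ idea can be sketched in a few lines of Python. Note that this is a deliberately simplified stand-in: real codecs also search for movement and store motion vectors plus the remaining differences, whereas this toy example just compares fixed blocks and resends the ones that differ. The function names, the tiny 4x4 ‘frames’ and the block size are all hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]           # rows of greyscale pixel values
Blocks = Dict[Tuple[int, int], Frame]  # (top, left) -> block pixels


def changed_blocks(prev: Frame, curr: Frame, block: int = 2) -> Blocks:
    """Return only the blocks of `curr` that differ from `prev`."""
    updates: Blocks = {}
    for top in range(0, len(curr), block):
        for left in range(0, len(curr[0]), block):
            old = [row[left:left + block] for row in prev[top:top + block]]
            new = [row[left:left + block] for row in curr[top:top + block]]
            if old != new:
                updates[(top, left)] = new
    return updates


def apply_blocks(prev: Frame, updates: Blocks) -> Frame:
    """Rebuild a frame by recycling `prev` and pasting in the changed blocks."""
    frame = [row[:] for row in prev]
    for (top, left), pixels in updates.items():
        for dy, row in enumerate(pixels):
            frame[top + dy][left:left + len(row)] = row
    return frame


previous = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
current = [row[:] for row in previous]
current[2][2] = 60  # only the bottom-right region (our 'talent') changes

updates = changed_blocks(previous, current)
rebuilt = apply_blocks(previous, updates)
print(f"{len(updates)} of 4 blocks needed refreshing")
```

Here three of the four blocks are recycled untouched from the previous frame, which is the same saving the static left half of the interview shot would give us.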
GOP Structure
The GOP (Group of Pictures) structure defines the pattern of I, P and B frames used in a given encode. Different patterns will of course result in different levels of quality and compression.
Above is a basic diagram of a potential GOP structure. This would be a particularly high-quality example of an inter-frame based encode. A more typical encode would be something like IBBPBBPBBPBBI, giving one intra frame every 12 frames. In writing, we would express this as M=3, N=12, where M represents the distance between two ‘anchor frames’ (either I or P) and N is the distance between I frames. This value for N is also known as the GOP size.
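As a quick illustration of how M and N map onto a frame pattern, here is a small Python sketch. The `gop_pattern` function is a hypothetical helper of my own, not part of any encoder, and it assumes the simple layout described above: an I frame first, an anchor frame every M frames, and B frames filling the gaps.

```python
def gop_pattern(m: int, n: int) -> str:
    """Build one GOP of n frames with an anchor (I or P) every m frames."""
    if m < 1 or n < 1:
        raise ValueError("m and n must both be at least 1")
    frames = []
    for index in range(n):
        if index == 0:
            frames.append("I")   # each GOP starts with a full intra frame
        elif index % m == 0:
            frames.append("P")   # anchor frames sit m frames apart
        else:
            frames.append("B")   # everything in between is a B frame
    return "".join(frames)


print(gop_pattern(3, 12))  # IBBPBBPBBPBB
```

The trailing I in IBBPBBPBBPBBI is simply the first frame of the next GOP, so it is not part of the 12 frames generated here. Setting M=1 gives a pattern with no B frames at all, for example `gop_pattern(1, 4)` returns `IPPP`.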