
Could someone explain GOP to me?


Shidan Saberi



I'll avoid the obvious political jokes... first read:

http://en.wikipedia.org/wiki/Group_of_pictures

http://www.iptvdictionary.com/iptv_dictionary_MPEG_GOP_definition.html

 

I know it's confusing and probably someone like Phil Rhodes here can explain it better than I could. My impression is that frames are grouped into short sequences to make compression more efficient.


If I'm on the right track, it stands for 'Group of Pictures'.

There are two types of codec - GOP and iFrame.

 

I-Frame basically contains separate information for each individual frame of video.

GOP - contains full information for the first frame, and each subsequent frame stores only the difference between itself and that initial frame.


Okay, I think I may be getting it. So let's say you have 24fps. Does each of those 24 frames have its own number of frames within it which hold a certain detail?

 

Or

 

From 24 frames, a certain number of them are combined to compress the data? Kind of like interlacing?

 

Thanks for the reads. I read them. So I may be getting it.

Edited by Shidan Saberi


Think of it like this: in an iFrame codec, each frame is a still image. In a GoP codec, the first frame is a full image, then the subsequent frames (for however many they wish to set it) just record the differences-- these frames are later reconstructed from this data.

So, imagine a white wall. With an iFrame codec and a static shot, you'd wind up with, let's say, 15 frames of white wall, each one a full photo. In a GoP codec, you'd have 1 full frame image and then (assuming nothing changes at all... hypothetical, of course) 14 frames of nothing! Later on, in the editing suite, a computer would "reconstruct" that missing data by adding the changes (in this case nothing) back to the original frame to make up the information.
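The white-wall example maps directly onto code. Here's a toy Python sketch (purely illustrative -- a real codec is far more sophisticated): store the first frame in full, then only each later frame's difference from it.

```python
# Toy GOP-style delta coding (illustrative only, not a real codec):
# keep the first frame whole, store only differences for the rest.

def encode_gop(frames):
    """Return (key_frame, per-frame difference lists)."""
    key = frames[0]
    deltas = [[p - k for p, k in zip(frame, key)] for frame in frames[1:]]
    return key, deltas

def decode_gop(key, deltas):
    """Rebuild every frame from the key frame plus its delta."""
    return [key] + [[k + d for k, d in zip(key, delta)] for delta in deltas]

# 15 identical "frames" of white wall (pixel value 255):
wall = [[255, 255, 255, 255]] * 15
key, deltas = encode_gop(wall)
# deltas is 14 lists of zeros -- "14 frames of nothing", as above,
# and decode_gop(key, deltas) reconstructs all 15 frames exactly.
```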

 

Hope that makes sense.



 

 

Ahhh, OK, it's starting to click. I understood your explanation. So that's why a low GOP is good: each frame has more of its own independent data, instead of a set of frames which just record the changes.

 

Is this just so each frame has less info -- really just recording the changes from the previous frame instead of being a whole new frame -- and that's why it's compressed?

 

Does it just make a difference in editing or can you see the difference as well?

Edited by Shidan Saberi


The "group of pictures" idea has its roots in compressed animation formats like those used in Amiga computers in the late 80s and early 90s.

 

Compression codecs like MPEG-1, MPEG-2, h.264, and others, use the similarity between frames to achieve better compression performance. Consider the example of a weather presenter keyed over a computer-generated graphic. The graphic may not change at all for many seconds, whereas the presenter will be moving around. A compression codec could choose to encode the background image once, and store only the areas which move.

 

However, real world images (except some computer generated ones) invariably contain noise, film grain, tiny camera motions or changes in lighting, so in a mathematical sense absolutely all of the image is changing constantly by a very small amount. Because of this, it is necessary to apply a threshold below which small changes in frame data are assumed to be noise, not real motion, and need not be encoded. However, this thresholding invariably means that some very subtle movements are incorrectly assumed to be noise, and ignored. This means that errors build up from frame to frame, and would after a while become unacceptable. Because of this, most compression schemes have a maximum number of frames over which they will assume no changes, after which the entire frame is updated and the process starts again. This number of frames is the "GOP length".
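As a concrete (and heavily simplified) illustration of that thresholding plus periodic refresh, here is a Python sketch; the GOP length and threshold values are arbitrary choices for the example, not values from any real codec.

```python
# Toy thresholded delta coding (illustrative only): differences below
# NOISE_THRESHOLD are assumed to be noise and dropped, and a full
# intra frame is inserted every GOP_LENGTH frames so that any
# accumulated errors are flushed.
import random

GOP_LENGTH = 15      # full refresh every 15 frames
NOISE_THRESHOLD = 3  # differences this small are assumed to be noise

def encode(frames):
    stream, reference = [], None
    for i, frame in enumerate(frames):
        if i % GOP_LENGTH == 0:
            reference = list(frame)
            stream.append(("I", list(frame)))   # full intra frame
        else:
            delta = []
            for j, pixel in enumerate(frame):
                d = pixel - reference[j]
                if abs(d) <= NOISE_THRESHOLD:
                    d = 0                       # ignored: assumed noise
                delta.append(d)
                reference[j] += d               # mirror the decoder's state
            stream.append(("D", delta))
    return stream

# A static grey wall with +/-1 of sensor noise: every delta encodes to
# all zeros, so 14 of every 15 frames cost almost nothing to store.
noisy_wall = [[100 + random.randint(-1, 1) for _ in range(8)]
              for _ in range(30)]
stream = encode(noisy_wall)
```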

 

So far, we have described a technique called delta compression, which encodes the differences between frames. Modern codecs may use this technique, but much more powerfully they also use motion compensation, a technique which tracks areas of the image as they move around the frame. For instance, a distant background does not change very much as the camera pans; it simply moves sideways, and a motion-compensated codec can detect this and store information about the motion, as opposed to storing entirely new picture data. Again, a degree of approximation must be applied, since even very simple situations such as a horizontal pan will, due to noise, never quite produce a perfectly-registered copy of a previous frame's image data at a different on-screen location. This approximation will again cause the build-up of errors over time, so every so often a complete frame is stored as a basis from which new motion data can be tracked.
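Motion search can be sketched in one dimension (hypothetical toy code; real encoders match 2-D blocks, often to sub-pixel precision): slide a block of the previous frame across the current one and keep the shift with the smallest error. The encoder then stores that shift as a motion vector instead of new pixel data.

```python
# Toy 1-D block matching (illustrative only): find how far a block of
# the previous frame has moved in the current frame.

def best_shift(prev, curr, block_start, block_size, search_range):
    """Return the shift at which prev's block best matches curr."""
    block = prev[block_start:block_start + block_size]
    best, best_err = 0, float("inf")
    for shift in range(-search_range, search_range + 1):
        start = block_start + shift
        if start < 0 or start + block_size > len(curr):
            continue
        candidate = curr[start:start + block_size]
        err = sum(abs(a - b) for a, b in zip(block, candidate))
        if err < best_err:
            best, best_err = shift, err
    return best

# A "pan": the whole scanline has moved 3 pixels to the right,
# so the search finds a perfect match at shift +3.
prev = [0] * 5 + [200, 210, 220, 230] + [0] * 8
curr = [0] * 8 + [200, 210, 220, 230] + [0] * 5
shift = best_shift(prev, curr, 5, 4, 6)
```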

 

These complete frames are referred to as intra frames (hence I-frame), or sometimes key frames, because decoding them does not require data from any other frame. Some codecs track motion data both forward from the previous I-frame and backward from the following I-frame, and sometimes one frame will use both techniques, and is referred to as bidirectional (a B-frame). A predicted frame (P-frame) is constructed from the previous I-frame. These are the rules as they apply to MPEG-2, which uses 15-frame GOPs on DVD and 6 when used on HDV cameras, for instance. One of the (very many) differences between MPEG-2 and h.264 is that h.264 may use image data from a variety of different places to create new images, whereas MPEG-2 can only use one previously decoded frame at a time as a reference. Using a shorter GOP makes the encoding job a lot easier as the encoder has a reduced number of frames to search for similar bits of image.
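As a rough sketch of that structure, the display-order frame types of a GOP can be generated from two spacing parameters. The N=15 value below is the DVD GOP length mentioned above; the M=3 P-frame spacing is an assumed (though common) value for illustration.

```python
# Sketch of a typical MPEG-2 GOP layout in display order:
# an I frame starts the GOP, a P frame every m-th frame after it,
# and B frames filling the gaps.

def gop_pattern(n, m):
    """n = GOP length (I-frame spacing), m = P-frame spacing."""
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")
        elif i % m == 0:
            frames.append("P")
        else:
            frames.append("B")
    return "".join(frames)

# gop_pattern(15, 3) -> "IBBPBBPBBPBBPBB"
```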

 

These techniques are often very visible in heavily compressed media; the "dancing blocks" of motion-compensated video are often easy to see on YouTube, for instance.



The delivery format also plays a part in the length of the GOP. For example a Digital TV transmission broadcast from an ordinary UHF or VHF ground-based transmitter (so-called "Terrestrial" Digital TV) will benefit from a shorter GOP-count, both to minimize the delay when switching channels, and also to speed up the image recovery if the signal gets momentarily blocked by electrical interference.

 

Compressed video stored on a disk or flash drive on the other hand, is not normally expected to be subject to such interruptions, so a longer GOP can be supported. In general, satellite-linked digital TV is also less subject to signal interruptions, which is why the European Digital TV standards have different versions: "DVB-T" for Terrestrial TV transmissions, and "DVB-S" for satellite distribution.

 

In video editing, if you can tolerate a slight restriction of the available editing points, and you only need simple cuts (ie no adjustment of contrast, colour balance etc) splicing at the start (and end) of GOPs means that the data streams can be stitched together "seamlessly" with no need for de-coding/re-encoding.

 

If you want exact-frame-accurate editing of a compressed data stream on the other hand, this can only be done by decoding the compressed data back to a series of uncompressed frames, doing the edit, then re-encoding the edited frames back to the encoded format, which inevitably leads to a loss of quality.

 

If you download a copy of the Avidemux Freeware Video editor you can have a look at all this for yourself: Avidemux Freeware Video Editor

If you load a compressed file into Avidemux, the right and left keyboard arrows will step forwards or backwards through the video one frame at a time, while the up and down arrows will step forwards or backwards between the GOP start points.

 

An off-air MPEG2 transmission will typically have a fixed GOP "jump" of 12 frames. 720p MPEG4 from domestic video camcorders usually has 30 frame jumps, and generally the more compressed the format, the bigger the GOP. SD MPEG4 feature film downloads typically have 250 frames (or more!) per GOP, which means that if there is a signal dropout the picture stream can be lost for as long as 10 seconds (with 25fps PAL anyway). Obviously this would be totally unacceptable for something broadcast over the air, but it's fine for something played from a locally stored computer file where interference is unlikely.
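The 10-second figure falls straight out of the GOP length and the frame rate:

```python
# Worst-case picture loss after a dropout: if the I frame is lost,
# the rest of the GOP cannot be decoded until the next I frame.

def worst_case_dropout_seconds(gop_frames, fps):
    return gop_frames / fps

# worst_case_dropout_seconds(250, 25) -> 10.0  (long-GOP download)
# worst_case_dropout_seconds(12, 25)  -> 0.48  (broadcast MPEG-2)
```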

 

You would imagine that the start of each GOP would represent the start of a scene change, but this is not usually the case. If you use Avidemux to first locate a GOP start point, and then step through it one frame at a time, you will see that complete scene changes can occur between the GOP start points. A notable exception is the Windows .WMV format.

 

Windows .WMV files typically have very long GOPs, often the GOP length being also the length of a scene. So in that case, pressing the up and down arrows will often jump between scene changes. This extreme GOP length allows very high levels of compression, although the quality suffers of course.

 

Edited for clarity

Edited by Keith Walters

Awesome! thanks for the explanation.

 

So GoP is just one method of compression. There are also others like MPEG-1, MPEG-2, h.264 etc.

Is iframe uncompressed and all full images?

What do dslrs like 7d and 5d use?

And what do higher end cameras like reds and arris use?

I'd love to know.

 

Thanks!



Most of this is available from Google.

 

Treating frames in groups is not so much a compression algorithm in itself as it is a characteristic of certain types of codec, particularly those which use delta compression or motion compensation. MPEG-1, MPEG-2 and h.264 (broadly, MPEG-4) have GOPs because they use motion compensation. It is possible to use some of these codecs (often MPEG-4 or MPEG-2) in I-frame-only mode, with an effective GOP length of one. This means the footage is easier to edit as each frame can be decoded without reference to others. It also makes the video easier, mathematically speaking, to compress and decompress, and removes the possibility of errors due to noise building up within a GOP. However, the perceived image quality for a given bitrate will not generally be as good in this case.

 

I-frames are compressed. Basic codecs may use techniques similar to JPEG, based on the discrete cosine transform, which are used by other non-GOP codecs (DV, HDCAM, etc) to compress all their frames individually. More advanced codecs, such as h.264, may use other techniques to compress their I-frames, which may also be available to compress fragments of image data stored for any B or P frame, which cannot be synthesized from previous or upcoming frames.
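The discrete cosine transform mentioned above is simple enough to write out. This pure-Python DCT-II on a single 8-sample row shows why flat image areas compress so well: for a constant row, all the energy lands in the first (DC) coefficient and the rest are essentially zero, which quantization can then discard cheaply.

```python
# Minimal 1-D DCT-II (the transform behind JPEG, DV and MPEG I-frames),
# written out directly for illustration -- real implementations use
# fast factorized forms.
import math

def dct_1d(samples):
    n = len(samples)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(samples))
            for k in range(n)]

# A flat row of pixels: only the DC coefficient is non-zero.
row = [128] * 8
coeffs = dct_1d(row)
# coeffs[0] == 1024.0 (8 * 128); coeffs[1:] are ~0
```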

 

More or less all DSLRs use h.264. Some Nikon and Panasonic cameras can use motion-JPEG, but it is not competitive in terms of quality.

 

Red's "redcode" codec is an implementation of JPEG-2000, which is a more recent development intended to replace JPEG as a format for still images. JPEG-2000 did not catch on as a direct replacement for JPEG, but it does achieve slightly better compression performance through the use of more advanced mathematics, the discrete wavelet transform (as opposed to the discrete cosine transform). Rather than storing three greyscale images representing the RGB or YUV components of an image, redcode stores four greyscale images representing the red, blue, and two sets of green image data from the camera's bayer sensor. Red developed their camera very quickly and this design would have allowed them to use commercial off-the-shelf JPEG-2000 encoder parts. Since each frame is, more or less, a JPEG-2000 image, there is no GOP.
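The four-plane split can be sketched as follows (the RGGB tile layout is an assumption for illustration; the key point is that each plane is then compressed as an ordinary greyscale image):

```python
# Split a Bayer mosaic into four greyscale planes, assuming a
# repeating 2x2 RGGB tile: red, two greens, blue.

def split_bayer_rggb(mosaic):
    """mosaic: 2-D list of sensor values with an RGGB pattern."""
    r  = [row[0::2] for row in mosaic[0::2]]
    g1 = [row[1::2] for row in mosaic[0::2]]
    g2 = [row[0::2] for row in mosaic[1::2]]
    b  = [row[1::2] for row in mosaic[1::2]]
    return r, g1, g2, b

mosaic = [
    [1, 2, 1, 2],
    [3, 4, 3, 4],
    [1, 2, 1, 2],
    [3, 4, 3, 4],
]
r, g1, g2, b = split_bayer_rggb(mosaic)
# r == [[1, 1], [1, 1]], g1 == [[2, 2], [2, 2]],
# g2 == [[3, 3], [3, 3]], b == [[4, 4], [4, 4]]
```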

 

Alexa can output uncompressed data directly from the camera's sensor, which Arri call "Arriraw", over a standard SDI cable, although the receiving unit must be specially designed to understand it. Onboard recording is in Apple's ProRes codec, which is a discrete cosine transform-based codec similar in technical approach to motion-JPEG or I-frame-only MPEG. Again, no GOP.

 

P




It's important to understand that all still picture and video compression systems are essentially extremely complex computer programs.

There is a basic "dumb" compression/decompression strategy which sort of "free runs", producing the bulk of the data, plus a lot of software "tweaks" which are able to anticipate potential signal errors and take steps to correct them. By no means does this mean the software has any "intelligence" as such; these are all simply things that the programmers noticed during the development period and added extra program lines to eliminate or make less noticeable. This is precisely why RED keep pointing out that the newer versions of Redcode processing software can extract better quality, even from Redcode shot years ago on prototype machines.

 

An excellent example of software "fiddling" is what happens if you try to encode a video sweep pattern past the encoder's Nyquist limit.

A display with 1920 horizontal pixels can only properly display a maximum of 960 vertical lines, that is 480 black and 480 white lines.

If you make up a test image of, say, a 100-line to 2,000-line sweep as a bitmap, and make that into part of a Blu-ray slideshow in 1920 x 1080 format, up to 960 lines will be displayed correctly, followed by a short grey bar right at the Nyquist point; after that, you'll usually get what might best be described as a random bunch of 960-line-resolution "lumps".

Nyquist theory would predict that beyond the Nyquist limit you would get a fairly regular "screen door" effect, which bears little relationship to the actual number of lines on the original image but is entirely predictable. But the MPEG encoder is not "programmed" to follow Harry's laws, you see; the programmer obviously thought that random bursts of the maximum resolution the system can display would look better than a lot of aliasing, and they're probably right!

 

Theory tends to go out the window when your system is really a computer simulation of an old-fashioned "dumb" system....

