Appendix A: Digital Television Production and Transmission

Production and Transmission Standards

The following three charts outline popular current analog and digital television production standards, as well as the ATSC Table 3 transmission standards. The ATSC Table 3 formats are not mandatory; broadcasters can legally transmit any format they choose.

Following the charts is an essay on the difference between the production standard and the transmission standard, centering on 704 pixels versus 720 pixels, and an essay on MPEG-2 splicing, editing and keying.

Note: These formats are known as the 18 "ATSC Table 3" formats (see Glossary). Each frame rate also has a 1000/1001 fractional variant to accommodate NTSC color (24 becomes 23.98, 30 becomes 29.97, 60 becomes 59.94), for a total of 36 formats. The fractional frame rates will remain part of the DTV landscape until NTSC is no longer needed.
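
As a back-of-the-envelope check (not part of the ATSC text), the fractional rates come from multiplying each nominal rate by 1000/1001. A minimal Python sketch:

    # Apply the 1000/1001 NTSC-compatibility factor to each nominal frame rate.
    for nominal in (24, 30, 60):
        print(f"{nominal} -> {nominal * 1000 / 1001:.3f}")
    # Prints 24 -> 23.976, 30 -> 29.970, 60 -> 59.940 (rounded to 23.98, 29.97, 59.94 above).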

Other organizations have their own version of Table 3; for example, the OCI-N OpenCable interim specification (www.opencable.com/pubdocs/OCI-N-INT01-991022.pdf) has 640x480 at 60p and 59.94p as an HDTV format and the draft of EIA-818, known as the DTV Cable Compatibility Specification, has 480 vertical lines matched with 720, 704, 640, 544, 528 and 352 active horizontal pixels.

Sweet 16 And Never Been Missed

By Mark Schubin

First, nothing is wrong with the number 704. It's divisible by two six times. That means it's divisible by 16, 16 x 2, and 16 x 4--nice features for MPEG compression. It's 11/10 of VGA's 640 horizontal pixels. And it's the star of ATSC A/53 Annex A Table 3; 16 of the 36 formats in that table have 704 active pixels per scanning line. There's just one problem with 704: It's not 720.

What's 720? Besides being 16 more than 704, it's the number of active pixels per scanning line in Recommendation 601, the first global video standard and the first for component digital video. That was followed by ANSI/SMPTE 125M, the D1 DVTR format, Quantel's Henry, Panasonic's D5, Sony's Digital Betacam, Ampex's DCT, Tektronix's Profile--there's an awfully long list of teleproduction products with 720 active pixels.

Nevertheless, when Table 3 was being hashed out in 1995, it ended up with 704, not 720. Why? The key factor was probably the idea that some cable-TV and satellite programmers seemed to prefer half-resolution images to save bandwidth. Half of 720 is not divisible by MPEG's 16-pixel-wide macroblocks; half of 704 (352) is.
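
The macroblock arithmetic is easy to check; here is a minimal Python sketch (an illustration, not from the ATSC documents) showing how 720, 704 and their halves divide into 16-pixel-wide macroblocks:

    # How many 16-pixel-wide MPEG macroblocks fit across each line width?
    MACROBLOCK_WIDTH = 16
    for width in (720, 704, 360, 352):
        blocks, leftover = divmod(width, MACROBLOCK_WIDTH)
        print(f"{width} pixels = {blocks} macroblocks with {leftover} pixels left over")
    # 720 and 704 both divide evenly (45 and 44 macroblocks);
    # 360 leaves 8 pixels over, while 352 divides evenly into 22 macroblocks.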

The number 352 doesn't exist, however, in Table 3. In places where it does exist, like the European ETSI standard ETR 154, the full resolution listed is 720, not 704.

In fact, just about every standard for component digital video worldwide features 720 and not 704. That includes European, Japanese, American and global standards. ATSC's own A/63 standard, intended for countries currently transmitting 625-line television, calls for 720, not 704.

It could be argued that it doesn't make any difference. The eight pixels shaved off each edge of the picture will never be seen. Any changes in aspect ratio that result will be insignificant. If DTV encoders accept standard inputs, who cares what they do to the edges?

The same year the ATSC standardized 704 pixels (1995), SMPTE approved a recommended practice on blanking, RP 187. It specifies a "clean" aperture of 708 x 480--not quite ATSC's 704 x 480 but closer than Rec. 601's 720 x 483.5. Why? Video processing "operations (including most filters) produce edge-related artifacts."

Consider an image where the first pixel is white. Coming out of blanking, the video signal will have to hit full level in just 74 ns. That could produce overshoots and ringing after filtering. SMPTE, therefore, defined a "clean aperture" that an "ideal display" would show and "a region where artifacts are acceptable" outside the clean aperture.
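
For context, Rec. 601's luminance sampling rate is 13.5 MHz (a figure not quoted above), and one sample period at that rate is where the 74 ns comes from:

    # One luminance sample period at the Rec. 601 sampling rate of 13.5 MHz.
    print(f"{1 / 13.5e6 * 1e9:.2f} ns")   # about 74.07 ns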

In other words, the clean aperture doesn't stay clean unless the unclean area surrounds it. To put it in SMPTE's words, "It should be noted that it is not the intent of this practice that information outside the clean aperture be discarded."

If only 704 pixels are transmitted, and any processing occurs after transmission (such as upconversion, fading to black, re-encoding for transmission on cable TV, consumer recording, etc.), then the artifacts enter the visible portion of the signal. An equivalent SMPTE clean aperture for 704 pixels would be just 692.
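
One way to see where 692 comes from (a sketch under the assumption that the same per-line margin applies): RP 187's 708-pixel clean aperture leaves 12 artifact-prone edge pixels on a 720-pixel line, and trimming the same 12 pixels from a 704-pixel line leaves 692.

    # Margin implied by RP 187's clean aperture, applied to a 704-pixel line.
    margin = 720 - 708      # 12 edge pixels where artifacts are acceptable
    print(704 - margin)     # 692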

More recently, the issue of MPEG transcoding has been raised. If a video signal is compressed by a factor of, say, 10:1, then the encoder must throw away 90 percent of the information. It will make certain decisions about which 90 percent to discard. Succeeding stages of compression may or may not end up discarding the same information. If they throw away previously undiscarded information, the effect is that of using a higher compression ratio, usually leading to worse pictures--the dreaded "concatenated compression."

Transcoding carries information about compression decisions from one stage to the next so that each encoder can discard the same information. As a result, the losses of concatenation are either reduced or eliminated.

Achieving transcoding requires matching MPEG macroblocks, with their associated motion vectors. Unfortunately, the macroblocks of a 720-pixel line don't align with those of a 704-pixel line. Transcoding, therefore, cannot be used where it may be needed most: in the heavy compression of DTV.

There are two solutions to these problems. One is to change 720 to 704--in every one of the world's standards, in all of the production equipment that exists, in the vast libraries of programming, in the SCTE digital cable-TV standard, and, presumably, even in the ATSC's A/63 standard.

The other is to allow 720-pixel lines in A/53. The FCC already does. Rejecting Table 3, they did not demand 704 pixels, but they kept Section 5, which says, "The allowable parameters shall be bounded by the upper limits specified for the Main Profile [of MPEG-2] at High Level."

Based on publicity about the chipsets being used, that's what DTV receiver manufacturers are doing, too--they're not decoding 36 different formats; they're simply decoding MPEG-2 MP@HL. It would seem, then, that 720-pixel video is not a problem for DTV.

ATSC-compliant encoders, however, must lop the edges off of 720-pixel imagery because 720 keeps being rejected by the ATSC Executive Committee, composed in part of manufacturers who sell 720-pixel production equipment. Go figure.

MPEG-2 Art: Splicing, Editing and Keying

By Michael Silbergleid

Television, as we know it, has changed. Then again, television has always changed. From black and white to color, mono to stereo to multichannel, composite to component, analog to digital, baseband to compressed.

Perhaps the biggest changes are still to come--NTSC to ATSC and PAL to DVB. These changes mean a new way of working with video and audio in MPEG-2 compression, which is the specification for both transmission standards. This has led to a desire by broadcasters and broadcast equipment manufacturers on both sides of the Atlantic (as well as both sides of the Pacific) either to remain in baseband as long as possible before compression or, once compressed, to stay compressed.

The reasons for this are simple: when you compress a signal, you are throwing away part of that signal--something you might need later--and compression encoding and decoding cycles can have an unwanted effect on the quality of the image and sound.

So ever since manufacturers, broadcasters and regulatory bodies decided that MPEG-2 would be a good idea for one part of the broadcast chain (transmission), the industry has been trying to extend MPEG-2 to all of the other parts of the chain--production, post and storage--so that once in MPEG-2, you could stay that way throughout the chain.

Unfortunately, we tend to want to do things to our video (like dissolves, wipes and keys) that make life in the MPEG-2 world rather difficult. The consensus has been that you couldn't switch, splice, edit, key or do effects in the MPEG-2 realm accurately. That isn't, and never has been, altogether true.

What MPEG-2 Is And Isn't

MPEG-2 is just a set of compression tools. It is not a tape format or a transmission format (although there are MPEG-2 tape recorders). For production and post, we use the MPEG-2 4:2:2 toolkit. For transmission, we use the MPEG-2 4:2:0 toolkit. While this is an ultra-simplistic way of looking at MPEG-2, it is accurate.

Keep in mind that there are three types of MPEG-2 frames and that you can use different combinations of these frames during compression.

I frame: Intraframe. This frame contains all the information needed to construct a complete frame of video, much like an M-JPEG or DV frame. Video that is encoded using only I frames can be switched and edited (cuts only) without having to decompress the signal. Provides the least encoding efficiency, since the complete frame is encoded.

P frame: Predictive. The frame is encoded using motion-compensated prediction from a past reference picture. Contains the difference between the actual image and the predicted image. Provides better encoding efficiency than I frames.

B frame: Bi-directional predictive. The frame is encoded using motion-compensated prediction from both a past and a future reference picture. Contains the difference between the actual image and the predicted image--it is not an average of the previous and future frames. Provides the best encoding efficiency; however, a third memory buffer is required in addition to the buffers for past and future reference (I and P) frames. Fast search modes ignore B frames.

MPEG-2 streams can be encoded using I frames only, I and P frames, I and B frames, or I, P and B frames. I frames are always needed and are the first frame in a Group of Pictures (GoP), which then ends just before the next I frame. GoPs can have a length of one (all I frames) or more. The larger the GoP, the more efficient the compression encoding, but the harder the stream is to manipulate and the less gracefully it survives errors in transmission (since an I frame is needed as a reference).
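
To make the cuts-only point concrete, here is a minimal Python sketch (the 15-frame GoP pattern and helper function are hypothetical illustrations, not from any product mentioned here). With a long GoP, clean compressed-domain cut points occur only where an I frame starts; with an I-frame-only stream, every frame is a legal cut point.

    # Find the frames where a compressed-domain cut can land cleanly: the I frames.
    def i_frame_positions(gop_pattern, total_frames):
        """Frame indices holding I frames, given a repeating GoP pattern like 'IBBPBBPBBPBBPBB'."""
        return [n for n in range(total_frames) if gop_pattern[n % len(gop_pattern)] == "I"]

    long_gop = "IBBPBBPBBPBBPBB"              # 15-frame GoP: I frames half a second apart at 30 fps
    print(i_frame_positions(long_gop, 60))    # [0, 15, 30, 45] -- only four cut points in two seconds
    print(i_frame_positions("I", 60)[:5])     # [0, 1, 2, 3, 4] -- every frame is a cut point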

So, MPEG-2 causes problems because:

  • If you try to manipulate anything other than an I frame, it's like trying to edit or switch on a frame that doesn't exist...because it doesn't.

  • You can't manipulate something, like an MPEG-2 frame, that really isn't video.

    Unfortunately, television has progressed from its early days, when camera lens turret moves would be seen on the air and the viewing public accepted them. (If you've never seen a lens turret move, imagine seeing a wide shot, your picture going blurry then black for a second, and then a close-up.) Today, we want everything to be seamless and perfect, and MPEG-2 makes that rather difficult. Here's why:

    Splicing

    You want to switch between two MPEG-2 streams. To do this perfectly, the streams need to be synchronized. To do it the moment you push the "take" button, an I frame also must pass by the switch at the right moment. With I-frame-only video this is simple, but imagine having a GoP where I frames come only every half second or more.

    There are some ways around this. The NDS System 3000 MPEG Splicer/Transcoder can switch between three MPEG-2 streams. The catch is that you can only splice in non-seamless and near-seamless modes. During a non-seamless splice, the viewer would see either a blank or frozen frame generated by the viewer's decoder for anywhere from three to 26 frames depending on the splice point and when the next I frame appears.

    For near-seamless splicing, there must be control over one of the sources, according to Mike Knowles, manager of business development-digital terrestrial for NDS. Therefore, that source must be local and on disk. With near-seamless splicing, the delay happens at the station: when the "take" button is pressed, the splice doesn't occur until after the last frame in the GoP, so the next I frame comes from local material. The latency for near-seamless splicing depends on the size of the GoP from the outside source and where in the GoP the "take" button is pressed.
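
    A short Python sketch of that latency relationship (the numbers are hypothetical; the point is only that the delay is whatever remains of the incoming GoP when the "take" is pressed):

        # Frames of delay before a near-seamless splice: wait out the rest of the current GoP.
        def splice_delay_frames(gop_length, take_position_in_gop):
            """How many frames remain in the incoming GoP after the 'take' is pressed."""
            return gop_length - take_position_in_gop

        gop_length = 15                            # e.g. a 15-frame GoP, roughly half a second at 30 fps
        print(splice_delay_frames(gop_length, 1))  # take pressed early in the GoP -> 14 frames of delay
        print(splice_delay_frames(gop_length, 14)) # take pressed late in the GoP  ->  1 frame of delay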

    The Philips DVS 4800 StreamCutter currently provides seamless splicing, but in a slightly different way. The StreamCutter supports local insertion (remember the need to have control of at least one local source) and switching between a single high definition source and multiple standard definition programs for ATSC. For DVB, the StreamCutter handles up to five program streams.

    While StreamCutter does not rely on the existence of splicing points (markers for clean splices within the MPEG-2 bitstream), it can utilize them when they are present (and available) for its Seamless Splicing Point Mode, where the main and local streams must contain compatible and aligned splicing points.

    The Seamless Uniform Color Pictures (UCP) mode is somewhat visually similar to NDS's non-seamless splice except that a uniform color (or black) is sent by the StreamCutter and seen by the viewer instead of a decoder-generated frozen picture or black. The UCP period is fixed and is less than half a second. But there is a slight delay as the system waits for the end of the GoP before sending the color.

    One of the best features of the StreamCutter is that it can simultaneously switch four bitstreams, each containing a single video component, or a single bitstream containing up to four video components. With StreamCutter, if you're already multiplexed for four channels, you can replace any one of those channels without having to demux.

    StreamCutter supports DVB-compliant streams up to at least 80 Mbps as well as 19.39 Mbps ATSC streams. All interfaces are DVB-ASI.

    Hewlett-Packard uses a buffering technique within its MPEG-2 MediaStream family of products for MPEG switching. HP's "CleanCut" technology provides seamless cuts-only capability between bitstreams coming off the server as video. This may seem fairly simple, but it is actually very complex due to timing issues. What happens is that the server 'pre-charges' the 'cut-to' decoder with the complete GoP so that a splice can be made on any frame within the video domain--sort of like an MPEG-2 frame store and synchronizer.

    C-Cube Microsystems has a lot to gain from folks wanting to decode then re-encode--since they make the chips that do the encoding and decoding. Their DVxpress-MX chip may be the world's first all-digital, mixed-format compression chip, handling MPEG-2 (4:2:2 and 4:2:0) and DV25 (another model adds DV50), covering 4:2:2, 4:2:0 and 4:1:1 sampling. But in order to do the conversion, the single chip converts the compressed video to digital baseband--with a transcoding latency of seven to eight frames. Dr. Feng-Ming Wang, C-Cube's general manager for the PC/Codec division, believes that "the MPEG domain is always limited" and that baseband provides the "least possible problems." Dr. Wang says that the biggest problem is maintaining a compliant MPEG bitstream after the signal is modified without being decoded. "Baseband," he says, "is easy."

    Baseband may be easy, but most experts agree that it is not the most efficient way of dealing with the need to manipulate video encoded as MPEG-2.

    Other solutions, like those under development from Lucent Technologies and Sony, involve not just locating the I frame for splicing, but forcing the creation of an I frame through transcoding so that a splice can take place at any time.

    Editing

    How do you edit on a frame that doesn't exist? For the FAST silver. (formerly FAST601) MPEG-2 nonlinear editor the answer is easy...all frames exist. silver. encodes everything with I frames so each frame can stand on its own. If you want to do a cut, you can do it in the MPEG-2 realm and not lose quality by decompressing and recompressing. Of course, if you want to do anything else (dissolve, key, wipe, etc.) silver. will still decode to baseband, manipulate the video, and re-encode the result to MPEG-2.

    Sony's Betacam SX format works very differently. BetaSX uses I and B frames for efficient compression during recording. But for editing, Sony decodes back to baseband video. Since the frames that the B frame is using for bi-directional prediction are already recorded on the BetaSX tape, a memory buffer lets the B frame see ahead before conversion to baseband.

    Sony's hybrid DNW-A100 Betacam SX DVTR does MPEG cuts-only editing without going to baseband, since the file is read off the disk and manipulated by a playout list.

    Keying (and Effects)

    On-screen logos are everywhere. Stations and networks branding themselves in the eyes of the viewer. But what if the signal coming into your plant is already compressed for transmission, like the recent Harris ATSC pre-compressed feed of the Space Shuttle Discovery launch with John Glenn? You might think that you would have to decode all of the picture. While that may have been the popular theory, technology is advancing to the point where you can do your keying without having to decode entirely to baseband.

    Today, if you want to do a key on compressed video, you no longer have to decode the entire frame. Pro-Bel offers a logo inserter that enables logos and IDs that are stored as bitmapped files to be directly inserted into an MPEG-2 bitstream. By using a new transport stream manipulation technology developed by the Institut für Rundfunktechnik (IRT), the research and development body of the public broadcasting authorities of Germany, Austria and Switzerland, the bitstream is decoded only in the area required for graphic insertion, at the macroblock level, so that the remainder of the picture passes virtually unaltered. Both transparent and opaque graphics can be faded in and out using pre-set rates at user-defined screen locations.
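
    A rough Python sketch of the macroblock-level idea (the function and numbers are hypothetical, not IRT's or Pro-Bel's code): given a logo's bounding box, work out which 16x16 macroblocks it touches, since only those need to be decoded, keyed and re-encoded.

        # Which 16x16 macroblocks does a logo's bounding box overlap?
        def touched_macroblocks(x, y, width, height, mb=16):
            """Return (column, row) indices of macroblocks overlapped by the box."""
            cols = range(x // mb, (x + width - 1) // mb + 1)
            rows = range(y // mb, (y + height - 1) // mb + 1)
            return [(c, r) for r in rows for c in cols]

        # A 96x64 logo near the lower-right corner of a 704x480 picture:
        blocks = touched_macroblocks(580, 400, 96, 64)
        print(len(blocks), "of", (704 // 16) * (480 // 16), "macroblocks need re-coding")  # 28 of 1320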

    While not just for keying, Snell & Wilcox is using MOLE technology, as developed by the ATLANTIC project (BBC [U.K.], Centro Studi e Laboratori Telecomunicazioni [Italy], École Nationale Supérieure des Télécommunications [France], École Polytechnique Fédérale de Lausanne [Switzerland], Electrocraft [U.K.], Fraunhofer-Institut für Integrierte Schaltungen [Germany], Instituto de Engenharia de Sistemas e Computadores [Portugal], Snell & Wilcox [U.K.]), to keep concatenation errors to a minimum. MOLE decoders forward the previous encoding decisions at the macroblock level invisibly through the serial digital signal path, so that subsequent encoders can 'clone' the original encoding process, except where the picture has changed (wipes, dissolves and keys).

    Thomson has what they call the "Helper Channel." Similar to MOLE technology, the Helper Channel places decoder-generated metadata in the vertical interval of the digital signal coming from a Helper Channel-equipped MPEG-2 decoder; a Helper Channel-equipped encoder then uses that metadata to re-encode the signal after manipulation. While slightly less efficient (and less expensive) than MOLE technology, the Helper Channel is another way to get around the problem of having to decode the entire frame--and it's available now, while MOLE technology is awaiting SMPTE standardization.

    Philips, however, is the technological leader in MPEG-2 keying and picture manipulation with their MPEG Object (Logo) Insertion system. At a demonstration at IBC 1998, hidden in a section of their booth, Philips' Vincent Boutroux of the Digital Signal Processing group in France showed what he could do with a business card.

    First he would scan in the logo off a business card, then he would place that logo in the picture. The difference between other systems and the Philips system is that Vincent did not decode the background video to baseband. The system works by using transcoding technology and DCT coefficients.

    Images can be any size and shape, but are currently displayed from transparent up to about 80 percent opacity, and their video levels follow the background video.

    While there is some image flashing as the system learns the MPEG-2 sequence of the background video, the system does indeed work.

    In fact, logos can be made to fly in and out if so desired, or a ticker can be placed on the screen. All without conversion to baseband.

    One Other Small Problem

    Due to the latency between the encoder at the broadcast facility and the decoder at the viewing point, techniques that we take for granted in analog will need new solutions. Two of the most prominent are time and cueing.

    If you give out a time tone at the top of each hour, you can compensate for your station's encoder latency, but there is no way to compensate for the viewer's decoder latency (and each brand will have a different latency). To the viewer, your top-of-the-hour chime will be close, but not exact.

    For news organizations that use off-air cueing for live remote reports, the latency at the reporter's TV set will be too great for the signal to be useful. For this task, private radio frequencies (which will have to be coordinated, since everyone will want one) with IFBs are one solution, while reporters learn not to watch themselves on-air.
