In the interest of clarity, there are two sorts of lipsync problems. One is a fixed offset that is always the same; that can be cured with an AVR that allows offset compensation. Any time there is encoding, decoding, A-D, or D-A conversion, there is latency, and it may differ slightly between paths. Occasionally two paths happen to match, but only by coincidence, the way a stopped clock is right twice a day.
The other issue is lipsync that varies over time, which is due to three things. First, MPEG encoding and decoding are somewhat elastic, meaning decoding can slow down when the content gets difficult. The slowdown is imperceptible moment to moment, but it is cumulative. It is not uncommon for two identical receivers sitting side by side, receiving the same program, to end up as much as a second or two apart from each other.
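That cumulative effect is easy to underestimate, so here is a back-of-envelope sketch. The numbers (frame rate, a 20 parts-per-million slowdown) are illustrative assumptions, not measurements from any real receiver:

```python
# Hypothetical sketch of how a tiny, elastic per-frame decode slowdown
# accumulates into visible drift. All figures are illustrative.

FPS = 29.97
FRAME_PERIOD = 1.0 / FPS          # nominal seconds per frame

def cumulative_drift(hours: float, slowdown_ppm: float) -> float:
    """Drift in seconds after `hours` of playback when the decoder
    runs `slowdown_ppm` parts-per-million slower than real time."""
    total_seconds = hours * 3600
    return total_seconds * slowdown_ppm / 1_000_000

# A 20 ppm slowdown is well under a microsecond per frame...
per_frame = FRAME_PERIOD * 20 / 1_000_000
# ...but over a day of playback it compounds to more than 1.7 seconds.
drift_24h = cumulative_drift(24, 20)
```

The point is only that an error far too small to see on any single frame still walks two "identical" receivers apart over hours of running time.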
The second thing is that AC-3 audio and MPEG video are separate processes for separate elementary streams, developed as different protocols rather than designed together as one. If the video slows, you eventually get lipsync issues, because the audio marches right along without slowing down, or caring.
The third thing is that the separate elementary stream carrying the audio does not mirror or regularly reference the video stream. Both start from the same starting line, but nothing holds them in sync with each other afterward.
If you stop and restart, or jump back, the audio references the video and reclocks, but only then, not continually or on a regular basis. None of this has anything at all to do with how much or how little error correction there is, because FEC happens in real time and does not incur any added delay. There may be much more FEC in MPEG-2, because it needs it more, but there is far more delay in MPEG-4, which is one reason channel changing started to take longer around the time DBS converted to MPEG-4. An MPEG-4 GOP can be 200 frames (over 6 seconds) long, but DTV mercifully inserts I-frames often enough that we don't have to wait 6 seconds for a channel change.
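The "reclocks only on a seek" behavior can be sketched as a toy model. This is a hypothetical simulation of the described behavior, not a real decoder API; the 0.1% video slowdown is an assumed figure:

```python
# Toy model of the behavior described above: audio and video leave the
# same starting line, drift apart, and are only reclocked to each other
# on a stop/seek event. Hypothetical; not any real decoder's interface.

class StreamClock:
    def __init__(self, rate_error: float = 0.0):
        self.pts = 0.0                # presentation position in seconds
        self.rate_error = rate_error  # fractional speed error (elasticity)

    def advance(self, seconds: float):
        self.pts += seconds * (1.0 + self.rate_error)

video = StreamClock(rate_error=-0.001)  # video decode runs 0.1% slow
audio = StreamClock(rate_error=0.0)     # audio marches right along

for _ in range(600):                    # ten minutes of playback
    video.advance(1.0)
    audio.advance(1.0)

skew_before = audio.pts - video.pts     # audio has pulled ahead

# A stop/restart or jump-back reclocks audio to video -- but only then.
audio.pts = video.pts
skew_after = audio.pts - video.pts      # back to zero until it drifts again
```

After ten simulated minutes the audio leads by about 0.6 seconds, which is exactly the kind of slowly growing error a viewer notices as worsening lipsync until the next seek resets it.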
As an extreme example, we used to delay programming for two hours due to time-zone issues (and still do). The video servers designed to do this worked just fine, but overnight the satellite feed would drop to snow for 12 hours or so (this was back in the days of analog video being converted and delayed by an MPEG-2 server). Video noise, or snow, is an extremely difficult thing to encode as MPEG, so it would slow the playback ever so slightly. By the next morning, once a couple of hours of real video had returned, the audio on the delayed playback could be as much as seven MINUTES ahead of the video. We had to reset it every day to keep that from going to air.
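A quick sanity check on the anecdote's numbers (taking the seven minutes and roughly twelve hours at face value) shows the implied slowdown was still under one percent:

```python
# Back-of-envelope check on the figures in the anecdote above.
drift_s = 7 * 60          # seven minutes of accumulated audio lead
window_s = 12 * 3600      # roughly twelve hours of snow overnight

slowdown = drift_s / window_s   # fractional playback slowdown, just under 1%
```

Even a slowdown that small, too subtle to notice while watching, produces minutes of lipsync error once it is allowed to run unchecked overnight.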
It took the television industry a long time to adapt and get lipsync right. When lots of HD satellite channels sprang up in 2004-5, there was a lot of lipsync error, because no one saw it coming and no one had budgeted for equipment to keep it corralled. Within six months, most of it was gone.
Next are closed-captioning issues. Stations are going to be required to keep CC in perfect sync, get a much higher percentage of words correct, and place it on screen in ways that don't block important parts of the picture. Companies like Telestream are rushing to create authoring and correction systems that can handle this, so in a couple of years CC will be a lot better than it is now. Live CC will still have a lot of latitude, but CC for recorded programming will have to be near-perfect.
Edited by TomCat, 26 August 2014 - 08:40 PM.