Frame accurate video in HTML5
Hello, I am Dirk-Willem van Gulik, Chief Technical Architect here at the BBC. An important part of my job is to help the BBC use the right internet and web technologies - and help the industry and open standards bodies create the internet and web technologies which are right for the BBC.
Now the BBC is a very special place to work. And one of the main things which makes it so special is "Quality". At the BBC it is a currency, it is a goal, it is a culture - and as an engineer, it is something you are tasked to deliver.
One of our roles in FM&T is to provide our creative colleagues with tools. The tools they need for broadcast and to create high quality video. This includes tools for "non linear editing" - taking short clips, cutting them to the right length, stringing them together, adding some voice overs and graphics - and then endlessly tuning the resulting video so that it tells a story perfectly.
Usually we shoot hundreds of hours of video, import it onto an editing server, painstakingly tagging or "logging" the content on the way, and then edit each clip into something that makes sense. Because the original video files are so huge (especially in HD), we actually edit low resolution "proxy" versions of each file, and we store edit decisions using timecodes rather than actually mashing up the real video all the time. Then everything can be synced up and "conformed" using the original high-quality versions later on.
Throughout all of this, timecodes play a major role. They are the key 'link' to get right. They ensure that recipes done on the proxy give identical (albeit at a higher resolution) results when repeated on the raw high resolution footage at the end. They ensure that the audio tracks are perfectly synchronized with the clips, that transitions start and end at exactly the right time (and there is not some extra black frame due to a rounding error). They are also important in the creative process - as they let us communicate. We can ask each other to look at a specific frame - or discuss whether we move a cut by a few frames to achieve a particular effect.
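To make the proxy-then-conform idea concrete, here is a minimal sketch in Python. It is not how any real editing system stores its decisions: the `EditDecision` class, the clip names and the `fake_media` stand-in footage are all made up for illustration. It only shows why storing cuts as frame references lets the same recipe be replayed on the high resolution originals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditDecision:
    """One cut, stored as frame references rather than as rendered video."""
    clip_id: str    # hypothetical clip name
    in_frame: int   # first frame used (inclusive)
    out_frame: int  # first frame no longer used (exclusive)


def conform(decisions, media):
    """Replay the decisions, in order, against a set of media."""
    programme = []
    for cut in decisions:
        programme.extend(media[cut.clip_id][cut.in_frame:cut.out_frame])
    return programme


def fake_media(quality):
    """Stand-in footage: each 'frame' is just a (quality, clip, number) tuple."""
    return {
        clip: [(quality, clip, n) for n in range(100)]
        for clip in ("interview", "cutaway")
    }


edit_list = [EditDecision("interview", 10, 13), EditDecision("cutaway", 0, 2)]
proxy_cut = conform(edit_list, fake_media("proxy"))
master_cut = conform(edit_list, fake_media("master"))

# Same clips and frame numbers in both versions, only the quality differs.
print([frame[1:] for frame in proxy_cut] == [frame[1:] for frame in master_cut])
```

If a player or tool were off by one frame when resolving `in_frame` or `out_frame`, the two results would no longer line up, which is exactly the failure the timecodes are there to prevent.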
If this sounds a bit overly perfectionistic and artistic - then consider this - a cut every 3 seconds or so is quite normal. So if you are off by 1 frame either way - then we're already talking errors of over 2%! Even a very pragmatic engineer would have to agree that that matters!
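The arithmetic behind that figure can be checked in a few lines of Python. This is a sketch that assumes 25 frames/second material; the function name is made up for the example.

```python
def frame_error_percent(shot_seconds: float, fps: float, frames_off: int) -> float:
    """Return a frame error as a percentage of the shot length."""
    shot_frames = shot_seconds * fps
    return 100.0 * frames_off / shot_frames


one_end = frame_error_percent(3, 25, 1)    # one frame out on a 75-frame shot
both_ends = frame_error_percent(3, 25, 2)  # one frame out at the in-point and the out-point
print(f"{one_end:.2f}% {both_ends:.2f}%")  # prints "1.33% 2.67%"
```

So a single stray frame is about 1.3% of a three second shot, and being one frame out at both ends of the shot takes the error past 2%.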
So timecodes using exact frame references are important. Really important. And the dirty little secret is that the internet has none. NONE! None of today's open standard technologies, or even the dominant proprietary ones, do timecodes right. They are off by one; they round to the nearest half second, they jump to the nearest previous I-frame. Whatever. (In all fairness - there are highly specialist products one can buy and install, usually with special browser plugins, which are accurate, often provided they are used with specially prepared material and within a single LAN. But none of those are conducive to the 'internet' network effect by facilitating collaboration between creative people across organisational barriers.)
Because the first thing a professional needs is a rock solid way to reference each and every individual frame accurately. So they can talk about it. For us, 'video on the web' today feels a bit like that plastic 1:1 model of a Spitfire[9]. It looks like one - but it sure does not fly.
Now over the past two months that landscape has started to radically change. A few of us[1] have been working with the various open standard and open source HTML5 communities. And as of this week, after 120 emails, the bleeding edge development versions of several HTML5 implementations (as used in Safari, Chrome, Mozilla and many others) are now fully frame accurate.
First was WebKit (the basis for Safari and Chrome, among others), which as of revision r77919 has frame accurate playback!
Really. Frame Accurate. Actually even more accurate than just a frame (which is important for audio). You can jump to any point in the video (e.g. to 1 hour, 3 minutes, 6 seconds and 5 frames, or to frame 178127) - and it will be exactly at that frame. Not at the nearest I-frame, rounded down to the nearest second, or off by one. No, it will be exactly at that very frame.
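As an illustration of what "jump to that frame" involves, here is a small Python sketch that converts between a timecode, a frame number and a seek position in seconds. It assumes a constant 25 frames/second and non-drop-frame timecodes; the function names are invented for this example, and aiming at the middle of the frame is just one common way to stay clear of rounding at the frame boundary, not something the browsers require.

```python
from fractions import Fraction


def timecode_to_frame(timecode: str, fps: int) -> int:
    """Convert 'HH:MM:SS:FF' to a frame number counted from zero."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field must be below {fps}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frame_to_timecode(frame: int, fps: int) -> str:
    """Convert a frame number back to 'HH:MM:SS:FF'."""
    seconds, frames = divmod(frame, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def seek_time(frame: int, fps: int) -> Fraction:
    """Seconds from the start, aimed at the middle of the requested frame."""
    return (Fraction(frame) + Fraction(1, 2)) / fps


print(timecode_to_frame("01:03:06:05", 25))  # prints 94655
print(frame_to_timecode(178127, 25))         # prints 01:58:45:02
print(float(seek_time(94655, 25)))           # prints 3786.22
```

In a browser, the value from `seek_time` is what a tool would assign to the video element's `currentTime`; frame accuracy means the picture shown afterwards really is that frame.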
So today, the HTML5 community has opened a door for us. Which will allow creative people to collaborate and edit professional video on the web.
Do know though that, while key, this is just a first step. There is still a lot to build, so we'll need many hyper creative companies and internet engineers working together to make this work. We need to create a new breed of web based production tools which can interact at the quality levels professionals and the BBC expect. And we still have issues around UMIDs (unique global references for video) to crack. And even some very basic things (like did you know that a pixel in the video world is actually rectangular, rather than square?!) will need to be universally understood between the broadcasting and internet engineers. But boy, getting timecodes, that is a big step!
Again - a big thank you to the open source folks of WebKit and Mozilla. IE9 is not quite there yet, but Microsoft has let us know that we "can expect the video-frame-accurate seeking be available when IE9 is final"!
[1] To give credit where credit is due: within the BBC, Raymond Le Gué (programme director at BBC) insisted on having frame accurate playback in the browser. Rob Coenen went above and beyond the call of duty to make this happen, patiently working with the wider developer community, explaining why film and television production cannot live without it, proving that it was not working in browsers and helping the developers to fix it. He got help from Bas Schouten (at Mozilla), Andy Armstrong and Dirk-Willem van Gulik (both at the BBC).
But most credit should go to the open standards and open source communities around WebKit, Chrome and Mozilla which made it happen: Andrew Scherkus and the Chromium team get credit for being . The actual fixes were ultimately created by Jer Noble, Eric Carlson (both at Apple) and Chrome developer Andrew Scherkus; while Matthew Gregan and Anthony Hughes did the job for Mozilla.
Dirk-Willem van Gulik is Chief Architect, BBC Future Media & Technology
SMPTE timecode based and frame accurate metadata logging is now possible over the web with HTML5. The image above is a made up screen shot of what a prototype tool to do this might look like.
Comment number 1.
At 21st Feb 2011, Greg Tyler wrote:Excellent work! It's great to see such positive responses from the browser developers and awesome to see HTML5 going towards true usability.
Comment number 2.
At 21st Feb 2011, mrg17 wrote:looks to be a 404 to me
Comment number 3.
At 21st Feb 2011, Andy Wilson wrote:is the correct URL.
Comment number 4.
At 21st Feb 2011, kierank1 wrote:Do drop-frame timecodes work correctly? If not, then I'm glad it's you not me who's trying to get those to work...
Though to be honest W3C should have consulted the right people in the first place and built a proper video system instead of people having to retrofit advanced features onto a mere shell of a standard which browser vendors have implemented differently.
Comment number 5.
At 21st Feb 2011, Duncan wrote:Looking fantastic, great that you've been pushing progress on this front! I'm quite interested in the web based video logger in the screen shot, this is something I have been looking into as well. Mainly interested in what you're feeding it from in terms of video and meta data and indeed what the meta data is being fed back into, is this linked to an ingex or sQ server system (I know you've just bought a few more of these from Quantel) or something else?
Comment number 6.
At 21st Feb 2011, _Ewan_ wrote:That's lovely. Can we have HTML5 video for we mere viewers now, or are we to remain saddled with closed, proprietary, DRM crapware for ever?
Or in other words - nice technical job; what are you actually going to use it for?
Comment number 7.
At 21st Feb 2011, Nick Reynolds wrote:mrg17/Andy Wilson - thanks for the heads up on the broken link. It's now working.
Comment number 8.
At 21st Feb 2011, Dirk Willem van Gulik wrote:@kierank1 "Do drop frames work" - the short answer is - we've not properly tested that (a rudimentary test with Apple's 'Blip Blop' sample suggests it is correct).
The reason for this is that drop frames* are very specific to NTSC, which is dominantly used in the Americas and some parts of Asia. In the BBC we use PAL (and hence we tested with 25 and 50 frames/second). We'll leave the detailed testing of that to our American brethren (happy to include a high quality test video though on the above test site).
Thanks, Dw.
*: for those curious - NTSC has 29.97 frames/second. Which is not a nice round number. To make it almost round again - we drop (i.e. skip) two 'counts' every minute (except every tenth minute - and then some more refinements) - pretty much the same concept as is behind the leap day on the 29th of February which we have every 4 years (and then some special refinement every 100 and 400 years).
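For readers who want to see the counting rule from the footnote above worked through, here is a rough Python sketch of the usual drop-frame labelling for material at about 29.97 frames/second. The function name is made up for this example and nothing here has been checked against any browser; it only shows how two labels are skipped at the start of every minute except each tenth one.

```python
def frame_to_dropframe_timecode(frame: int) -> str:
    """Label a frame count with a 29.97 fps drop-frame timecode (HH:MM:SS;FF)."""
    count_rate = 30                                # labels count as if at 30 fps
    dropped = 2                                    # labels skipped per dropping minute
    first_minute = count_rate * 60                 # 1800 frames, nothing skipped
    other_minute = first_minute - dropped          # 1798 frames
    ten_minutes = first_minute + 9 * other_minute  # 17982 frames

    blocks, rest = divmod(frame, ten_minutes)
    skipped = blocks * 9 * dropped
    if rest >= first_minute:
        # Past the first minute of this ten-minute block: count dropping minutes.
        skipped += dropped * (1 + (rest - first_minute) // other_minute)

    label = frame + skipped
    seconds, frames = divmod(label, count_rate)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}"


print(frame_to_dropframe_timecode(1799))   # prints 00:00:59;29
print(frame_to_dropframe_timecode(1800))   # prints 00:01:00;02
print(frame_to_dropframe_timecode(17982))  # prints 00:10:00;00
```

Note that no picture is ever thrown away: only the labels ;00 and ;01 are skipped, so that the timecode stays close to wall clock time.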
Comment number 9.
At 21st Feb 2011, Dirk Willem van Gulik wrote:@ewan: "what are you actually going to use it for".. Well - right now, today, for "Nothing".
As these 'improvements' are not yet on the market - they've only just gone into the source code repositories of the browsers. From here it needs to go into Alphas, Betas, release candidates and then gradually become commonly available. That is not a process of weeks or months.
However, it does inform our thinking on a timescale of years.
It means that innovative companies can suddenly build editing tools, review tools, logging tools or perhaps a whole new class of creation. Tools which are truly 'on the internet'. And which, at least in theory, have the frame accuracy to allow professional use.
But there is still a long way to go; audio is complex, colour is difficult to get right - performance and efficient use of bandwidth fiendishly complex. However - frame accuracy was one of the major hurdles to clear.
Comment number 10.
At 21st Feb 2011, johndrinkwater wrote:Glad you’re trying to improve the state of web browsers, just make sure you don’t favour one more than others. You didn’t mention Opera: should I assume because they don’t have a public tracker that you didn’t link it, or they didn’t have the problem?
Fully expecting WebM iPlayer playback soon…
Comment number 11.
At 21st Feb 2011, Stefan Goodchild wrote:I've been playing with this type of thing for a side project for a little bit and have a very rough working prototype that works in Safari 5.0.3. Not 100% but has most of the tools you'd expect including keyboard navigation through the video.
Comment number 12.
At 21st Feb 2011, Rob Coenen wrote:Excellent, the first spin off already! I'm curious to see what other web-based and frame-accurate broadcast tools will emerge.
Comment number 13.
At 21st Feb 2011, Stefan Goodchild wrote:I'm planning on adding in proper timecode support now I know it's coming. The project is a collaboration tool for people who are remote working in the visual industries. SFX houses, soundtrack composers, freelance editors that kind of thing.
Comment number 14.
At 22nd Feb 2011, voce wrote:This comment was removed because the moderators found it broke the house rules.
Comment number 15.
At 22nd Feb 2011, Dirk Willem van Gulik wrote:@johndrinkwater - on "You didn’t mention Opera: should I assume because they don’t have a public tracker that you didn’t link it, or they didn’t have the problem?"
As you may have seen (assuming you are indeed an Opera user) - their current browser seems to jump to a nearby second mark (with their latest Presto/2.7 engine).
And you are correct - Opera does indeed not operate a public tracker - so you'll have to await their next release and its release notes.
The good news is that Opera's core developers are well aware of the issue and are actively working with the community (see for example the WHATWG mailing list (whatwg.org) around 2011-2-21).
Comment number 16.
At 22nd Feb 2011, kierank1 wrote:@Dirk
Thanks for the response. I agree in Europe drop frames are not an issue but if you want to get people to use the HTML5 timecode support, it has to be feature complete. In my opinion leaving parts of the standard implementation-dependent or incomplete was one of the real problems with HTML5.
Comment number 17.
At 22nd Feb 2011, HD wrote:"And even some very basic things (like did you know that a pixel in the video world is actually rectangular, rather than square?!)".
Is that because you are working only with 720x576 and 1440x1080 video? What about 1280x720p50 and 1920x1080p50 video (surely they use square pixels - and SD video could use square pixels if it was sampled that way)? Will you be allowing use of those formats too (including 1080p60 etc)?
What about the refresh rate of the LCD screens in use? Won't most people be using 60Hz (or is it 59.94Hz?) LCD monitors? Won't that mean pull-down judder problems with most 25Hz & 50Hz content?
Have you done tests of European "100Hz" HDTVs (eg. LCD/Plasma) and do they actually operate at exactly 100Hz - even though their input for PC use is 60Hz? How does that affect BBC 25/50Hz programmes when the PC input to a "100Hz" TV is 60Hz (which is the rate they recommend).
What if you are making a TV programme that incorporates 24Hz (or 23.976Hz) and 60Hz (59.94Hz) content - eg. a film review programme or the BAFTA film awards? Wouldn't it be better if the content was in its original form? eg. for something like the BAFTA film awards couldn't the film clips be shown at whatever they were shot at (eg. 24/23.976 or more for things like Avatar 2), but the rest of the program be at 50Hz? ie. allow variable frame rates?
Seeing as you are helping set standards - couldn't you encourage video/film content to be made and encoded at integer rates (eg. in the film/US world).
So couldn't you allow higher frame rates than 50Hz and allow frame rates of the source footage to be kept at its native rate without speeding up/slowing down or similar conversions that might involve judder/interpolation?
Comment number 18.
At 22nd Feb 2011, JoeAD wrote:Appreciate the work being done here and the informative article, however there was no mention of the wretched DRM policy of the BBC and how the use of open HTML5 video technologies and DRM will or can co-exist?
Comment number 19.
At 22nd Feb 2011, Dirk Willem van Gulik wrote:@HD1080 - thanks - those are really good comments and questions. Rather than answer them here - we've been preparing another more elaborate post on exactly these topics for later in the year.
Do not hold your breath though - it is a complex story to craft - and we want it to be exactly right so as to solicit valuable community feedback.
Comment number 20.
At 22nd Feb 2011, Nick Reynolds wrote:JoeAD - I think you're veering off topic. Please see this blog post which explains the BBC's current position on content protection.
Thanks
Comment number 21.
At 23rd Feb 2011, Kit Green wrote:Are you re-inventing the wheel? Don't some current logging applications such as CatDV already allow frame accurate logging and rough cuts to be made over the web?
Comment number 22.
At 25th Feb 2011, johndrinkwater wrote:Dirk Willem van Gulik, I am a Web user :) I prefer to use Firefox, though it is not my sole browser.
Sadly I worry these changes won’t be useful for BBC content for me for ages, all of my browsers are sent formats I can’t view.
Comment number 23.
At 27th Feb 2011, sbstreater wrote:Hi Dirk!
For someone with such an important position at such an august body as the BBC, you are surprisingly and seriously out of date with your implication that there are no time-coded frame accurate editing systems available on the web without "special" browser plugins, and used with specially prepared material and within a single LAN.
In fact, you have been able to edit timecoded frame accurate video through a browser over the public internet as a professional for around six years and as a consumer for around five years. Over 1,000,000 hours of professionally shot source material has already been handled by such systems.
Last time I tried it, anyone on the BBC desktop inside the BBC firewall could use such tools - with neither installation of special software nor configuration of their PCs - provided they had a standard web browser with the common standard plugins installed.
Today, Android users have access to similar technology on their tablets.
Your house rules prevent me from giving details, but I would welcome contact from you if you would like a demonstration. It could save you a few years' work - and help you avoid the blind alleys which can sometimes result when people with little practical experience of an existing working solution try to set a standard without being aware of the lessons already learnt elsewhere over many years.
Comment number 24.
At 28th Feb 2011, Kit Green wrote:23. At 18:44pm on 27th Feb 2011, sbstreater
-------------------------------------------------------------
I thought there was just a long line of failures over the last 15 years.... producer desktop, DMI etc etc have all struggled on but not delivered the functionality required for proper roll-out.
Comment number 25.
At 28th Feb 2011, Rob Coenen wrote:Hi sbstreater!
I'm aware of what you are pointing at, and it's not HTML. It's using the Java plug-in, which is just another plug-in like Flash or Silverlight - and definitely not part of the HTML standard. As you probably know, the big guys like Apple and Microsoft are not so keen on this and have a policy of not installing these plug-ins into the web browsers anymore. For 'Android' the same logic applies: there are just applications running under the Android OS, or Apple's iOS for that matter - and not part of HTML.
Comment number 26.
At 28th Feb 2011, sbstreater wrote:Hi Rob Coenen!
As you have noticed, my company uses Java for its Cloud video platform. My research has shown that Java is currently the only widely available solution for a responsive frame accurate video editing system running over the internet - even including the mobile internet.
Apple and Microsoft are of course competitors to Java - because they benefit from trapping people into their own architectures, and Java is cross platform.
Microsoft does not decide whether Java is installed or not any more than they decide which disks are installed - this question is decided by the PC manufacturer - so this comment shows a bit of a misunderstanding of the situation. And the vast majority of Windows machines come with Java installed (and it is free to add if you need to).
As it happens, all the Apple Macs I have used come with Java as standard, and this is currently Apple's policy I believe. As the hardware manufacturer, they decide what goes on their boxes.
The problem with trying to do real time video in HTML is that it is simply not appropriate. The over reliance on the remote server cripples performance, and the simplistic HTML being proposed does not allow tight enough integration between the high-CPU intensive video codecs and the other real time demands on the system from a video editing system.
Yes - Java is the obvious solution. It is a very common plug in, and can be installed for free in the rare case where it is not installed. Last time I looked, every BBC desktop and ITV desktop had Java installed, for example.
Comment number 27.
At 28th Feb 2011, Rob Coenen wrote:Hello sbstreater,
my research has shown that other plug-ins (Flash and Silverlight) can do the same trick. But they are proprietary third-party plug-ins. I do agree with you that 5 years ago it was probably not appropriate to use HTML for real time video - which is pretty much the reason why the Flash plug-in is installed on 99% of the internet-enabled computers. But the HTML5 community has been working hard to get HTML5 Video ready - and it does work now: just grab any of the latest nightly builds and see for yourself: HTML5 has real-time, frame-accurate, plugin-free video using just open standards.
Comment number 28.
At 1st Mar 2011, sbstreater wrote:Hi Rob Coenen!
I don't think anyone doubts the good work being done with open standards. Although it's a pity that HTML 5 supports the patented MPEG codecs - no more open than Java, wouldn't you say? - And that some major suppliers and browsers support only this patented video format.
As I see it there are two types of standards. The first type tells you that you can only do something in a particular way - a restrictive standard. You must put your video in this particular format, which "we" have designed and optimised to do something which may not be what you want to do with your video. It may be the wrong datarate, the wrong CPU requirements, or be good for streaming but bad for editing. Or good for server side, but bad for scalable rich client services. Or wrong in ways "we" (or you) haven't even thought of. But that's just tough, because "we" are restricting what you can do with your video to what "we" understand, because in our arrogance, "we" can't imagine that we haven't thought of something and "we" think "we" know best.
The second type is an enabling standard. This says you can write any software you like - including to handle your own video format - and it will work on any standard conforming device. Java is such an enabling standard.
Rather than adding ideas one at a time to a standard like HTML - which takes years to ratify and roll out across the installed base - just adding one enabling standard like Java allows every standard conforming device to do everything. Of course, later improvements to the standard can make these easier and more efficient - but you are not restricted to applications people knew about when they wrote the standard.
I can run the latest version of my frame accurate video editing software on any browser even with ten year old versions of Java - and (with this piece of specially written software), I benefit from many technical features not even being considered for HTML. How long will it be before the installed base of HTML can incorporate these new ideas, let alone work on existing PCs built since 2000 without needing to upgrade them?
I think we can say the answer to that is Never. This constant churn of standards will always lag the innovation in the market.
The reality is that the video in HTML 5 only looks good because what was there before was so totally lacking. But it is a poor substitute for a real innovative platform like an efficient virtual machine like Java - which is the real advance HTML is looking for.