Friday, May 13, 2011

The problems with HTML5 <Audio>

I've been having a back-and-forth on Twitter with Giorgio Sardo, Microsoft's HTML5/IE evangelist, about audio, but I find 140 characters too limiting to explain the issues, and Giorgio seems more interested in snark and attacking Chrome than in attacking the root of the problem.


This all started because, after the release of Angry Birds at Google I/O, people noticed that it was requesting Flash. Angry Birds is written in GWT and uses a GWT library written by Fred Sauer called GWT-voices. This library not only supports HTML5 audio, but has fallbacks to Flash and even <bgsound> on IE6!

There was speculation the Flash requirement was done for nefarious purposes (to block iOS) or because Chrome includes Flash, but the reality is, it was done *both* because Chrome has some bugs and because HTML5 <audio> just isn't good for games or professional music applications.

I first noticed the shortcomings of the audio tag last year when we ported Quake2 to HTML5 (GwtQuake), shown at last year's I/O, where I also demoed a Commodore 64 SID music emulator. There are two issues with using HTML5 Audio, which was originally designed to support applications like streaming music players.

Problem #1: Missing functionality

The HTML5 audio element permits operations like seeking, looping, and volume control, which are great for jukebox applications, but you cannot synthesize sound on the fly, retrieve sound samples, process sound samples, apply environmental effects, or even do basic stereo panning. Quake2 required 3D sound based on OpenAL's inverse distance damping method, as well as stereo panning. I did my best and implemented distance damping with the volume control, but had no ability to position sounds left or right.
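
A minimal sketch of that volume-based workaround, assuming the inverse distance clamped formula from the OpenAL specification. The function names and the default reference distance, maximum distance, and rolloff are illustrative, not taken from GwtQuake:

```javascript
// Inverse distance clamped attenuation, as defined in the OpenAL spec:
// gain = ref / (ref + rolloff * (clamp(distance, ref, max) - ref))
// The default parameter values here are illustrative assumptions.
function inverseDistanceGain(distance, referenceDistance = 1, maxDistance = 100, rolloff = 1) {
  const d = Math.min(Math.max(distance, referenceDistance), maxDistance);
  return referenceDistance / (referenceDistance + rolloff * (d - referenceDistance));
}

// Map listener and source positions ({x, y, z}) to a value that is
// valid for an <audio> element's volume property, which must be in [0, 1].
function volumeFor(listener, source, baseVolume = 1) {
  const distance = Math.hypot(
    source.x - listener.x,
    source.y - listener.y,
    source.z - listener.z
  );
  return Math.min(1, Math.max(0, baseVolume * inverseDistanceGain(distance)));
}
```

In a browser this would be applied as `audioElement.volume = volumeFor(player, monster)`. Note that it only scales loudness; nothing here can move the sound toward the left or right ear, which is exactly the limitation described above.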

For sound synthesis, there is no official way to play back dynamically created buffers. The workaround is to use JavaScript to encode sample buffers into PCM or OGG in realtime, convert them to data URLs, and use those as the source for an audio element, which is very computationally expensive and chews up browser memory. For developers wishing to create even basic music visualizers, this creates huge difficulties.
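
To give a sense of the work involved, here is a rough sketch of the PCM variant of that workaround: wrapping synthesized samples in a 16-bit mono WAV container and turning it into a data URL. The function names are hypothetical, and the base64 step uses `Buffer` when run under Node and `btoa` in a browser:

```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples, sampleRate) {
  const dataLength = samples.length * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataLength);
  const view = new DataView(buffer);
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // remaining file size
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // format 1 = uncompressed PCM
  view.setUint16(22, 1, true);              // one channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // bytes per second
  view.setUint16(32, 2, true);              // bytes per frame
  view.setUint16(34, 16, true);             // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataLength, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(s * 32767), true);
  }
  return new Uint8Array(buffer);
}

// Convert the encoded bytes into a data URL usable as an <audio> src.
function toDataUrl(bytes) {
  let base64;
  if (typeof Buffer !== "undefined") {
    base64 = Buffer.from(bytes).toString("base64");
  } else {
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    base64 = btoa(binary);
  }
  return "data:audio/wav;base64," + base64;
}
```

Every sample is touched once to encode it and again to base64 it, and the resulting string is about a third larger than the raw audio, which is where the CPU and memory cost comes from.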

Problem #2: Latency

Audio applications require low latency. Studies have shown human beings can perceive audio latency down to the millisecond, but in general, latency below 7ms is considered good enough. This means in some circumstances, you need to schedule sounds within 7ms of one another, for example, if you need to simultaneously start two sounds, one in the left ear and one in the right ear, or if you need to concatenate several sounds together in series.

Giorgio has a neat demo here of playing piano notes in sequence, and hats off to Microsoft for providing a great <audio> implementation. It's a cool demo, but I still hear latency variation in playback between notes and occasional glitches. No one's going to build something even 1/10th as good as Garage Band on iPad using this technique. That's because the one way you can schedule audio in HTML5 is via the browser's event-loop using setInterval or setTimeout, and that's problematic for several reasons.

First, it's unreliable. Over the years, setInterval/setTimeout has been clamped to different minimal resolutions, depending on the browser and operating system. On some systems, it was tied to vertical refresh and would clamp to 16ms, then vendors started clamping to 10ms, and now they clamp as low as 4ms. But 4ms isn't a guarantee, it's a request. Many things can stand in the way of that request. For example, just mousing over the page fires user interface events that can trigger JavaScript handlers or CSS rules that force a relayout, and excessive JavaScript work can trigger garbage collection.
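
One way to see this for yourself is to record when timer callbacks actually fire and compare that against the requested interval. This is only a measurement sketch; `summarizeJitter` and `sampleTimer` are hypothetical helper names:

```javascript
// Given callback timestamps (ms) and the requested interval (ms),
// report the mean and worst-case deviation from that interval.
function summarizeJitter(timestamps, intervalMs) {
  const errors = [];
  for (let i = 1; i < timestamps.length; i++) {
    errors.push(timestamps[i] - timestamps[i - 1] - intervalMs);
  }
  const worst = errors.reduce((max, e) => Math.max(max, Math.abs(e)), 0);
  const mean = errors.length ? errors.reduce((sum, e) => sum + e, 0) / errors.length : 0;
  return { mean, worst };
}

// Collect `count` callback times from chained setTimeout calls,
// then pass the jitter summary to `done`.
function sampleTimer(intervalMs, count, done) {
  const timestamps = [];
  const now = () => (typeof performance !== "undefined" ? performance.now() : Date.now());
  function tick() {
    timestamps.push(now());
    if (timestamps.length < count) {
      setTimeout(tick, intervalMs);
    } else {
      done(summarizeJitter(timestamps, intervalMs));
    }
  }
  setTimeout(tick, intervalMs);
}

// Example: sampleTimer(4, 200, (stats) => console.log(stats));
```

Running the example while moving the mouse around or scrolling the page should make the worst-case figure noticeably larger than it is on an idle page.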

Secondly, aggressive setInterval periods can delay response to user input, making the browser feel sluggish. If the user tabs to another window, the browser must decide whether or not to clamp timeouts to a much higher value (say 1 second) to avoid needlessly burning CPU, and that clamping could harm background playback. Unlike requestAnimationFrame, which solves this problem for graphics, there's no "requestSoundEvent".

Music Apps and Games sometimes require playback of short buffers

Some of the sounds in Quake2, for example the hyper-blaster, are sample buffers as small as 300 bytes. At 44kHz, this is a hard deadline of roughly 8ms to schedule the playback of the next sound in the sequence. With all of the other stuff going on within a frame (processing physics, AI, rendering), the timing is highly unlikely to be consistent, and do we really want JS performing this scheduling task?
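
As a rough check on that deadline, the playing time of a raw PCM buffer follows directly from its size and format. The post does not state the sample width or channel count of the Quake2 sounds, so `bytesPerSample` and `channels` below are illustrative parameters, and 44100 Hz is assumed for "44kHz":

```javascript
// Playing time in milliseconds of an uncompressed PCM buffer.
function bufferDurationMs(byteLength, sampleRate, bytesPerSample = 1, channels = 1) {
  const frames = byteLength / (bytesPerSample * channels);
  return (frames / sampleRate) * 1000;
}

// 300 bytes of 8-bit mono at 44.1kHz lasts a little under 7ms.
const eightBitMono = bufferDurationMs(300, 44100);

// The same 300 bytes as 16-bit mono lasts half as long.
const sixteenBitMono = bufferDurationMs(300, 44100, 2);
```

Either way the buffer is shorter than a single 60Hz frame, so the next sound has to be queued before the current frame's work is even finished.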

Especially on mobile

Remember, mobile devices are HTML5 devices as well, and are continually getting better at HTML5, but they are much more resource constrained, and Javascript is even slower. Here, native scheduling is even more beneficial, and intensive Javascript scheduling of playback would be difficult, and waste battery.

That's why the Web Audio API is important, because it permits complex audio scheduling tasks, application of environmental effects, convolutions, etc. to be natively accelerated without involving the JavaScript engine in many cases. This takes pressure off the CPU, off of memory and the garbage collector, and makes timing overall more consistent. Here's a neat demo recently shown at Google I/O.
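
To make the scheduling point concrete, here is a small sketch of queuing sounds back to back on the audio clock instead of on timers. It uses method names from the Web Audio API as later standardized by the W3C (`createBufferSource`, `connect`, `start`), which may differ from the draft implementation available at the time of writing; `scheduleSequence` and `leadTimeSeconds` are hypothetical names:

```javascript
// Schedule decoded buffers to play one after another.
// `ctx` is expected to behave like a Web Audio AudioContext and each
// entry in `buffers` like an AudioBuffer (it must expose `duration`).
function scheduleSequence(ctx, buffers, leadTimeSeconds = 0.05) {
  let when = ctx.currentTime + leadTimeSeconds;
  const startTimes = [];
  for (const buffer of buffers) {
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start(when);        // start time is on the audio clock, in seconds
    startTimes.push(when);
    when += buffer.duration;   // next sound begins exactly where this one ends
  }
  return startTimes;
}
```

The JavaScript runs once, up front, to describe the whole sequence. After that the audio engine is responsible for starting each buffer on time, so event-loop jitter and garbage collection pauses no longer affect the gaps between sounds.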

Microsoft deserves credit

They made massive improvements in support of HTML5 from IE8 to IE9, especially in <canvas> and <audio>, and they deserve the right to feel proud and evangelize them. We celebrate that. It's why Angry Birds works, to some people's shock, on other browsers, and it's not by accident. We built fallbacks for 2d canvas into our core library, and tested on non-WebGL capable browsers like IE9, which have excellent GPU accelerated 2d support.

Angry Birds was not an attempt to make non-Chrome browsers look bad, but to make HTML5 look good, because when developers start realizing that professionally developed and polished games and applications can be done in HTML5, we all win.

But now is not the time to rest on our laurels. HTML5 is not done. There are many things incomplete and broken in the spec. I am sad to see Microsoft trying to talk down the experimentation that is going on in Firefox and Chrome, vis-a-vis WebGL and new Audio APIs, just because they are on a slower release cycle and do not have these bleeding edge features.

Giorgio seems to be suggesting in his tweets that the basic HTML5 <audio> tag is "good enough" and that the current IE9 implementation covers use cases sufficiently, and I disagree with that strongly.

We need 3d on the web. We need high quality, low latency, audio. We need to be able to do the things that OpenAL and DirectX can do with sound on the Web. And we're not going to get there by sticking our head in the sand and declaring premature victory.



Blogger Nerd Progre said...

There is a reply for what you want to do: Java.

It's kinda silly to attempt to force a web browser to do almost real-time apps with milliseconds-based reply times.

When will this madness end? HTML and browsers were not designed for this, and it's a real waste of cpu cycles attempting to do apps on interpreted Javascript.


1:37 PM  
Blogger Oliver said...

This is why Flash will be a key player for a while, too. I like Flash/Flex and Actionscript and enjoy it, but it's still essentially a virtual machine, running inside another virtual machine (i.e. the browser).

HTML 5 and Javascript are "native" to browsers which makes it more appealing.
Is that why the tech field at large seems to want HTML 5 browsers to behave like an OS front-end at runtime?
Having a standard platform that all systems agree to support seems to be the ultimate goal, right? A mobile runtime platform that does all the same things as an OS front-end, but in a non-proprietary way?

None of this is meant to be argumentative. Just curiosities and ponderings.

1:45 PM  
Blogger Diznug said...

Couldn't agree more! I've tried to shoehorn audio into my own HTML5 game and met with a lot of frustration. Released it without audio rather than falling back on flash or a java applet.

For the curious:

2:27 PM  
Blogger Blabla Bla said...

How does everyone win when developers are forced to use a mish-mash of document authoring technologies to build applications?

Nerd Progre is right, Java is a much better answer since hey, it was meant to build APPS unlike HTML.

2:27 PM  
Blogger Unknown said...

There are some developments in cutting-edge versions of Chrome and Firefox. Check this out:

2:35 PM  
Blogger Lucian Armasu said...

When is Microsoft going to implement WebGL anyway? I haven't heard them mentioning it for IE10, so is it IE11? IE12?

2:49 PM  
Blogger Unknown said...

Random comment:

I've never tried this, but I believe that you should be able to use HTML5's postMessage() function as a much faster setTimeout(..., 0).

IIRC, in Chrome at least, postMessage is asynchronous. It was implemented this way on purpose because if we ever take Chrome to multi-process, it will have to be async, and we don't want to break compat.

So you can post a message to your own window, and it should essentially be the same as directly appending to the event loop. The event loop should get processed very quickly, so you might be able to do very high resolution scheduling this way.

Something to tinker with perhaps.

3:48 PM  
Blogger Unknown said...

Aaron, the problem is that lots of other stuff is running in the main renderer thread: JavaScript, page rendering. So, it can take a significant amount of time (in realtime terms) for the event to be handled. Lots of timing jitter/ un-predictable latency is the result.

3:54 PM  
Blogger Unknown said...

Aaron, ok I now see you meant web-workers, which has no access to the DOM -

3:57 PM  
Blogger Giorgio said...

Hi Ray,
Thanks for sharing the background story of the game. It's interesting to understand how you approached the development of Angrybirds in HTML5 - I love the game!

While Twitter might not have been the best channel to have a conversation :), I think we have similar opinions. HTML5 audio is a great feature for web developers. In my blog post I shared some simple demos as well as more complex examples, such as a game or even a beatbox machine. For most scenarios, I believe HTML5 audio is a fantastic solution and I'm glad it’s part of the HTML5 specification and supported by all browsers today!

I also agree with you that in order to work properly, it's important that browsers (or implementers in general) fine tune the performance aspect. In Internet Explorer 9 we did a lot of work to make sure that developers can get high quality and performance. Other browsers today have great support, but there are still a few performance bugs that need to be addressed. It was great to hear the Chrome team at IO2011 confirming these bugs and announcing they are working to fix them soon. I’m sure other browsers are also addressing their issues and I’m looking forward to getting more consistency in performance across the board.

That said, I believe there are many other features and capabilities that are not codified into any (HTML5, WebApps, CSS..) specification, yet.

Audio synthesis and manipulation is one of them. As a matter of fact, recently a new Audio Working Group has been chartered at W3C to discuss this exact topic. A few “standalone” proposals have been demonstrated recently (as far as I know from Mozilla and from Google) and I’m looking forward to seeing how this conversation will evolve in the next months, together with other interesting conversations as well (for instance, speech APIs).

I encourage you to keep an eye on our HTML5 Labs site. We already have some interesting prototypes and we’ll soon publish some exciting new labs:

-Giorgio Sardo | Microsoft Corp

4:41 PM  
Blogger Ray said...

With respect to Java, as someone who has been programming Java since HotJava alpha1, I think it's too late. Sun had a chance to revolutionize the web and they dropped the ball. They concentrated far too much on the server-side and enterprise, and only begrudgingly looked at the consumer side when it was too late (Java Kernel, new Java Plugin, JavaFX, etc).

Java could have been at parity with JavaScript in the browser, if they had tried to fix the impedance mismatch problems: Better integration with browser APIs and browser security model, much smaller footprint and faster startup (seriously, do we need CORBA ORBs in the classpath?).

With Sun gone, and Oracle suing anyone who tries to revamp Java to better fit today's web, it's unlikely to ever be tried again. Therefore, all we have left is JavaScript for cross-platform downloadable/mobile code.

So yes, fixing problems in Javascript, by building better APIs to expose to it, is a win for everyone.

Performance wise, Javascript these days is probably faster than Java was on a desktop machine 10 years ago, which may not be saying much, but performance 10 years ago wasn't half bad.

10:18 PM  
Blogger HTML Games said...

Just to say it is possible to improve html audio if you're willing to use a few hacks...

Take this html game for example (night rock) at - I had audio issues mainly on Chrome but fixed them by calling the audio .load() method in JavaScript rather than trying to set currentTime = 0 to reset the sound (even for very short firing sounds).

Also Chrome has extra audio issues when being run via a proxy - hence the timeout hacks in the code to loop the background music.

Hope this information helps someone.
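
The hack described in this comment might look roughly like the sketch below. The game's actual code is not shown here, so `restartSound` is a hypothetical helper, and `audio` is assumed to behave like an HTMLAudioElement:

```javascript
// Restart a short sound effect by reloading the element instead of
// seeking back with `audio.currentTime = 0`.
function restartSound(audio) {
  audio.load();                  // reloading resets playback to the start
  const result = audio.play();
  // Newer browsers return a promise from play(); ignore rejections
  // such as autoplay blocks so they do not surface as unhandled errors.
  if (result && typeof result.catch === "function") {
    result.catch(() => {});
  }
}
```

For example, a firing sound could be retriggered on every shot with `restartSound(laserAudio)`, where `laserAudio` is an element created once at startup.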

12:39 AM  
Blogger Nerd Progre said...

"With respect to Java, as someone who has been programming Java since HotJava alpha1, I think it's too late. Sun had a chance to revolutionize the web and they dropped the ball."

For applets, maybe. Let's see what JavaFX 2.0 becomes ... in a few weeks. Heh Heh.

But the point here is that developers are trying to push the envelope and trying to recreate traditional desktop apps in AJAX. That is silly, to say the least. You don't want an audio editor that must upload files to the cloud for editing, then download the data back. That might work if you're on a big symmetric pipe, but sucks if you have residential broadband with upstream speeds often crippled.

Back to Java, you say:

"They concentrated far too much on the server-side and enterprise, and only begrudgingly looked at the consumer side when it was too late (Java Kernel, new Java Plugin, JavaFX, etc)." has been a success, and something needed for years. Plus, Java6u20+ onwards was a BIG improvement in speed and general bug fixing (can you say antialiased fonts, native splash screen, etc.).

While you talk about "web apps", Java 6 saw great DESKTOP APPS coded in Java: Vuze / Azureus, jDownloader, NeverNote, jEdit, and many more...


8:22 AM  
Blogger Nerd Progre said...

(continues from previous comment)

"much smaller footprint and faster startup (seriously, do we need CORBA ORBs in the classpath?)."

Funny, you want Java to be smaller yet at the same time AJAX devs are making Javascript apps bigger and more complex all the time, with huge frameworks all tied with little pieces of string on top of a house of cards.

The potential for security snafus in AJAX apps and fog-computing is way higher than with a properly designed Java app.

"With Sun gone, and Oracle suing anyone who tries to revamp Java to better fit today's web, it's unlikely to ever be tried again."

You must be reading too much sites. Java SE is open source, and will continue to be. OpenJDK is a success and IBM has embraced it, too. Apple joined OpenJDK as well.

With Novell going down the drain and its Microsoft .Net clone "Mono" now in limbo, the future looks better than ever for OpenJDK Java.

PS: One of the hidden gems about Java is Java Web Start, the ability to download and run full desktop apps -crypto signed and user-authorized- with a single click. Apps like Bloom -Facebook photo uploader- are taking great advantage of it.

8:23 AM  
Blogger Unknown said...

@Nerd: I'd love for you to be right about Java, but I'm not holding my breath.

If you can convince game developers to actually ship on Java [applets fx webstart whatever], and that they won't lose lots of customers in the process, then we'll know you were right. So far the commercial game developers I've met who are currently shipping applets are trying desperately to get away from them.

9:49 AM  
Blogger Sindisil said...

@Nerd Progre (in response to your first comment): You're perpetrating the same crime against JavaScript of which Java has been (and still is) the victim - judging it by its earliest implementations, ignoring current reality.

To wit - JavaScript is JITed in all modern browsers. It's not yet in the same league as the major JVMs, but the big three are making great strides - Mozilla, with their JägerMonkey/TraceMonkey combo, Microsoft with Chakra, and Google with V8.

Browsers may not have been designed to run high performance multi-media apps like games, but they're sure as heck evolving that capability.

11:26 AM  
Blogger Sindisil said...

@jgw - While I too am skeptical of Java making a comeback on the Web (as much as I'd welcome it), it's not like Java hasn't successfully been used for shipping games.

Puppy Games struggled for years with their neo-retro games, but recently have seen success with Revenge of the Titans.

OddLabs has a fun RTS called Tribal Trouble.

Oh, there was also this little game I heard about recently - Mine-something? something-craft? Hmm ... it's not coming to me ... if I think of it, I'll post again. ;)

Granted, there are few games in Java that are actually shipping (and, to be fair, Minecraft is *technically* in Beta. Most profitable Beta I've ever seen!).

It's not necessarily anything unique about Java that *prevents* it from being so used, though.

11:36 AM  
Blogger blizzard said...

Did the audio data api:

Have too much latency to use in Firefox 4? People have been using it to build games and effects from JS pretty happily. I'm surprised that you didn't bring it up at all in your article given how much attention it's gotten.

(<audio> is pretty bad for the audio effects in games use case, tbh - that's why we have the audio data api!)

3:27 PM  
Blogger Bill said...

Java is not so much a solution. The issue taken up with Flash is largely that it is not supported by iOS. Less publicized is the fact that Java isn't supported either. Not as an applet on a web page, at any rate. Java runs in a VM just like Flash does, and neither exists on iOS; and both are still plugins for the browser. In this way, Java and Flash are in the same boat. If iOS successfully takes Flash off the web, Java goes with it. And unless the audio tag steps up its game, the web will sound like poo.

2:53 PM  
Blogger asif said...

very nice thanks for sharing

hey friend see snow on google
Type “Let It Snow” on @Google If you click and drag you can wipe the snow away. It is great. source:

3:46 PM  
Anonymous Anonymous said...

Thank you for the thoughtful review. The main advantage of youtube html5 player seems to be for embedding rich media such as audio and video in modern browsers. Although, the structure elements seem to be useful. CSS3 seems to be headed in the right direction, leaving many possibilities for implementation and creativity,

3:29 AM  
Anonymous Anonymous said...

Thank you for your continuous upgrades and improvements, HTML5 to be more than just the spec that the W3C validator checks your document against. Part of html5 media player involves standards for bolt on technology including web workers, CSS3, server sent events, etc. that are going to (I hope) revolutionize the web and do away with the need for 3rd party browser plugins

6:22 AM  
Blogger Nicolas said...

Some have stated that "HTML browsers are not designed for this" - Well no they are not which is why we have Flash...

My point is if you can create a plugin running through the browser then shouldn't tying it directly into the browser prove even faster..?

Wasn't the canvas tag supposed to respond to that problem?

Maybe we need something like a Flash type Tag..? I mean by this something that diverges away from the limitations of a language designed for markup in order to deliver/prioritize other benefits.

Flash was fine until Apple and Adobe had their lover's tiff and then all of a sudden...

8:49 PM  
