The problems with HTML5 <Audio>
I've been having a back-and-forth Twitter discussion about audio with Giorgio Sardo, Microsoft's HTML5/IE evangelist, but I find 140 characters too limiting to explain the issues, and Giorgio seems more interested in snark and attacking Chrome than in addressing the root of the problem.
This all started because, after the release of Angry Birds at Google I/O, people noticed that it was requesting Flash. Angry Birds is written in GWT and uses a GWT library written by Fred Sauer called GWT-voices. This library not only supports HTML5 audio, but has fallbacks to Flash and even <bgsound> on IE6!
There was speculation that the Flash requirement was there for nefarious purposes (to block iOS) or because Chrome bundles Flash, but the reality is that it was there both because Chrome has some audio bugs and because HTML5 <audio> just isn't good enough for games or professional music applications.
I first noticed the shortcomings of the audio tag last year when we ported Quake2 to HTML5 (GwtQuake), shown at last year's I/O, where I also demoed a Commodore 64 SID music emulator. There are two problems with using HTML5 <audio>, which was originally designed to support applications like streaming music players.
Problem #1: It is missing functionality
The HTML5 audio element permits operations like seeking, looping, and volume control, which are great for jukebox applications, but you cannot synthesize sound on the fly, retrieve sound samples, process sound samples, apply environmental effects, or even do basic stereo panning. Quake2 required 3D sound based on OpenAL's inverse-distance damping model as well as stereo panning. I did my best: I implemented distance damping with the volume control, but I had no way to position sounds left or right.
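For concreteness, here is a minimal sketch of the kind of damping involved: OpenAL's inverse-distance-clamped gain formula, mapped onto the one knob HTML5 <audio> exposes, the element's volume property. The names are illustrative, not the actual GwtQuake code.

```javascript
// OpenAL inverse-distance-clamped model:
//   gain = refDist / (refDist + rolloff * (dist - refDist))
// with the listener distance clamped to at least refDist.
function inverseDistanceGain(dist, refDist, rolloff) {
  const d = Math.max(dist, refDist); // clamp inside the reference radius
  return refDist / (refDist + rolloff * (d - refDist));
}

// With no panning API, scaling overall volume is the best you can do:
//   audioElement.volume = Math.min(1, inverseDistanceGain(d, 1, 1));
// There is no way to weight that gain toward the left or right channel.
```

At the reference distance the gain is 1.0, and it falls off hyperbolically beyond it; stereo placement would require applying different gains per channel, which the audio element simply cannot express.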
Problem #2: Latency
Audio applications require low latency. Studies have shown that humans can perceive audio latency down to the millisecond, but in general, anything under 7 ms is considered good enough. This means that in some circumstances you need to schedule sounds within 7 ms of one another: for example, when you need to start two sounds simultaneously, one in the left ear and one in the right, or when you need to concatenate several sounds together in series.
Giorgio has a neat demo here of playing piano notes in sequence, and hats off to Microsoft for providing a great <audio> implementation. It's a cool demo, but I still hear latency variation between notes and occasional glitches in playback. No one is going to build something even a tenth as good as GarageBand on iPad using this technique, because the only way you can schedule audio in HTML5 is through the browser's event loop, using setInterval or setTimeout, and that's problematic for several reasons.
First, timers are imprecise: browsers clamp setTimeout and setInterval to a minimum resolution of several milliseconds (the spec permits a 4 ms floor, and older browsers clamp to 10-15 ms), and a busy event loop delays callbacks further, so every scheduled sound carries unpredictable jitter. Second, aggressive setInterval periods can delay response to user input, making the browser feel sluggish. And if the user tabs to another window, the browser must decide whether to clamp timeouts to a much higher value (say, one second) to avoid needlessly burning CPU, a clamp that would wreck background playback. Unlike requestAnimationFrame, which solves this problem for graphics, there is no "requestSoundEvent".
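The event-loop scheduling technique described above looks roughly like this sketch (illustrative names; a plausible reconstruction, not Giorgio's demo code). Even with drift correction against the wall clock, each callback can land milliseconds late, and nothing guarantees the audio actually starts when play() is called:

```javascript
// Naive note sequencer built on setTimeout, the only scheduling
// primitive the <audio> tag gives you. Drift is corrected against
// absolute target times, but per-note jitter remains.
function scheduleNotes(notes, playNote, intervalMs) {
  const start = Date.now();
  let i = 0;
  function tick() {
    playNote(notes[i]); // e.g. audioElements[notes[i]].play()
    i += 1;
    if (i >= notes.length) return;
    const target = start + i * intervalMs;      // absolute target time
    setTimeout(tick, Math.max(0, target - Date.now()));
  }
  tick();
}
```

The correction keeps the sequence from drifting on average, but it cannot make any individual note land inside a 7 ms window, which is exactly the latency variation audible in the piano demo.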
Music apps and games sometimes require playback of short buffers
Some of the sounds in Quake2 (the hyper-blaster, for example) are sample buffers as small as 300 bytes. At 44 kHz, that leaves a hard deadline of roughly 8 ms to schedule playback of the next buffer in the sequence. With everything else going on within a frame (physics, AI, rendering), that timing is highly unlikely to be consistent. And do we really want JS performing this scheduling task at all?
Especially on mobile
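To put numbers on that deadline: assuming the 300-byte buffer holds 8-bit mono samples, one byte per sample (an assumption about the asset format, not stated above), its playback time at 44 kHz works out to under 7 ms:

```javascript
// Playback duration of a raw PCM buffer, in milliseconds.
function bufferDurationMs(bytes, bytesPerSample, channels, sampleRate) {
  const samples = bytes / (bytesPerSample * channels);
  return (samples / sampleRate) * 1000;
}

// 300 bytes of 8-bit mono at 44.1 kHz: about 6.8 ms of audio,
// so the next buffer must be queued within that window or there
// is an audible gap.
const deadline = bufferDurationMs(300, 1, 1, 44100);
```

A single missed setTimeout callback, or one long garbage-collection pause, blows straight through a window that size.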
Microsoft deserves credit
They made massive improvements in HTML5 support from IE8 to IE9, especially in <canvas> and <audio>, and they have every right to feel proud and evangelize them. We celebrate that. It's why Angry Birds works, to some people's shock, on other browsers, and not by accident: we built fallbacks into our core library for 2D canvas and tested on non-WebGL-capable browsers like IE9, which has excellent GPU-accelerated 2D support.
Angry Birds was not an attempt to make non-Chrome browsers look bad, but to make HTML5 look good, because when developers realize that professionally developed and polished games and applications can be built in HTML5, we all win.
But now is not the time to rest on our laurels. HTML5 is not done. Many things in the spec are incomplete or broken. I am sad to see Microsoft trying to talk down the experimentation going on in Firefox and Chrome, vis-à-vis WebGL and new audio APIs, just because they are on a slower release cycle and do not have these bleeding-edge features.
Giorgio seems to be suggesting in his tweets that the basic HTML5 <audio> tag is "good enough" and that the current IE9 implementation covers the use cases sufficiently. I strongly disagree.
We need 3D on the web. We need high-quality, low-latency audio. We need to be able to do the things that OpenAL and DirectX can do with sound on the Web. And we're not going to get there by sticking our heads in the sand and declaring premature victory.