More At Eleven

Xeni, the author of the “Wired” article about spatial sound, wrote me a nice letter this morning about yesterday’s post. She basically said I should go read Iosono’s website and my technical questions would be answered. I was a bit abashed at first. Here I was proclaiming the problems I thought I saw with this new technology and I hadn’t even read their website. (Hey, our president has fully admitted that he doesn’t watch the news or read the paper because he doesn’t want to be exposed to those lies and biases. Can I use the same excuse?)

So I read it. I still have questions. Just more of them.

The one thing I noticed was that their theater system supports all the standard sound formats. You can feed it Dolby Digital, DTS, SDDS—even stereo—and it’ll happily play it back. You won’t get its super-bonus positioning features but you will get its “every seat in the theater sounds as good as every other” feature. That’s certainly nice. I have my doubts that theater chains will be willing to fork over cash for that feature alone. “We gave them their stadium seating. What do they want from us, blood?” People care about good sound to a certain extent. The “sweet spot” in every chair might be too much to ask. But maybe I’m wrong.

The “spatial sound” part of this new Iosono system sounds like it is basically audio files plus metadata: the master track plus information about where to place it, how to move it and whatnot. That makes sense. Their website says that their workstation can take up to 64 sound files and place them or move them through the theater sound space.
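To make “audio files plus metadata” concrete, here is a rough Python sketch of what one such sound object might look like. Everything in it is an assumption for illustration: the `SoundObject` name, the keyframe layout, the room coordinates and the straight-line interpolation are all made up, and none of it comes from Iosono’s documentation.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

MAX_OBJECTS = 64  # the workstation limit quoted on the website


@dataclass
class SoundObject:
    """Hypothetical sound object: one audio file plus position keyframes."""
    audio_file: str
    # Each keyframe is (time_seconds, x, y), sorted by time.
    # The coordinates are arbitrary room units, assumed for illustration.
    keyframes: list = field(default_factory=list)

    def position_at(self, t):
        """Return the (x, y) position at time t by linear interpolation."""
        if not self.keyframes:
            raise ValueError("sound object has no position keyframes")
        times = [k[0] for k in self.keyframes]
        i = bisect_right(times, t)
        if i == 0:                      # before the first keyframe
            return tuple(self.keyframes[0][1:])
        if i == len(self.keyframes):    # at or after the last keyframe
            return tuple(self.keyframes[-1][1:])
        t0, x0, y0 = self.keyframes[i - 1]
        t1, x1, y1 = self.keyframes[i]
        w = (t - t0) / (t1 - t0)
        return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


# A bullet-by that travels from the screen to the back wall in two seconds.
bullet = SoundObject("bullet_by.wav", [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)])
halfway = bullet.position_at(1.0)
```

The point of the sketch is only that the mix would no longer be tied to a speaker: the renderer in the theater would decide which speakers to drive from the position data.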

I have to admit I’m still confused. What is their master sound format? Is there a master sound format? Is it simply an open-ended thing? Up to 64 tracks plus metadata and that’s it? No built-in hard speaker assignments? So let’s assume that it’s something like that. How do you turn it over for encoding? Eight 8-track hard drives off the Tascam MMR-8 recorder? A FireWire drive from Pro Tools with all 64 tracks on it? Maybe most people don’t care about these things but this is the nitty-gritty tech stuff that I like to understand. Now after it’s encoded, what gets shipped to theaters with the prints?

When you’re dealing with a 5.1 master sound track it’s pretty simple: 6 channels of audio. That easily fits onto a hard drive. Since many stages make use of MMR-8 recorders, the drive from that machine will usually be sent to NT Audio or one of the other facilities around Los Angeles that will encode the soundtrack onto the film. Dolby shows up on the dub stage with their own encoding gear and they’ll generate a couple of MOs (Magneto-Optical Disks) with their Dolby-encoded master audio. These disks get shipped to the lab facility as well.
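For a sense of scale, here is a back-of-the-envelope size estimate in Python. The numbers fed into it are assumptions, not anything from Iosono or the stages: uncompressed 48 kHz, 24-bit audio and a two-hour feature.

```python
def master_size_gb(tracks, minutes, sample_rate=48_000, bytes_per_sample=3):
    """Size in decimal gigabytes of uncompressed PCM audio.

    Assumes 48 kHz, 24-bit (3 bytes per sample) unless told otherwise.
    """
    total_bytes = tracks * sample_rate * bytes_per_sample * minutes * 60
    return total_bytes / 1e9


five_one = round(master_size_gb(6, 120), 1)    # 5.1 master, two hours
sixty_four = round(master_size_gb(64, 120), 1)  # 64 tracks, two hours

print(five_one)    # 6.2
print(sixty_four)  # 66.4
```

Under those assumptions a 5.1 master is about 6 GB, while 64 full-length tracks would run to roughly 66 GB before any metadata is counted.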

With a 5.1 master sound track, each channel of audio contains all the audio that is played from one speaker in a theater. Usually the layout is like this:

  1. Left
  2. Left Surround
  3. Center
  4. Right Surround
  5. Right
  6. Sub

That’s what I’m wondering about with my questions. How does that process work for the Iosono system?

You need to have at least the 5.1 covered with this new system so you can fill up the space with sound. Pretty much all the dialogue comes out the center channel along with some of the sound effects and foley. Most of the sound effects and music are in the left and right speakers. The surrounds are used for reverb returns on music to give it more presence, backgrounds to create the environment, and sound effects for movement (e.g. bullet-bys past the camera into the surrounds). At a minimum you need to recreate that in Iosono. Everything else is bonus.

But here’s a problem that I see: predubbing. When the sound editors on a film show up on the stage for predubbing they have lots and lots of tracks of sound with them. This might be a typical breakdown:

  • Dialogue — 16 tracks
  • ADR — 24 to 32 tracks
  • Group ADR — 24 to 32 tracks
  • Foley (Footsteps and Props) — 32 tracks
  • Backgrounds — 96 tracks
  • Sound Effects — 32 to 200+ tracks

Sound Effects of course is the difficult one. If the film is a talky, light romantic comedy, then you’re probably closer to the 32 tracks. If you’re dealing with an action movie you can easily go well beyond 200 tracks of effects. Foley could be similar. If you’re dealing with a sci-fi film or a period piece with lots of objects that are not “standard” to our world there might be many, many more tracks of props.

Now these cut tracks need to be predubbed to manageable amounts for the final mix. We usually deal with 8-track predubs or at least think of them in groups of 8-tracks. So you might wind up with something like this:

  • Dialogue — 1 8-track predub
  • ADR — 1 8-track predub
  • Group ADR — 1 or 2 8-track predubs
  • Foley — 2 8-track predubs
  • Backgrounds — 4 8-track predubs
  • Sound Effects — 4 to 15 8-track predubs

So even on a light show you can be looking at 13 predubs, or 104 tracks of sound, after predubbing. And we still need to add music in there. That’s more than the Iosono system can handle. You almost need to do a second predub to get that down to the 64 tracks.
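The arithmetic behind that number can be checked with a few lines of Python. The predub counts are taken from the list above, using the low end of each range for a light show and the high end for a heavy one; the variable and function names are just for illustration, and music is left out.

```python
TRACKS_PER_PREDUB = 8
IOSONO_LIMIT = 64  # sound files the workstation can take, per the website

# (light show, heavy show) number of 8-track predubs per category
PREDUBS = {
    "Dialogue": (1, 1),
    "ADR": (1, 1),
    "Group ADR": (1, 2),
    "Foley": (2, 2),
    "Backgrounds": (4, 4),
    "Sound Effects": (4, 15),
}


def total_tracks(show):
    """Total predubbed tracks for a 'light' or 'heavy' show, music excluded."""
    column = 0 if show == "light" else 1
    return sum(counts[column] for counts in PREDUBS.values()) * TRACKS_PER_PREDUB


light = total_tracks("light")
heavy = total_tracks("heavy")

print(light, light - IOSONO_LIMIT)  # 104 40
print(heavy, heavy - IOSONO_LIMIT)  # 200 136
```

So the light show is 40 tracks over the limit and the heavy one is 136 over, before a single music track is added.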

It’s not an impossible workflow to manage but it would take more time. And that is one of the critical points from my previous post. How much is a studio willing to spend on this?

I don’t want anyone to misunderstand me on this: it sounds like a very cool system. I just wonder how it can fit into our existing time frame, and whether studios and theater chains will be willing to shell out the cash for it.