(Michael Chinen) » AR/VR 3D Audio editors in 2024

03 Jan 24
06:45

AR/VR 3D Audio editors in 2024

I’m curious what the ideal 3D audio editing interface for casual editing/playback would be like in VR or AR. ‘3D audio editors’ might not be the right term here. There are a few companies including one I ran into recently called Sound Particles in Leiria, Portugal that produce professional audio editors for post-production, including cinematic rendering of battle scenes and the like where spatial audio is important, and capturing the 3d scene is key. 3D scene management (tracking the position of the camera and all of the entities) is the core of game engines and CGI.

I’m actually interested something else: audio editing in a VR (or AR) context, where you want to mix some tracks, or edit a podcast, or do typical things you’d do in a 2d audio editor like audacity, where scene and entity locations aren’t the primary concern. I wasn’t aware of this kind of editor, but I bet something exists, either in FOSS, commericial apps, and if not, definitely in academic research. So here’s what I found.

Before I dive in, here are my relatively naive thoughts on possible VR applications in the editor domain. I’ve developed a few audio editors, including being part of the audacity team (2008-2012) and making an audio editor on iOS called Voicer that I retired a few years ago. But I haven’t been very close to editor development for a while.

Spectral envelopes/3d spectrogram is a fairly obvious physical mapping to do, and kind of fun to look at as evinced by 90s music videos and winamp. However, most people that I know prefer waveform to spectrogram editing. At the end of the day the artefact being produced is a time-domain, and spectra are literally convoluted with respect to time, leaving the user to be uncertain about if any spectral edit would add an unintentional click or blurriness. Another way to explain this is because spectra are computed over time windows, and if we are plotting spectrograms in 3d, with one of the axes being time, there is ambiguity as to what editing a certain point should mean. Another issue is that there are numerically impossible power spectrograms because of the overlap in time, but there are no impossible waveforms, since the waveform is the ground truth.
Providing a waveform interface is still important. Being able to zoom and accurately select, move, cut, and apply effects to a waveform is the core of an audio editor. The waveform provides quite a bit of information: it’s easy to tell if there is a tonal or noisy component present when zoomed in, and the RMS information at zoomed-out scales gives a reasonable amount of info about where the events lie in time. 3D elements might be used to add more space for large numbers of tracks, or possibly positioning the stereo or ambisonic location.
It’s now obligatory to mention why AI won’t make any new tool obsolete before getting started. So why not make a 0-D audio editor that is just a text box telling the AI what to edit? If it worked well enough, that would capture some percentage of the market (e.g. removing the pauses from a recording is already popular use case). Generative audio will become more useful for audio creators too. But it’s still a while before we capture what human audio editors do. There is a lot of audio data, but little collected data about the creative process of editing audio. Editing is also a highly interactive process with necessary trial and error, where the trials and errors are useful for aesthetic and deeper understanding of the objects behind the audio that reveal the next step to the editor. I think as long as humans want to be creative, we will need audio editors.
Audio editing capability has been stagnant until recently. Although I worked on Audacity, I was always rooting for something better to come around. In fact, one of the reasons I worked on it was because it had obvious issues that could be resolved (multithreading, hardware interface UI). Sound Forge was my favorite audio editor in the early 2000s. When I talked to sound engineers, they mostly wanted something that was fast, accurate, and reliable, with some preferring support of certain plugins. They don’t need a DAW for everything, but everything basically turned into a DAW. The basic linear interface wasn’t really improved on, just more support for tracks or inputs was added. This could mean that innovation in interface is highly constrained because what we have today gets us there eventually without having to relearn or encounter problems with a new interface. Because of this, I would consider VR editors better suited as a hobbyist or research project than a business venture.

Here’s what I found:

Immersive VisualAudioDesign: Spectral Editing in VR (Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion): A university-driven research project that made a nice analogy of morning (low azimuth) shadows from mountains to spectral masking. Also cool that paper prototyping for academics is a thing. I remember it catching on for game dev and mobile app design in the late ’00s and thought it would make more sense in other areas as well. It works for spectrograms because they are a linear physical to physical mapping. This project seems like it wasn’t developed further though.
There are a few synth/patch programming VR projects (one called SoundStage from at least 2016), but I don’t consider these audio editors. Music production/education is an interesting area to look into as well, and probably a lot of fun.
Almost all VR audio editor searches return results on how to create audio for VR, which is what I expected. There might not be a lot of demand. OTOH, I feel like people on the meta quest really like novel utilities.
The Sound Particles interface is clearly designed for a 2d/DAW scene-entity paradigm, which I said wasn’t the focus of the post, but it’s actually the closest example I could find to something that you could drop into VR, since it renders the audio scene visually in 3D.

So I didn’t do a lot of research, but I promised myself to do enough to make a hot take post in a day. I feel like there isn’t much, probably due to lack of demand and a history of audio editing progress being relatively incremental, but that also means it’s possible to do something fun in the space, even if being useful still seems like it would take some time to dial in. So there you go. Please let me know if you know of any interesting editors, applications, or research in the area.