09 Jan 24
02:52

Art and entertainment for AI entities

What kind of art or entertainment would hypothetical artificial intelligence entities of the future find interesting? Would they want to make a certain type of art? Would they appreciate human art? Would they appreciate art made by humans explicitly for them?

To be clear, this post is not about the art that Generative AI makes for humans, but rather about art made specifically for AI entities. But we can start with art made for humans as a way of getting there, since it’s not obvious at all what art made for AI would look like.

I want to avoid spending a lot of time defining art and entertainment, and especially avoid drawing a line between the two. So the rest of the post uses the term art to cover everything from the highest art to the lowest of brows: anything that is not directly needed for survival or well-being but is interesting to attend to.

Consider an AI agent that is not omnipotent, but is continuously online. What does it do with its idle time when there is no job, or it is waiting on a result? Today’s computers poll and wait for the next task, which conserves energy, so the same behavior might simply be programmed in. But what if you drop that constraint for AI agents and let them decide whether to sleep(0) or do something else? Do they pursue only purely useful research, or do they spend some time on something resembling what we call art or entertainment?
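
As a minimal sketch of that idle-time question, here is roughly the decision point in Python. The agent interface (next_task, prefers_to_wait, explore) is entirely hypothetical; the point is only that the third branch is where something like art or entertainment could live.

```python
import time

def idle_loop(agent, poll_interval=0.1):
    # Hypothetical agent interface: next_task(), prefers_to_wait(), explore().
    while True:
        task = agent.next_task()        # assumed to return None when idle
        if task is not None:
            agent.run(task)
        elif agent.prefers_to_wait():   # the classic behavior: conserve energy
            time.sleep(poll_interval)   # today's computers effectively do this
        else:
            agent.explore()             # research? play? something like art?
```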

A great deal of human art focuses on a few ideas closely related to our evolutionary pressures:
  • Sex
  • Love
  • Becoming successful and/or rich
  • Defeating enemies
  • Overcoming disasters
  • Solving mysteries
  • Scary things
It’s possible that AI entities will want to learn more about humans, the same way we want to know more about our origins. This might also be more interesting as a non-art discipline, like anthropology or history. Art does, however, have a way of capturing and transferring experience and perspective that those disciplines cannot.

One interesting thing about (non-vocal) music is that it doesn’t tie directly into language, and is therefore harder to connect with the concepts of evolutionary pressures. For example, it would be hard to communicate a love story using only an orchestra. And yet music is as old as time, and even when times were tough, humans made time to create and listen to it. Music begs for an explanation in this way. Sexual selection is one explanation (e.g. birdsong); ‘auditory cheesecake’ (à la Steven Pinker) is another. Both are fascinating to think about. If music is cheesecake, then cheesecake is very interesting: a reflection of humans so deep that we not only consume it but dedicate academic study to it. But what is the evolutionary cheesecake for AI entities, and what does sexual selection correspond to for them?

For ‘typical’ AI entities, sexual selection is not going to be a direct part of their fitness function, though connection with and understanding of other entities might be. AI appealing to human sexual desire seems like the oldest sci-fi trope there is, and remains one of the most viable means of hacking humans, one we are sure to see more of. A few early AI safety folks, including Nick Bostrom, strongly warned against letting robots resemble humans. It turns out we don’t even need the physical robot from Ex Machina, or even the rendered visuals that Replika is attempting. People on Reddit’s r/ChatGPT are falling in love with ChatGPT’s live voice chat, and have even noted that the ‘Sky’ voice sounds like Scarlett Johansson in Her. And this is using pure text-to-speech output and ASR input, without processing prosody at all. This is what the people want, and at least while AIs are trained by humans, it’s likely to remain an implicit or explicit part of their training objective. If this area does produce art or entertainment, I wouldn’t be surprised if it looked more like Office Space than Ex Machina.

With visual art, including dance, connecting to evolutionary concepts is not difficult. Some visual art explicitly tries to avoid language or ‘real-world references’ via abstraction, or via dissolution of meaning. The latter is quite interesting and appears in music as well. Music made with certain non-instrumental sounds has obvious real-world references, such as a footstep or a coin settling on a table after a flip. Pierre Schaeffer’s musique concrète devised ways of presenting these sounds so that the association with the real-world object was obfuscated or removed, leaving the divorced sound object free for abstract creative purposes.

At the extreme end of entertainment, you get wireheading: pure pleasure stimulation. Most people seem to equate wireheading with extreme drug abuse and think that it is not interesting or meaningful. But there are some possibly edgelord rationalist-adjacent people who would press a button to turn the entire universe into wireheading to solve the problem of suffering. Fortunately for those who don’t agree, there is no such button, but the concept is more plausible with programmable AI. What is the ‘pleasure center’ for an AI entity? Minimizing a loss function doesn’t seem particularly pleasurable, and at inference time there is usually no loss function being minimized anyway. There are certainly networks that can evaluate their loss at inference time and even use it iteratively to improve a result, but few would call that pleasure. It’s entirely possible that pleasure and pain, the topic of so many human stories, are simply foreign to AI.
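
For concreteness, here is a minimal sketch of that inference-time loop, assuming some differentiable loss_fn the model can still evaluate: a candidate output is nudged by gradient descent toward a lower loss. Nothing here is claimed to be pleasure, which is rather the point.

```python
import torch

def refine(output, loss_fn, steps=50, lr=0.1):
    # Iteratively improve a candidate output against a loss that can be
    # evaluated at inference time. loss_fn is a hypothetical stand-in for
    # any differentiable objective.
    x = output.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(x).backward()  # the network is 'aware' of its loss...
        opt.step()             # ...and acts on it, but is this pleasure?
    return x.detach()
```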

One way to look at art made via abstraction is that it has the potential to be more general, or at least less connected to the world the artist lived in (though you could certainly argue the opposite). I bring it up because it’s possible that the art AI creates or appreciates is less connected to the human world, and would seem more ‘abstract’ to us. This is a bit of a sci-fi trope, to be perfectly honest, but it’s plausible if we reduce the world of possible art to abstract or non-abstract. Making this simple reduction as a thought experiment, there are a few interesting outcomes:

  • art for AI is abstract, and humans can appreciate the abstractions
  • art for AI is not abstract (e.g. based on the qualia native to AI), but humans can only appreciate it as abstract
  • art for AI is abstract, but humans interpret the abstractions differently (e.g. art about the human/AI relationship)

Some AI agents would probably be interested in art that we could only appreciate in the abstract.

A few priming questions for the creation of art for AI entities:

  • What is the equivalent of a mirror for an AI entity, supposing the AI exists as weights and matrices that process text/video/audio as input and output?
  • How would interactive art or video games for AI look? They don’t have the same visual or physical limitations that humans do. To keep it minimal, what would a text-based game look like if built for AI? (A sketch follows this list.)
  • How is human art related to our evolutionary selection function? How would art for AI be related to its loss function? What would the ‘loss function’ look like for an AI that can appreciate art (without directly programming an art appreciation into it)?
  • What is meaning, beauty, and ugliness for an AI?
  • What will AI struggle to understand?
  • What is boring for AI?
  • Structural complexity and the beauty of some math proofs are already discussed with an aesthetic similar to that of art.
  • Consider all human artifacts as sonifications and visualizations of parts of the world, or as non-linear projections of it. Which of these artifacts constitute art or entertainment to you? What are the important properties of the mapping function and the artifact that make it so?
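
On the text-based game question above, here is a hypothetical sketch of the shape such a thing might take: strings in, strings out, and the open design question is what state, goals, and pacing would actually be engaging to the agent. Every name here (world, agent, and their methods) is invented for illustration.

```python
def play(agent, world, max_turns=100):
    # A text 'game' whose player is an AI agent rather than a human.
    # world.reset()/world.step() narrate the state as text; agent.act()
    # replies with a text action. All interfaces are hypothetical.
    state = world.reset()
    for _ in range(max_turns):
        action = agent.act(state)
        state, done = world.step(action)
        if done:
            break
```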

03 Jan 24
06:45

AR/VR 3D audio editors in 2024

I’m curious what the ideal 3D audio editing interface for casual editing/playback would be like in VR or AR. ‘3D audio editors’ might not be the right term here. There are a few companies, including one I ran into recently called Sound Particles in Leiria, Portugal, that produce professional audio editors for post-production: cinematic rendering of battle scenes and the like, where spatial audio is important and capturing the 3D scene is key. 3D scene management (tracking the position of the camera and all of the entities) is the core of game engines and CGI.

I’m actually interested in something else: audio editing in a VR (or AR) context, where you want to mix some tracks, edit a podcast, or do the typical things you’d do in a 2D audio editor like Audacity, where scene and entity locations aren’t the primary concern. I wasn’t aware of this kind of editor, but I bet something exists, either as FOSS or a commercial app, and if not, definitely in academic research. So here’s what I found.

Before I dive in, here are my relatively naive thoughts on possible VR applications in the editor domain. I’ve developed a few audio editors, including being part of the Audacity team (2008-2012) and making an iOS audio editor called Voicer that I retired a few years ago. But I haven’t been close to editor development for a while.

  • Spectral envelopes/3D spectrograms are a fairly obvious physical mapping to do, and kind of fun to look at, as evinced by 90s music videos and Winamp. However, most people I know prefer waveform to spectrogram editing. At the end of the day the artifact being produced is a time-domain waveform, and spectra are (quite literally) convolutions over time windows, leaving the user uncertain whether any spectral edit would add an unintentional click or blurriness. Another way to explain this: because spectra are computed over time windows, if we plot spectrograms in 3D with one axis being time, there is ambiguity about what editing a given point should mean. Another issue is that the overlap in time makes some power spectrograms numerically impossible, while there are no impossible waveforms, since the waveform is the ground truth (see the first sketch after this list).
  • Providing a waveform interface is still important. Being able to zoom and accurately select, move, cut, and apply effects to a waveform is the core of an audio editor. The waveform provides quite a bit of information: it’s easy to tell whether a tonal or noisy component is present when zoomed in, and the RMS information at zoomed-out scales gives a reasonable sense of where events lie in time (see the second sketch after this list). 3D elements might be used to add more space for large numbers of tracks, or possibly for positioning sources in a stereo or ambisonic field.
  • It’s now obligatory to explain, before getting started, why AI won’t make the new tool obsolete. So why not make a 0-D audio editor that is just a text box telling the AI what to edit? If it worked well enough, that would capture some percentage of the market (removing the pauses from a recording is already a popular use case). Generative audio will become more useful for audio creators too. But it’s still a while before we capture what human audio editors do. There is a lot of audio data, but little collected data about the creative process of editing audio. Editing is also a highly interactive process with necessary trial and error, where the trials and errors build the aesthetic judgment and deeper understanding of the objects behind the audio that reveal the next step to the editor. I think as long as humans want to be creative, we will need audio editors.
  • Audio editing capability was stagnant until recently. Although I worked on Audacity, I was always rooting for something better to come along. In fact, one of the reasons I worked on it was that it had obvious issues that could be resolved (multithreading, hardware interface UI). Sound Forge was my favorite audio editor in the early 2000s. When I talked to sound engineers, they mostly wanted something fast, accurate, and reliable, with some preferring support for certain plugins. They don’t need a DAW for everything, but everything basically turned into a DAW. The basic linear interface wasn’t really improved on; just more support for tracks and inputs was added. This could mean that interface innovation is highly constrained: what we have today gets us there eventually, without anyone having to relearn tools or hit the problems of a new interface. Because of this, I would consider VR editors better suited to a hobbyist or research project than a business venture.
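
First sketch: the ‘impossible spectrogram’ point. Zero out a single STFT cell, resynthesize, and re-analyze; because analysis windows overlap, the edited spectrogram is generally not the spectrogram of any waveform, and the round trip quietly undoes part of the edit. A minimal illustration with scipy (the signal and cell indices are arbitrary):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
x = np.random.randn(fs)                # one second of noise as a stand-in signal

_, _, Z = stft(x, fs, nperseg=512)     # overlapping analysis windows
Z_edit = Z.copy()
Z_edit[100, 20] = 0                    # 'erase' one time-frequency cell

_, y = istft(Z_edit, fs, nperseg=512)  # least-squares resynthesis to a waveform
_, _, Z2 = stft(y, fs, nperseg=512)

# Overlapping neighbor frames vote the energy back in: the edit was inconsistent.
print(abs(Z2[100, 20]))                # not 0
```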
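
Second sketch: what the zoomed-out waveform view is actually showing. Each on-screen pixel column summarizes a bucket of samples by its min/max envelope plus RMS, which is roughly what editors like Audacity draw (the function and its interface are my own invention):

```python
import numpy as np

def waveform_overview(samples, width):
    # One bucket of samples per pixel column; draw the min..max bar for
    # the envelope and overlay +/-RMS as a proxy for perceived loudness.
    buckets = np.array_split(samples, width)
    mins = np.array([b.min() for b in buckets])
    maxs = np.array([b.max() for b in buckets])
    rms = np.array([np.sqrt(np.mean(b * b)) for b in buckets])
    return mins, maxs, rms
```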

Here’s what I found:

  • Immersive VisualAudioDesign: Spectral Editing in VR (Proceedings of Audio Mostly 2018 on Sound in Immersion and Emotion): a university research project that drew a nice analogy between spectral masking and the shadows mountains cast in morning (low-elevation) sun. Also cool that paper prototyping is a thing for academics; I remember it catching on for game dev and mobile app design in the late ’00s and thought it would make sense in other areas as well. It works for spectrograms because they are a linear physical-to-physical mapping. This project doesn’t seem to have been developed further, though.

  • There are a few synth/patch programming VR projects (one called SoundStage from at least 2016), but I don’t consider these audio editors. Music production/education is an interesting area to look into as well, and probably a lot of fun.
  • Almost all searches for VR audio editors return results on how to create audio for VR, which is what I expected. There might not be a lot of demand. OTOH, I feel like people on the Meta Quest really like novel utilities.
  • The Sound Particles interface is clearly designed for a 2D/DAW scene-entity paradigm, which I said wasn’t the focus of this post, but it’s actually the closest thing I could find to something you could drop into VR, since it renders the audio scene visually in 3D.

So I didn’t do a lot of research, but I promised myself to do enough to make a hot-take post in a day. There doesn’t seem to be much out there, probably due to a lack of demand and a history of relatively incremental progress in audio editing. But that also means it’s possible to do something fun in the space, even if making it useful would take some time to dial in. So there you go. Please let me know if you know of any interesting editors, applications, or research in the area.