01 Mar 24
01:05

The joy of the {game, simulation, physics} engine, and on implicit engines in large ML models

Writing a game with an engine is a special experience the first few times. At the core of the engine is some kind of loop. The loop usually runs over a fixed time interval, but it can also be per event. The frequency of the loop doesn’t have to match the screen rendering frequency; it can be faster, since the game engine tracks the states of all of the entities and their interactions.

The interactions between entities include collision detection and defining what happens when entities collide (hit points reduced, force applied, etc.). From these relatively simple rules, a complex system arises, and at some point you can’t predict what will happen for a given set of inputs; you have to observe the engine in action to understand it. After defining the rules and a few entities (like pong paddles, the ball, and the walls), there is another step where you observe the system living and breathing on the screen. And it is undoubtedly different, somehow, from how you imagined it. There is a certain enjoyment I got from writing game engines, in the moment you realize that so many different combinations of things might happen as a result of the loop’s heartbeat and it starts to feel alive. I imagine most coders feel similarly.
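Here’s a minimal sketch of that heartbeat, just to make it concrete; the entities, rules, and numbers are invented for illustration (a pong-like ball and two walls), not taken from any particular engine:

    import time

    DT = 1.0 / 120.0  # fixed simulation step; can be faster than the display

    class Ball:
        def __init__(self):
            self.x, self.vx = 0.0, 40.0  # position and velocity (units, units/s)

    def update(ball, dt):
        ball.x += ball.vx * dt
        # Rule: hitting a wall at x=0 or x=100 reverses the velocity.
        if ball.x < 0.0 or ball.x > 100.0:
            ball.vx = -ball.vx

    def render(ball):
        # Rendering only reads the state; it never changes it.
        print(f"ball at x={ball.x:5.1f}")

    ball, acc, prev = Ball(), 0.0, time.monotonic()
    for _ in range(100):  # a few display frames instead of `while True`
        now = time.monotonic()
        acc += now - prev
        prev = now
        while acc >= DT:  # run as many fixed steps as real time demands
            update(ball, DT)
            acc -= DT
        render(ball)
        time.sleep(1.0 / 30.0)  # pretend the screen refreshes at 30 Hz

The inner while loop is what decouples the simulation rate from the rendering rate: the state advances in fixed ticks no matter how fast or slow frames get drawn.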

There are also rules behind an engine that aren’t explicitly concerned with interactions between two entities, such as gravity or how often some entity spawns, or how the sun rises and sets. These add additional complexity.

The rendering pipeline can be seen as something that allows visualizing the game state, but it typically will not affect it. You could render it to video, audio, or text, and from various views if the engine is coordinate-based.

What is the difference between a game engine and a simulation? With a simulation, you typically take something from the real world and try to capture the phenomenon with rules and simplifications (e.g. we don’t simulate most things at the sub-atomic level). With a game, you can be god, and create your own rules and your own world. I mentioned earlier that eventually the engine produces more complex output than can be predicted. This feels like leverage – you can understand the individual rules, but not the output of the system. Without the rules, it would be very hard to create the output.

As I mentioned earlier, because simple rules in games can create complex output, it’s often very difficult or impossible to create the rules correctly for a desired output state. So there is usually a feedback loop as the developer observes, and not only corrects rules, but gets new ideas for what would be interesting. I’d guess it’s more intuitive for a game designer to start by going for a certain class of output states, but there are many games that are designed around a novel set of rules. For example, being able to reverse time in Braid.

With a simulation, the developer often needs to create many rules to match some abstraction of reality. An apt intersection of game development and simulation is the 3D game physics engine, since it has many constraints that we expect from our real world – like large masses needing more force to accelerate.

Lately, a number of posts driven by the Sora release have called into question to what extent these large machine learning models have a physics engine running inside them. You might recall earlier video generation tools like Google’s Imagen Video or Meta’s Make-A-Video.

These clearly lack physics understanding and have the wiggly inconsistency that suggests the models aren’t quite sure how objects should behave in the scene. Compare that to Sora, which seems to mostly capture the scene as a game engine would render it, down to the reflections, with only a few artifacts observable on closer inspection. Does this model have something more like a physics engine built in? The answer seems to be ‘sort of, but not quite yet.’ If you are interested, I refer you to the discussions by Gary Marcus and Raphaël Millière.

To me, the more interesting question is how generally an ML model can represent the data via rules, as some kind of engine, and how this engine relates to the probabilistic output layer. Could an ML model reliably construct new rules if you tell it, for example, to imagine that gravity exists only between carbon atoms? The question of how to make models relatively more reliable seems to be answered to some extent by Sora, since it got rid of the wiggle and captured scenes more consistently. Having consistency over longer and longer spans of time seems crucial for anything like a physics or game engine.

Most of the video data we have is of the 3D world, since humans went out into the real world and captured it with a real camera. This translates super nicely to matrix multiplication, as any 3D programmer will tell you. I think the 2D game engine may actually be more interesting here, because it is divorced from this data and usually comes straight from the game designer’s brain. If these models can capture something like that well and consistently, we should see a lot of interesting results. Even better if we have interpretability/explainability built in, so the rules within the models can be explicitly described and checked.

16 Feb 24
09:42

Trying Out Stable Cascade Local Image Generation

Stability AI released Stable Cascade on GitHub this week. It’s very open, and allows not only inference on a number of tasks, from text prompting to in-painting, but also training and fine-tuning. It’s a three-stage diffusion model, and they also provide pretrained weights you can download.

Here’s what I needed to do to get set up on Linux to play with inference. I’m running Mint XFCE, but it probably works the same or a similar way on Ubuntu and other flavors. Here’s an example of what I generated:

Prompt: Anime girl finding funnel-shaped chanterelles (no gills) on moss overlooking Lisbon orange roofs from Monsanto park

Prerequisites

  • Python, Jupyter
  • Probably at least 10 GB of VRAM, like an RTX 3080, though running it as written used 20 GB of the 24 GB on my RTX 3090

Steps

The GitHub instructions are not super clear yet, so here are all the steps I took.

  1. Clone the GitHub repo
    git clone https://github.com/Stability-AI/StableCascade.git; cd StableCascade
  2. Do whatever Python environment thing you prefer to install the requirements, or just run:
    pip install -r requirements.txt
  3. Download the weights. If you download other variants, the default script won’t work (you can change the model .yaml files if needed, more on that later)
    cd models; ./download_models.sh essential big-big bfloat16
  4. Run the Jupyter notebook
    cd ..; jupyter notebook
  5. [Maybe optional, but I needed it] Reduce the batch size from 4 to 2 (a small helper sketch follows this list)
    – Edit configs/inference/stage_b_3b.yaml and change batch_size to 2
    – Also search for batch_size = 4 in the notebook and change it to 2
    (Doing only one of these two changes causes shape errors)
  6. Change the prompt from the ‘anthropomorphic nerdy rodent’ to the image of your dreams
  7. Open the Jupyter notebook from the CLI output in your browser if needed, then run the script (click the >> button to run all, or shift-enter or > for individual blocks)
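If you’d rather script step 5 than edit the yaml by hand, a small sketch like the following should work; it assumes batch_size appears exactly once in that file (if not, just edit manually):

    import re
    from pathlib import Path

    # Patch batch_size in the stage B inference config in place, leaving the
    # rest of the yaml formatting untouched. The path and key name are the ones
    # from step 5; adjust if your checkout differs.
    cfg = Path("configs/inference/stage_b_3b.yaml")
    patched, n = re.subn(r"(?m)^(\s*batch_size\s*:\s*)\d+", r"\g<1>2", cfg.read_text())
    if n != 1:
        raise SystemExit(f"expected one batch_size line, found {n}; edit by hand instead")
    cfg.write_text(patched)
    print("batch_size set to 2 in", cfg)

Remember the matching batch_size = 4 to 2 change in the notebook cell, or you’ll hit the shape errors mentioned in step 5.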

Other models

You can also download other models by reading the readme in the /models directory and editing or copying the .yaml files at configs/inference/stage_b_3b.yaml and configs/inference/stage_c_3b.yaml to use the other model files (lite/small or non-bfloat16). You could probably run on smaller GPUs with this method.

Speed and Time

It takes about 20-30 seconds to generate two images at 1024×1024 with the big-big bfloat16 model.

It takes ~4 minutes to run the setup cells before generation can happen.

Downloading the weights took ~30 minutes on a 500 megabit connection (about 10 GiB).

Importantly, it’s faster than the previous models. I would assume this is due to more aggressive diffusion scheduling and getting the multiple stages of models just right (you can also consider the stages a part of the scheduling). There are 20 timesteps in the smallest (initial) model and 10 timesteps in the second by default. These hyperparameters and architectural decisions seemed like they would be difficult to get right when I first heard about diffusion models, and it makes sense that they can squeeze more out of them here. I’m sure they’ve done other enhancements as well, but I haven’t even read the paper yet.

https://raw.githubusercontent.com/Stability-AI/StableCascade/master/figures/model-overview.jpg

From the GitHub description, though, it’s interesting to note that the first stage is actually a VAE generator that works in the reduced latent space. This is different from, but related to, how speech models are tending towards using a sampled generator that is like a language model (AudioLM) to generate latents that a decoder can ‘upsample’ into the final speech audio. One of the reasons for this is that the latent space of the autoencoder in SoundStream, while reduced in dimensionality, does not use the space as efficiently as it could for speech – for example, a random sequence in the latent space is not human babble, but probably closer to noise. The generator solves this problem by learning meaningful sequences in the space.
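For my own understanding, here is rough pseudocode for how the stages seem to fit together at inference time, going by that figure and the analogy above; every name and signature here is made up for illustration, not the repo’s actual API:

    # Illustrative pseudocode only: invented names, and the real pipeline has
    # more conditioning and sampler details than shown here.
    def generate(prompt, stage_c, stage_b, stage_a, steps_c=20, steps_b=10):
        cond = stage_c.embed_text(prompt)  # text conditioning
        # Stage C: diffusion in a very compressed latent space
        # (the 20 default timesteps mentioned above)
        z_small = stage_c.sample(cond, num_steps=steps_c)
        # Stage B: diffusion that expands the compact latents into the
        # decoder's larger latent space (the 10 default timesteps)
        z_big = stage_b.sample(cond, z_small, num_steps=steps_b)
        # Stage A: a learned VAE-style decoder straight from latents to pixels,
        # analogous to the 'upsampling' decoder in the speech models above
        return stage_a.decode(z_big)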

Quality

Stable Cascade is definitely better than the previous Stability AI models like Stable Diffusion XL, and the models from the free offerings they had on their website.

The model is less steerable than DALL-E 3 (although it’s hard to get DALL-E to do something exactly, you can usually get it to include most of the features you want). It’s hard to get it to draw several things at once – for example, “Japanese american 150 lbs 5’11” programming on a laptop with a view of orange lisbon rooftops in the background” often only yields the Japanese American, and occasionally the laptop and the orange rooftops. In some cases, the image is noticeably distorted. There is a reason they typically request anime style and a single subject in the demos. But it’s a step up from other off-the-shelf tools, and feels only slightly worse than DALL-E 3, for which I pay ~5-10 cents per image for only slightly better results. It also has a lot of extra functionality that can be built upon, so I’m very happy to have this tool. I’d guess it handles some subjects well and others not so much. The text handling, however, is significantly worse than DALL-E 3’s. I haven’t examined how much it censors, if at all, but that’s one of the limitations that sometimes cripple reasonable requests for ChatGPT+DALL-E.

Prompt: programming python dual screen 150lb Japanese American male 5’11 42 years old in Lisbon apartment with orange roof gelled side part black hair

Effort

Because of the errors, it took about two to four hours of tinkering to get it all working. That’s longer than anyone would want (I did other things while waiting for downloads and so on), but it’s typical for me with research GitHub repos, and part of the reason why I am documenting it. In general, I liked that the repo had the scripts and weights prepared, and none of the errors seemed like total blockers, especially with the issues tab on GitHub if I really got stuck. I hope we see more repos like this from other Open companies ;).

Extra notes

  • You can shift+right click to access the native browser menu to save the image output in the jupyter cells.
  • There are many other demos in the ipynbs.

Appendix

Errors

The default settings let me generate with batch size 4 and the big-big model exactly once; the other ~20 times I tried, it OOM’d during the final stage A. The only way I could work around this was to set the batch size smaller, from 4 to 2. You can edit the line in the ipynb notebook, but you also have to change it in the .yaml config file, or you’ll get an error about shapes.

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB. GPU 0 has a total capacty of 23.69 GiB of which 1022.19 MiB is free. Process 4274 has 954.00 MiB memory in use. Including non-PyTorch memory, this process has 18.79 GiB memory in use. Of the allocated memory 14.91 GiB is allocated by PyTorch, and 2.44 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Redacted Bash history (in case I missed something)

1994 git clone https://github.com/Stability-AI/StableCascade.git
1995 ls
1996 cd StableCascade/
1997 ls
1998 cd inference/
1999 ls
2005 tree ..
2006 pip install requirements.txt
2007 pip install -r requirements.txt
2008 cd ..
2009 pip install -r requirements.txt
2010 ls
2011 cd inference/
2012 ls
2013 emacs readme.md
2020 cd ..
2021 ls
2022 cd models/
2023 ls
2024 emacs download_models.sh
2025 ./download_models.sh essential big-big bfloat16
2040 ./download_models.sh essential small-small bfloat16

09 Jan 24
02:52

Art and entertainment for AI entities

What kind of art or entertainment would hypothetical artificial intelligence entities of the future find interesting? Would they want to make a certain type of art? Would they appreciate human art? Would they appreciate art made by humans explicitly for them?

To be clear, this post is not about the art that Generative AI makes for humans, but rather about art made specifically for AI entities. But we can start with art made for humans as a way of getting there, since it’s not obvious at all what art made for AI would look like.

I want to avoid spending a lot of time defining art and entertainment, and especially avoid drawing a line between the two. So the rest of the post uses the term art to capture the highest possible art with the lowest of brows, and refers to something that is not directly needed for survival or well-being but is interesting to engage with. Consider an AI agent that is not omnipotent, but is continuously online. What do they do with their idle time when there is no job, or they are waiting on a result? Today’s computers poll and wait for the next task, which is one possible outcome, and it does conserve energy, so the same behavior might be programmed in. But what if you don’t have that constraint for AI agents, and let them decide whether to sleep(0) or do something else? Do the agents do only purely useful research, or do they spend some time on something resembling what we call art or entertainment?

A great deal of human art focuses on a few ideas closely related to our evolutionary pressures:
  • Sex
  • Love
  • Becoming Successful and/or Rich
  • Defeating Enemies
  • Overcoming Disasters
  • Solving Mysteries
  • Scary things
It’s possible that AI entities will want to learn more about humans, the same way we want to know more about our origins. This could also be something that is more interesting as a non-art discipline, like anthropology or history. Then again, art does have a way of capturing and transferring experience and perspective in a way that the humanities cannot.

One interesting thing about (non-vocal) music is that it doesn’t tie directly into language, and is therefore harder to connect with the concepts of evolutionary pressures. For example, it would be hard to communicate a love story using only an orchestra. And yet music is as old as time, and even when times were tough, humans made time to create and listen to it. Music begs for an explanation in this way. Sexual selection is one explanation (e.g. birdsong); evolutionary cheesecake (à la Steven Pinker) is another. Both are fascinating to think about. If music is cheesecake, cheesecake is very interesting and so deep a reflection of humans that we not only consume it but dedicate academic study to it. But what is ‘evolutionary cheesecake’ for AI entities, and what does sexual selection correspond to for them?

For ‘typical’ AI entities, sexual selection is not going to be directly a part of their fitness function, though connection with and understanding of other entities might be. AI appealing to human sexual desire seems like the oldest sci-fi trope there is, and it remains one of the most viable means of hacking humans, one we are sure to see more of. A few early AI safety folks, including Nick Bostrom, strongly warned against letting robots resemble humans. It turns out that we don’t even need the physical robot from the movie Ex Machina, or even the rendered visuals that Replika is trying to provide. People on Reddit’s r/ChatGPT subreddit are falling in love with ChatGPT’s live voice chat, and have even noted that the ‘Sky’ voice sounds like Scarlett Johansson in Her. And this is only using pure text-to-speech output and ASR input, without processing prosody at all. This is what the people want, and at least while AIs are trained by humans, it’s likely that it will be an implicit or explicit part of their training function. If this is an area that might produce art or entertainment, I wouldn’t be surprised if it was more like Office Space than Ex Machina.

With visual art, including dance, connecting to evolutionary concepts is not difficult. Some visual art explicitly tries to avoid language or ‘real-world references’ via abstraction, or via dissolution of meaning. The latter is quite interesting and appears in music as well. Certain non-instrumental sounds used in music have obvious real-world references, such as a footstep or a coin settling on a table after a flip. Pierre Schaeffer’s musique concrète devised ways of presenting these sounds that obfuscated or removed the association with the real-world object, leaving the divorced sound object to be used for abstract creative purposes.

At the extreme end of entertainment, you get wireheading: pure pleasure stimulation. Most people seem to equate wireheading with extreme drug abuse and think that it is not interesting or meaningful. But there are some possibly edgelord rationalist-adjacent people who would press a button to turn the entire universe into wireheading to solve the problem of suffering. Fortunately for those that don’t agree, there is no such button, but the concept is more plausible with programmable AI. What is the ‘pleasure center’ for an AI entity? It doesn’t seem like minimizing a loss function is particularly pleasurable, and at inference time, there is usually no loss function being minimized. But there are certainly networks that can be aware of their loss at inference time and even use that iteratively to improve a result. Few would argue that this is pleasure. It’s entirely possible that pleasure and pain, the topic of so many human stories, are completely foreign to AI.

One way to look at art made through abstraction is that it has the potential to be more general, or at least less connected to the world that the artist lived in (though you could certainly argue the opposite). I bring it up because it’s possible that the art AI creates or appreciates is less connected to the human world, and would seem more ‘abstract’ to us. This is a bit of a sci-fi trope to be perfectly honest, but it’s plausible if we reduce the world of possible art to abstract or non-abstract art. Doing this simple reduction as a thought experiment, there are a few interesting outcomes:

  • art for AI is abstract, and humans can appreciate the abstractions
  • art for AI is not abstract (e.g. based on the qualia native to AI), but humans only can appreciate it as abstract
  • art for AI is abstract, but humans interpret the abstractions differently (e.g. art about the human/AI relationship)

Some AI agents would probably be interested in some art we could only appreciate in the abstract.

A few priming questions for the creation of art for AI entities:

  • What is the equivalent of a mirror for an AI entity, supposing the AI exists as weights and matrices that process text/video/audio as input and output?
  • How would interactive art or video games for AI look? They don’t have the same visual or physical limitations that humans do. To keep it minimal, what would a text-based game look like if built for AI?
  • How is human art related to our evolutionary selection function? How would art for AI be related to its loss function? What would the ‘loss function’ look like for an AI that can appreciate art (without directly programming an art appreciation into it)?
  • What is meaning, beauty, and ugliness for an AI?
  • What will AI struggle to understand?
  • What is boring for AI?
  • Structural complexity and the beauty of some math proofs are discussed with an aesthetic similar to art.
  • Consider all human artifacts as sonification and visualization of a part of the world, or as a non-linear projection of the world. Which of these artifacts constitute art or entertainment to you? What are the important properties of the mapping function and the artifact that make it so?

03 Jan 24
06:45

AR/VR 3D Audio editors in 2024

I’m curious what the ideal 3D audio editing interface for casual editing/playback would be like in VR or AR. ‘3D audio editor’ might not be the right term here. There are a few companies, including one I ran into recently called Sound Particles in Leiria, Portugal, that produce professional audio editors for post-production, including cinematic rendering of battle scenes and the like, where spatial audio is important and capturing the 3D scene is key. 3D scene management (tracking the position of the camera and all of the entities) is the core of game engines and CGI.

I’m actually interested in something else: audio editing in a VR (or AR) context, where you want to mix some tracks, edit a podcast, or do the typical things you’d do in a 2D audio editor like Audacity, where scene and entity locations aren’t the primary concern. I wasn’t aware of this kind of editor, but I bet something exists, either as FOSS or a commercial app, and if not, definitely in academic research. So here’s what I found.

Before I dive in, here are my relatively naive thoughts on possible VR applications in the editor domain. I’ve developed a few audio editors, including being part of the Audacity team (2008-2012) and making an audio editor on iOS called Voicer that I retired a few years ago. But I haven’t been very close to editor development for a while.

  • Spectral envelopes/3D spectrograms are a fairly obvious physical mapping, and kind of fun to look at, as evinced by 90s music videos and Winamp. However, most people that I know prefer waveform to spectrogram editing. At the end of the day, the artifact being produced is a time-domain signal, and spectra are literally convolved with respect to time, leaving the user uncertain whether any spectral edit would add an unintentional click or blurriness. Another way to explain this: spectra are computed over time windows, so if we plot spectrograms in 3D with one of the axes being time, there is ambiguity about what editing a certain point should mean. Another issue is that there are numerically impossible power spectrograms because of the overlap in time, but there are no impossible waveforms, since the waveform is the ground truth (see the short sketch after this list).
  • Providing a waveform interface is still important. Being able to zoom and accurately select, move, cut, and apply effects to a waveform is the core of an audio editor. The waveform provides quite a bit of information: it’s easy to tell if there is a tonal or noisy component present when zoomed in, and the RMS information at zoomed-out scales gives a reasonable amount of info about where the events lie in time. 3D elements might be used to add more space for large numbers of tracks, or possibly positioning the stereo or ambisonic location.
  • It’s now obligatory to mention why AI won’t make any new tool obsolete before getting started. So why not make a 0-D audio editor that is just a text box telling the AI what to edit? If it worked well enough, that would capture some percentage of the market (e.g. removing the pauses from a recording is already a popular use case). Generative audio will become more useful for audio creators too. But it’s still a while before we capture what human audio editors do. There is a lot of audio data, but little collected data about the creative process of editing audio. Editing is also a highly interactive process with necessary trial and error, where the trials and errors build the aesthetic sense and the deeper understanding of the objects behind the audio that reveal the next step to the editor. I think as long as humans want to be creative, we will need audio editors.
  • Audio editing capability has been stagnant until recently. Although I worked on Audacity, I was always rooting for something better to come around. In fact, one of the reasons I worked on it was that it had obvious issues that could be resolved (multithreading, hardware interface UI). Sound Forge was my favorite audio editor in the early 2000s. When I talked to sound engineers, they mostly wanted something that was fast, accurate, and reliable, with some preferring support for certain plugins. They don’t need a DAW for everything, but everything basically turned into a DAW. The basic linear interface wasn’t really improved on; just more support for tracks or inputs was added. This could mean that innovation in the interface is highly constrained, because what we have today gets us there eventually without having to relearn or encounter problems with a new interface. Because of this, I would consider VR editors better suited as a hobbyist or research project than a business venture.
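To make the ‘impossible spectrograms’ point from the first bullet concrete, here’s a small scipy sketch (parameters are arbitrary): draw a random complex spectrogram, invert it to the nearest waveform, and re-analyze it. The re-analysis doesn’t match the original, because overlapping windows impose consistency constraints that a random spectrogram doesn’t satisfy.

    import numpy as np
    from scipy.signal import stft, istft

    # A random complex "spectrogram" is generally not the STFT of any waveform:
    # overlapping analysis windows impose consistency constraints between frames.
    rng = np.random.default_rng(0)
    nperseg, hop, frames = 512, 128, 64
    shape = (nperseg // 2 + 1, frames)
    Z = rng.normal(size=shape) + 1j * rng.normal(size=shape)

    # Invert to a waveform (least-squares overlap-add), then re-analyze it.
    _, x = istft(Z, nperseg=nperseg, noverlap=nperseg - hop)
    _, _, Z2 = stft(x, nperseg=nperseg, noverlap=nperseg - hop)

    # If Z were a consistent spectrogram, Z2 would reproduce it (up to edge effects).
    k = min(Z.shape[1], Z2.shape[1])
    err = np.linalg.norm(Z[:, :k] - Z2[:, :k]) / np.linalg.norm(Z[:, :k])
    print(f"relative mismatch after one analysis/synthesis round trip: {err:.2f}")
    # A waveform, by contrast, survives stft -> istft essentially exactly.

That inconsistency is something a spectral editor has to resolve before synthesis, which is part of why a spectral edit can sound different from what the picture suggests.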

Here’s what I found:

  • Immersive VisualAudioDesign: Spectral Editing in VR (Proceedings of Audio Mostly 2018 on Sound in Immersion and Emotion): A university-driven research project that made a nice analogy between morning (low-elevation) shadows from mountains and spectral masking. Also cool that paper prototyping for academics is a thing. I remember it catching on for game dev and mobile app design in the late ’00s and thought it would make more sense in other areas as well. It works for spectrograms because they are a linear physical-to-physical mapping. This project seems like it wasn’t developed further, though.

  • There are a few synth/patch programming VR projects (one called SoundStage from at least 2016), but I don’t consider these audio editors. Music production/education is an interesting area to look into as well, and probably a lot of fun.
  • Almost all VR audio editor searches return results on how to create audio for VR, which is what I expected. There might not be a lot of demand. OTOH, I feel like people on the Meta Quest really like novel utilities.
  • The Sound Particles interface is clearly designed for a 2D/DAW scene-entity paradigm, which I said wasn’t the focus of the post, but it’s actually the closest example I could find to something that you could drop into VR, since it renders the audio scene visually in 3D.

So I didn’t do a lot of research, but I promised myself to do enough to make a hot-take post in a day. I feel like there isn’t much out there, probably due to a lack of demand and a history of audio editing progress being relatively incremental, but that also means it’s possible to do something fun in the space, even if being useful still seems like it would take some time to dial in. So there you go. Please let me know if you know of any interesting editors, applications, or research in the area.

31 Dec 23
05:48

Code mysticism and Halt and Catch Fire post LLMs

I’ve been re-watching AMC’s epic series Halt and Catch Fire. As a historical drama covering computer and internet developments, and the ambition of seizing opportunity, from the 70s to the 90s, it was nostalgic and motivating to watch when it came out in 2014. It never really got that popular. It was sort of scoped like Mad Men, covering several years per season, but with a much smaller budget, and there are some holes in the writing, including a few two-dimensional characters. Still, it fills a unique niche and has a solid fan base. It’s one of a few series I re-watch occasionally.

HACF breaks the main roles up into hardware/systems engineers, software ‘creative’ engineers, investors, and vision/product people that work together or against each other as leaders or founders of a company. When I watch shows with plausible tech or science experts, it’s fun to see how expertise is communicated to a general audience. I’m not alone – there are many who track the technobabble in Star Trek (e.g. inverse tachyon beam) or note how Emmett Brown endearingly lost his science cred when he called it a ‘jigawatt’. There is also the non-sarcastic appreciation of fictitious displays of tech expertise and the realism of the futuristic interfaces in movies like Hackers. One thing I liked about HACF is that, despite using a decent amount of technobabble, it plausibly captures the approach and spirit of hacking and coding, like reverse engineering a chip by rigging up a hex LED readout to dump the values for each of the 65536 inputs to a ROM.

The expertise of the coders is demonstrated mainly by others admiring the structural complexity of their code as an object of beauty. This is something that feels extra nostalgic now. Some examples from real life: Donald Knuth wrote The Art of Computer Programming. The book I learned 3D programming from was called The Black Art of Macintosh Game Programming (there were many like this in the 90s/early 00s, and this was the Mac version of the popular Andre LaMothe books). In 2008, when I joined Google Summer of Code, they gave everyone a free O’Reilly book called Beautiful Code, which covered a lot of real-world algorithms and problems that had ‘beauty’ in their code solutions.

I think this code mysticism was already a fading trend, but LLMs and their code generation have made coding seem a bit less magical, because now I didn’t write it, or at least, I didn’t have to write it. Maybe getting rid of the woo is a good thing; maybe there’s a lot of ego behind appreciating code this way. But re-watching HACF made me think about some of the nice parts of banging your head against a wall and waking up with the solution that is just right for your constraints.

It’s also possible I’m not representative of others here. I should definitely ask some younger startup employees and grad students how they feel about coding. From my observation, people still appreciate good, elegant, clean, efficient code, and attribute expertise to the people who can produce it regularly and well. But it feels like the legendary 10x hackerman is slowly being made more approachable, with ‘average’ coders able to write better and faster by employing an LLM to fill in their gaps.

But the point I’m making is not about the hackerman rockstar, but about the worship of the code. It seems like a special kind of appreciation that is reserved for the arts. In most other science domains, expertise and feelings of admiration were generally attributed to the innovator, not the invention. Code takes on a life of its own. But now that something else is getting close to being able to create nice code, maybe that kills a part of the appreciation of the art. Here is a hot take that is certainly problematic: once something can be mass produced by sufficiently skilled artisans, it is not artistically interesting to make more of that thing. The thing becomes a craft – it can still be difficult for you or me to produce from scratch, but with the right tools and knowledge, a large enough percentage of the population can do it.

LLMs are not quite there yet for coding tasks that go beyond leetcode. If you ask one to do complex in-domain things related to speech/music DSP (e.g. please synthesize a flute with an ADSR envelope) or ML, it gets the outline right but fails on the details. To me, the leetcode problems are more difficult, but the LLM has seen enough of them to have the right inductive bias. It’s also only able to do local code suggestions at the single function or class level – it starts to fail once you need a nice structure (e.g. using polymorphism gracefully in game entity objects/maps). That latter point – the structural complexity of classes/APIs/functions – is still out of the LLM’s grasp. It’s closer to the aesthetic of (physical) architecture that combines form and function with consistency. Maybe it’s difficult due to context window restrictions, but I won’t be surprised if it requires something extra to be able to learn this particular aesthetic.

There is also a kind of beauty in low-level, highly constrained programming and system design, such as embedded systems and low-resource or highly reliable systems. This is an information-theoretic type of beauty that can be seen through an objective lens, with things like Kolmogorov complexity and reducing a solution to the smallest amount of object code. This is the type that is demonstrated a lot in the first season of HACF. I don’t think the LLMs are great at this either, since correctness and meeting precise resource requirements are things they struggle with. Since this can be made objective, it’s probably easier to approach for the LLMs, though I’m not sure how many people are focusing on it right now. For the past few years it’s been the case that more data trumps all, but I could see these kinds of things (and other things that require correctness and precision, like math proofs) requiring some special focus.

Perhaps part of the fading mysticism that I perceive is that these two areas of code appreciation are at the opposite extremes, and more of the actual problems and engineering are in the middle, at least for my career path. Fewer people need to roll out their entire stack from scratch on their own, and this is a good thing. The beauty still exists and is even more complex and interesting if you zoom out from an individual’s work to the team or organization. But the object is now missing that 70s-90s American individualism aspect that has been culturally ingrained in my generation. Again, maybe that is a good change. Maybe the mystery and appreciation of a relatively new frontier was part of a generation’s collective motivation for diving so deep into the matrix. I don’t know. I’m curious whether this is just my own feeling or others notice it too. It’s entirely possible that this is just a way of coping with ‘losing to the machines’, or at least losing ground to the machines. To be clear, I am fascinated by the research and application progress of ML and will continue to be. But I think it’s fine to be nostalgic about things too.