
AI coding progress

mchinen

The AI-assisted coding landscape is moving so fast that it all feels like a blur. There is definitely hype, but there is also no shortage of critics and skepticism. The top-down push of AI has turned it into something that not every employee or coder wants to use, almost by definition. Yet it's clear that some people use AI to code of their own volition. I'm not sure where I sit yet, except that I'm perpetually and simultaneously impressed and disappointed.

I want to share my AI coding experience, at least to give myself a point of reference and to avoid moving the goalposts later. There are almost as many definitions of vibe coding as there are definitions of longevity. The first one I heard was that you can use AI to code without ever looking at the code, so it could be used by complete novices. That works well for small personalized scripts, and there are some tools I do vibe code, but for the purposes of this post I'm interested in harder coding tasks. I'll exclude that vibe-coding segment, because I look at the code when things go wrong, and to evaluate how well the AI did on a task and how maintainable the result is.

One of the first tasks I gave ChatGPT, back in November 2022, was something like this:

Please give me a Python program that will generate floating-point values for a 16 kHz waveform of a typical male voice saying the word 'hello'. Only use Python math libraries like numpy; don't call any speech synthesis libraries. You can use DSP techniques like frequency modulation (FM), and look up or ask for formant information.

This kind of task would be very hard even for a human who knows audio and signal processing, because we don't tend to quantify how phonemes change over time. That kind of problem was, and still is, too hard for the best LLM models: there is very little on GitHub that does something similar, and pre-neural TTS tends to be quite complicated. But this will be one of the tests I expect AI to be able to handle eventually. Bullish people in 2022 expected this kind of task to be done by 2024.
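For reference, the classical non-neural building block here is formant synthesis: excite a pitch source and filter it through a few resonances. Below is a minimal sketch of that building block, my own illustration rather than anything a model produced, with made-up formant values for a rough vowel; the actual hard part of the prompt is sequencing and morphing these parameters through /h/-/e/-/l/-/o/ over time.

import numpy as np

SR = 16000  # sample rate in Hz

def resonator(x, freq, bandwidth, sr=SR):
    """Two-pole resonator: one formant applied to a signal."""
    r = np.exp(-np.pi * bandwidth / sr)
    theta = 2 * np.pi * freq / sr
    a1, a2 = -2 * r * np.cos(theta), r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] - a1 * (y[n - 1] if n >= 1 else 0.0) - a2 * (y[n - 2] if n >= 2 else 0.0)
    return y

def vowel(duration=0.3, f0=120.0, formants=((600, 60), (1800, 90), (2500, 120))):
    """Crude steady vowel: impulse-train glottal source through three parallel formants."""
    n = int(duration * SR)
    t = np.arange(n)
    source = (np.mod(t, int(SR / f0)) == 0).astype(float)  # pitch pulses, ~typical male f0
    out = np.zeros(n)
    for freq, bw in formants:
        out += resonator(source, freq, bw)
    out *= np.hanning(n)              # fade in/out to avoid clicks
    return out / np.max(np.abs(out))  # normalize to [-1, 1]

samples = vowel()  # floating-point waveform for one static vowel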

Another kind of task that is still very hard is having the AI code up an image using vector graphics. When GPT-4 came out, OpenAI made a big deal of showing how much its vector-graphics drawing of a unicorn had improved over the previous model's attempt.

A cool person on Reddit made an updated version that tests more recent models. You could squint and say the images have gotten better over the two years that elapsed, but it mostly looks like sampling variance; there's still no obviously great unicorn. So drawing a unicorn remains a decent test, and in the future, drawing the unicorn and then making a scene interact with it. That's basically a video game or simulation, which is where the fun is. It's still not quite there, but eventually it will be.
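To be concrete about what the test asks for: the model has to emit drawing code rather than pixels. Here's a hedged, minimal example of what that output format looks like, with my own stand-in shapes rather than any model's actual unicorn; a real attempt has to place dozens of such primitives in anatomically plausible positions.

# Emit a few SVG primitives from Python, the shape of output the unicorn test expects.
shapes = [
    '<ellipse cx="100" cy="120" rx="60" ry="35" fill="white" stroke="black"/>',   # body
    '<circle cx="170" cy="80" r="22" fill="white" stroke="black"/>',              # head
    '<polygon points="185,60 195,20 200,62" fill="gold"/>',                       # horn
    '<line x1="70" y1="150" x2="70" y2="190" stroke="black" stroke-width="4"/>',  # one leg
]
svg = '<svg xmlns="http://www.w3.org/2000/svg" width="240" height="200">\n' \
      + "\n".join(shapes) + "\n</svg>"

with open("unicorn_attempt.svg", "w") as f:
    f.write(svg)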

The question is how long it takes. I've been interested in AI safety and the fast-takeoff scenario since reading Superintelligence, and have been at least tangentially involved with the rationalist/AI-safety community. I disliked the cavalier way some thought leaders on the tech side dismissed the earlier safety concerns, back when there was more uncertainty. But today I've updated on the slower-than-expected progress we've seen, and I'm increasingly skeptical of fast-takeoff or sudden-domination models (an example is laid out at ai-2027.com).

I've personally been using AI to code various things. The most successful have been small scripts, like the one in my last post measuring my web usage. longevitypapers.com was a Cursor/Windsurf-coded site, but it's not that complicated. Where it struggled was writing consistent apps above a certain size. I had it try to set up frameworks for speech vocoder training, and it gets the basic network structure right, but (Claude/DeepSeek from six months ago) fails once I start requesting interesting modifications, like freezing a few dimensions, adding different amounts of noise to different dimensions, or augmenting the data pipeline, and I need to go in there myself.
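To make "interesting modifications" concrete, here's a rough sketch of the kind of change I mean, assuming a PyTorch-style model with a small latent bottleneck. The module and dimension choices are illustrative, not my actual training code: stop gradients through the first few latent dimensions and add dimension-dependent noise before decoding.

import torch
import torch.nn as nn

class TinyVocoder(nn.Module):
    """Illustrative encoder/decoder with a small latent bottleneck."""
    def __init__(self, n_mels=80, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(n_mels, latent_dim)
        self.decoder = nn.Linear(latent_dim, n_mels)

    def forward(self, mel, frozen_dims=4, noise_scale=None):
        z = self.encoder(mel)
        # "Freeze" the first few latent dimensions: detach them so gradients
        # only shape the remaining dimensions during training.
        z = torch.cat([z[..., :frozen_dims].detach(), z[..., frozen_dims:]], dim=-1)
        # Per-dimension noise, e.g. more noise on later dimensions.
        if noise_scale is not None:
            z = z + torch.randn_like(z) * noise_scale
        return self.decoder(z)

model = TinyVocoder()
mel = torch.randn(8, 80)              # fake batch of mel frames
scale = torch.linspace(0.0, 0.1, 16)  # dimension-dependent noise amounts
recon = model(mel, frozen_dims=4, noise_scale=scale)
loss = nn.functional.mse_loss(recon, mel)
loss.backward()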

The other thing we can measure is how many more solopreneur businesses there are that use AI to run them. Early on there was a lot of hype about imminent solo unicorns thanks to AI, but the actual AI unicorns were the Cursors and Windsurfs that benefited from making the shovels, rather than companies that bootstrapped off the tools. I'd argue that using AI to 10x certain types of workloads makes sense, but for the vast majority it won't; this is analogous to how AlphaFold 2 and 3 didn't noticeably speed up drug discovery. But I will be watching for this. Using AI to run the business itself seems interesting too, but Anthropic's recent Project Vend, where they had an AI agent run a small shop in their office, sheds some light on the problems. Eventually AI may be ready to fully run companies, handling the software, operations, and fundraising, but it's looking more like 30 years than 3, and many 30-year estimates are meaningless. If we see serious change in businesses being run by AI by 2028, I'll change my mind.

Today and yesterday I used the CLI-based Claude Code, because I had heard stories of people running it for hours and getting useful results. Its strengths seem to be boilerplate code and testing, as well as translation and porting, so I decided to port over some old apps. I'll share that experience in another post next week, but a quick teaser: as usual, it's impressive and disappointing and fun. I wasn't able to use it for long unattended stretches and needed to correct many mistakes. I'm now more inclined to maximize the fun route and use it to animate SVG unicorns than to write useful apps. In case I don't write soon, here's the YouTube video I recorded of me using it.
