Saturday, December 13, 2008
Just in time for the end of 2008. We’ve been working more diligently than an imaginary development log reader might think. And not just on such great koans as “what thought does an imaginary development blog reader have upon reading an imaginary blog with an imaginary subject,” but also on our genetic algorithm. We’ve fixed a lot of bugs in the GUI. Beyond that, we completely changed the spectral estimate function. We ran some hard empirical tests to arrive at an integral that is simple enough to compute quickly, yet maps well enough onto the spectrum that is actually synthesized.
The other really big change was the implementation of a cache. It is a two-dimensional, backwards-and-forwards, extra-fancy-grade cache with extra-fancy hit ratio tracking. The basic idea is that our variable-length chromosomes keep growing, so a single mutation is likely to change only a small part of the sound, in either frequency space or time space. So we cache every frame’s estimate and reuse it if the interpolated parameters match. BUT WAIT, THERE’S MORE! The spectral estimate for a given frame is the sum of the integrals for each noise band, so if a noise band in the middle changes, we can’t use the cached frame. So we also cache the running sums at, say, the 20%, 40%, 60%, and 80% noise band marks, and do the same in reverse. If the noise band at the 55% mark changes, we take the forward 40% cache (0-40%) and the backward 40% cache (which really covers 60-100%), manually calculate the integrals for the 40-60% range, and add them all together. In other words, this doesn’t make any sense unless you happen to be, as we fortunately arranged, one of our upgraded telepathic-understanding-cache-imaginary-dev-log-reader types.
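Here is a minimal Python sketch of the forward/backward checkpoint idea, assuming each frame’s estimate is a plain sum of per-band integrals. `band_integral` and `FrameEstimateCache` are hypothetical names, the band integral itself is a stand-in (amplitude times bandwidth), and the per-frame (time) cache and hit ratio tracking are left out. With ten bands, changing band 5 (the 55% example) reuses the cached sums of bands 0-3 and 6-9 and recomputes only bands 4 and 5.

```python
def band_integral(band):
    """Hypothetical stand-in for one noise band's share of a frame's spectral
    estimate; the real integral is not given, so use amplitude * bandwidth."""
    amplitude, bandwidth = band
    return amplitude * bandwidth


class FrameEstimateCache:
    """Running sums of per-band integrals at checkpoint marks, kept forwards
    (bands before the mark) and backwards (bands from the mark on)."""

    def __init__(self, bands, marks=(0.2, 0.4, 0.6, 0.8)):
        self.bands = list(bands)
        n = len(self.bands)
        values = [band_integral(b) for b in self.bands]
        self.cuts = sorted({int(round(m * n)) for m in marks})  # n=10 -> [2, 4, 6, 8]
        self.forward = {c: sum(values[:c]) for c in self.cuts}   # bands [0, c)
        self.backward = {c: sum(values[c:]) for c in self.cuts}  # bands [c, n)

    def estimate_with_change(self, i, new_band):
        """Frame estimate if band i were replaced by new_band (e.g. in a mutated
        child), recomputing only the bands between the two nearest marks."""
        n = len(self.bands)
        lo = max((c for c in self.cuts if c <= i), default=0)
        hi = min((c for c in self.cuts if c > i), default=n)
        left = self.forward.get(lo, 0.0)    # unchanged bands before the change
        right = self.backward.get(hi, 0.0)  # unchanged bands after the change
        middle = sum(band_integral(new_band if j == i else self.bands[j])
                     for j in range(lo, hi))
        return left + middle + right
```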
Also, if you’ve been reading the forums, you’ll know that biological evolution seems to work best with an average of one mutation per chromosome. It turns out that works well for genetic algorithms too, so we’ve implemented it.
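A sketch of what “one mutation per chromosome on average” can look like for variable-length chromosomes, assuming per-gene mutation with probability 1/L for a chromosome of L genes; `mutate` and `mutate_gene` are hypothetical names. The expected number of mutations is then L × (1/L) = 1, however long the chromosome grows.

```python
import random


def mutate(chromosome, mutate_gene, rng=random):
    """Mutate each gene independently with probability 1/len(chromosome), so
    the expected number of mutations per chromosome is 1 at any length.
    mutate_gene(gene, rng) -> new gene is a hypothetical callback."""
    if not chromosome:
        return []
    p = 1.0 / len(chromosome)
    return [mutate_gene(g, rng) if rng.random() < p else g for g in chromosome]
```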
We still have a ways to go, but we can now do noisy sounds such as wind. Female speech still has a long way to go, mostly because we need more dynamic window functions (hint hint).
Wind experiment: directory of the WAV synthesis and the original for the wind/screen source.
Female speech experiment: a bad but understandable female speech experiment, with the original and subsequent generations.
Saturday, October 20, 2007
It has been far too long. We presented this project at ICMC in the fashionably expensive city of Copenhagen. Here’s the paper.
As far as development goes, things have changed since then. But as always, since you, pretend reader, are such a defender and demander of quality, let us do naught but make haste to supply you with some:
Noise Band bass examples
Sinusoids that move in amplitude and frequency along randomly generated envelopes, with a bandwidth envelope that runs between fully closed and fully open, gradually opening up over one minute. They’ve gotten much better, and by the end they can sometimes trick people into believing they are real sounds.
Original bass input sound: WAV format, about 300k.
Synthesized result, 13th generation: early result
Synthesized result, 300th generation
Synthesized result, 700th generation
Synthesized result, 1700th generation
Synthesized result, 16000th generation: final result (sounds very similar from about generation 3000 on)
Friday, April 13, 2007
Today would not have lived up to its reputation as a Friday the thirteenth had we not just finished a new method of synthesis for our engine: sinusoid-based noise bands. And I’m talking full-bandwidth noise bands, not weak, wussy, filter-generated noise bands with their unpredictable phases and amplitudes. I’m talking about grown-man sinusoids, monotonic in phase, that have the advantage of being obedient when you want them to be. Last week we did four hours of googling and found no obvious similar techniques, so this small synthesis component may actually be worth publishing. However, for your grain of salt, three and four-fifths hours of that time were spent self-googling. Have a listen for yourself to decide how sweet this is:
Noise Band examples
Sinusoids that move in amplitude and frequency along randomly generated envelopes, with a bandwidth envelope that runs between fully closed and fully open, gradually opening up over one minute
80 Hz Noise Band
400 Hz Noise Band
1000 Hz Noise Band
10000 Hz Noise Band
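The post doesn’t spell out the method, so the following is only a guess at the general idea, written as a Python sketch: a band of noise built from many sinusoids whose frequencies wander randomly inside [center - bw/2, center + bw/2], each with its own accumulated phase. All names and parameters (`n_partials`, `drift`, the `bandwidth_env` callable) are assumptions, and the per-partial amplitude envelopes are left out. With the bandwidth closed (0 Hz) every partial sits on the center frequency, so the band collapses to a single sinusoid; opening it spreads the partials into noise.

```python
import math
import random


def sinusoidal_noise_band(center_hz, bandwidth_env, seconds, sr=8000,
                          n_partials=32, drift=0.02, seed=0):
    """Hypothetical sinusoid-based noise band. bandwidth_env(t) -> bandwidth in
    Hz at time t. Phases are accumulated sample by sample, so they increase
    monotonically as long as bandwidth < 2 * center_hz (all freqs positive)."""
    rng = random.Random(seed)
    # each partial's position inside the band, as a fraction in [-0.5, 0.5]
    offsets = [rng.uniform(-0.5, 0.5) for _ in range(n_partials)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_partials)]
    out = []
    for n in range(int(seconds * sr)):
        bw = bandwidth_env(n / sr)
        total = 0.0
        for k in range(n_partials):
            # random walk of the partial's position, clamped to stay in the band
            offsets[k] = min(0.5, max(-0.5, offsets[k] + rng.gauss(0.0, drift)))
            freq = center_hz + offsets[k] * bw
            phases[k] += 2.0 * math.pi * freq / sr
            total += math.sin(phases[k])
        out.append(total / n_partials)  # average keeps |sample| <= 1
    return out
```

For example, `sinusoidal_noise_band(400.0, lambda t: 800.0 * t, 1.0)` opens a 400 Hz band from a pure tone to 800 Hz wide over one second.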
Thursday, February 1, 2007
There are some sweet new open source projects on SourceForge, some of which we might have been partly responsible for. The benefits are, of course, CVS and public incentive.
Also, here’s some older installation music that never got posted.
Friday, December 22, 2006
It has been a few months since our last post. We’ve been out showing off music in various places. Sometimes it was relevant to this research, like the new piece that uses the GA output to make all its noise. In all, we went to three cities in Japan and played music.
However, we are a little happy to get back to research. Lately we have been thinking about using some kind of subtractive model to make the fitness function really speedy; this still relies on the FFT. Some of us are thinking of looking more into MP3 and Ogg encoding to figure out how they represent their spectral data. The sinusoids are getting a little harsh, and if you’ve heard that piece, you’ll know what we’re talking about. Anyway, hang in there, we’ll post more later.
Tuesday, September 5, 2006
In the middle of August we presented the current state of our research to an acoustic research society. But as a reader, you may not know what the current state of our research is, and we apologize for the sentence order used here. Since the last update, a lot has been accomplished: we can now resynthesize certain sounds, like cello, to within a few decibels of spectral distortion. However, we have come to think that this sinusoid-based model kind of sounds like shit, and we are looking for other ways to produce sound. Still, the results of the current model are below for your listening pleasure.
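The post doesn’t define “spectral distortion”; one common measure it may correspond to (an assumption on our part) is the log-spectral distance between the target magnitude spectrum |X(k)| and the resynthesized one |X̂(k)| over K frequency bins, in decibels, typically computed per frame and averaged over frames:

```latex
D_{\mathrm{LS}} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(20\log_{10}\frac{|X(k)|}{|\hat{X}(k)|}\right)^{2}}\ \text{dB}
```

A value of a few dB means the resynthesized spectrum is, on average, within a few decibels of the target in each bin.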
Last week we demonstrated our genetic algorithm sounds in a real-time interactive system that uses four light sensors. We recorded lots of insects in Shikoku, analyzed just two of them with our genetic algorithm, and used the output sound to build a four-channel sound space. It was done in a week using SuperCollider, so the results are so-so. Here is a picture of interested-looking and uninterested-looking Japanese people playing with it. Listen to the sound below.
Original recorded sound (bass): WAV format, about 300k.
Evolution progression of the resynthesis: WAV format, about 2.3 MB. A concatenation of the fittest sound from successive generations.
Live photosensor demo excerpt:
Clickit excerpt: MP3 format, about 3.3 MB.
Wednesday, June 28, 2006
The New Deal
Everything has changed. I’m panicking. The structure of the chromosome is being ripped apart to make way for new interpolated sound generators that guarantee some smoothness. Listen:
Chromosomes with Interpolated Sin Sound Roots: MP3 format, about 2 MB.
Ben is actually 50 chromosomes, each lasting about two seconds, overlapping by one second.
It sounds like fireworks. The little buggers are lively.
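The “50 chromosomes, each lasting about two seconds, overlapping by one second” layout is plain overlap-add with a one-second hop, so the whole clip runs about 49 × 1 s + 2 s = 51 s. A minimal Python sketch; the linear crossfade (which makes two half-overlapping clips sum to a constant gain) is our assumption, and a sample rate of 10 Hz is used only to keep the numbers small.

```python
def fade_window(length, fade):
    """Linear fade-in over the first `fade` samples, fade-out over the last."""
    return [min(1.0, n / fade, (length - n) / fade) for n in range(length)]


def overlap_add(clips, hop, fade):
    """Start clip i at sample i * hop, window it, and sum overlapping regions."""
    total = max(i * hop + len(c) for i, c in enumerate(clips))
    out = [0.0] * total
    for i, clip in enumerate(clips):
        window = fade_window(len(clip), fade)
        for n, (sample, gain) in enumerate(zip(clip, window)):
            out[i * hop + n] += sample * gain
    return out


sr = 10                                      # toy sample rate (samples per second)
clips = [[1.0] * (2 * sr) for _ in range(50)]  # 50 two-second clips
song = overlap_add(clips, hop=sr, fade=sr)     # one-second overlap
print(len(song) / sr)                          # total length in seconds
```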
Wednesday, June 21, 2006
We had to present some kind of demo with our incomplete software for an open campus event to attract high school students. So this is what we spat at them besides our horrible Japanese.
Sound Morphing Trials: MP3 format, about 2 MB each.
First attempt, from a bass note to a clarinet. We messed up the heuristic function, so it goes wild.
2nd attempt: about 200 generations played over a 60-second period, so they overlap and you hear a sort of phase-vocoder-like effect.
2nd attempt, discrete: the same as above, but with only 20 generations, so the sounds are discrete and you can hear the morphing.
These files show how morphing is possible with a genetic algorithm framework. To do it, you simply set the initial generation’s population to be not random, but a chromosome representation of the start sound.
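A toy Python sketch of morphing by seeding, assuming chromosomes are just fixed-length lists of partial frequencies (the real chromosomes are far richer) and fitness is negative squared distance to the target; `morph` and its parameters are hypothetical. Because the population starts as copies of the start sound instead of random chromosomes, the fittest individual of each generation traces a path from the start sound toward the target, and playing those in order is the morph.

```python
import random


def morph(start, target, generations=200, pop_size=20, sigma=5.0, seed=1):
    """Toy GA morph; returns the fittest chromosome of every generation."""
    rng = random.Random(seed)

    def fitness(c):
        return -sum((a - b) ** 2 for a, b in zip(c, target))

    # seeded, not random: every individual starts as the start sound
    population = [list(start) for _ in range(pop_size)]
    trajectory = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        trajectory.append(list(population[0]))
        parents = population[: pop_size // 2]  # keep the top half (elitism)
        children = []
        for _ in range(pop_size - len(parents)):
            p = rng.choice(parents)
            # about one Gaussian mutation per chromosome on average
            children.append([g + rng.gauss(0.0, sigma) if rng.random() < 1 / len(p) else g
                             for g in p])
        population = parents + children
    return trajectory
```

Because the best parents always survive, each generation’s best is at least as close to the target as the previous one.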
Thursday, June 15th
Long time no see (hisashiburi da ne). (Let’s pretend for once that it’s just one guy living in Japan who is writing these logs (instead of thirteen aliens from different solar systems so enthused by the human race that they get their kicks living on Mars, remotely controlling the human body shell of a guy living in Japan). Then we are eligible (pretend-eligible, says alien2) to write in the first person (pretend-first person).)
I just moved across Tokyo from Asakusa to Odaiba. This meant not being able to afford a moving service; instead, I had the privilege of carrying 200 pounds of stuff in my arms, divided among four trips on the world’s most crowded trains. It also meant that the genetic algorithm had to take a break for a few days.
However, since a whole month has passed since my last post, it should be said that considerable progress has been made. The crossover algorithm got completely hierarchical and crazy. Most notable, though, is a speedup of 300x over our last post under certain conditions. This was accomplished with a dumb cache, a bit of sorting, and the use of heuristic estimation *before* synthesis, allowing us to get away from the bad one (slays kings, ruins towns, and beats down high mountains and everything but Tolkien) without even writing out 44100 samples per second per gene.
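A sketch of the estimate-before-synthesis idea, assuming a cheap `estimate` function and an expensive `synthesize_and_score` one (both hypothetical): sort the population by the estimate, synthesize only the most promising fraction, and memoize full scores in a plain dictionary, which is our stand-in for the “dumb cache”.

```python
def evaluate_population(population, estimate, synthesize_and_score,
                        keep=0.25, cache=None):
    """Rank every chromosome by a cheap estimate (higher is better), then
    synthesize and fully score only the top `keep` fraction. Chromosomes
    already scored (same genes, as a tuple key) are not synthesized again."""
    if cache is None:
        cache = {}
    ranked = sorted(population, key=estimate, reverse=True)
    n_full = max(1, int(len(ranked) * keep))
    scores = {}
    for chromosome in ranked[:n_full]:
        key = tuple(chromosome)
        if key not in cache:
            cache[key] = synthesize_and_score(chromosome)  # the expensive part
        scores[key] = cache[key]
    return scores
```

Everything outside the top fraction never gets rendered to audio at all, which is where most of the savings come from.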
Also, for now I’ve really simplified the test problem. Instead of trying to resynthesize the previous “Lincolnshire Posy” example with high granularity but low quality, I am now aiming for a perfect or near-perfect resynthesis of a simple clarinet sample. For one, it is easier to go from high quality to low. Also, this is now more of a scientific and engineering problem, which does not make it any less artistic, I do not think. “haha, he said ‘I do not think'”. I do not think I do not think here are your moments of I do not think I do not think I do not think I think.
Clarinet resynthesis attempt, three-second clips: WAV format, about 300k (only using sinusoidal cells).
First Generation: a random collection of sine waves
11th Generation, fittest has found a low sinusoid near the fundamental
1000th Generation, fittest has found many of the harmonics, but lots are out of tune
10000th Generation, fittest has found all of the harmonics, and tuned them, but the phase and attacks are still missing.
In the end, you can hear a really rough sound of a clarinet, but the sinusoids in the resynthesis do not ‘bend’ like the real ones in a clarinet do, so the harmonics don’t sound fused. There is much to be done for sure, but the fact that you can kind-of-sort-of hear a resemblance to a clarinet, and that this originated from a random set of sinusoids, gives us a lot of hope. Also, we need a good resynthesis in a month and a half, because we have an official presentation coming up in August and will need a paper by then.
Tuesday, May 16
Several things have come to pass. We presented the algorithm yesterday. The presentation assumed no knowledge of genetic algorithms going in, so it took a little longer to explain. Also, we are in Japan right now, where people speak mostly a non-English language, and this also takes some time to translate. The presentation is in PowerPoint format, so you can download it if you are curious. Note the moniker “GeneSynth” and not “Fucking Sound,” just to prove we can come up with clever titles. Of course, we all know that if you use the word “Fucking” in an academic presentation, researchers will look the word up, realize what sex is, and then immediately turn around, abandoning all intellectual pursuits for the newly discovered carnal quest, or at least think we are trying to be revolutionary for the wrong reasons, and we definitely can’t have the former. Anyway, GeneSynth is what the project has always been called in Xcode, and we plan to put it up on SourceForge as an open source project soon, once things are working better.
Much more work has been done on the algorithm, but it is mostly bug fixing. The fitness functions have been implemented, using FFTs and RMS values. A bad Gaussian noise unit generator has been implemented, yielding bad but existing results. The good thing is that now we have enough to start fitting sounds to other sounds. The bad thing is that the algorithm needs to be fixed up to be able to process longer sounds (see the example below). Some work has already been done on this, requiring all audio buffers used by the GA to go through a custom memory manager that shares and recycles memory efficiently, but more is needed.
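A minimal sketch of the recycling half of such a memory manager, in Python, assuming buffers are requested by sample count; `BufferPool`, `acquire`, and `release` are hypothetical names, and the sharing part (for example reference counting) is left out.

```python
class BufferPool:
    """Recycles audio buffers: released buffers go on a free list keyed by
    length and are handed out again instead of allocating new ones."""

    def __init__(self):
        self._free = {}        # length -> list of released buffers
        self.allocations = 0   # how many buffers were actually created

    def acquire(self, n_samples):
        free = self._free.get(n_samples)
        if free:
            buf = free.pop()
            for i in range(n_samples):  # clear old audio before reuse
                buf[i] = 0.0
            return buf
        self.allocations += 1
        return [0.0] * n_samples

    def release(self, buf):
        self._free.setdefault(len(buf), []).append(buf)
```

Since every synthesized chromosome needs scratch buffers of the same few sizes, reusing them keeps allocation cost flat as sounds get longer.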
Sound Fitting: MP3 format, 250k each, 20 seconds
Target: the pre-existing file we are trying to fit a bunch of cloth, guitar, piano, and cello samples against. Note the dynamic contour: the steady attacks in the first 10 seconds, followed by a pause and a very soft section that crescendos at the end.
Random Fitting: the first generation is created by making 30 random chromosomes and synthesizing each of them. This is the best one as scored by our FFT fitness function. It is random in terms of the chromosome, but because we use pre-existing samples to form it, it has more interesting features than random samples (noise).
Fitting, 50 generations later: what you get after 50 generations with our buggy and slow algorithm. Even so, you can vaguely hear, and definitely see in an audio editor, that the contour starts to match the shape of the target. At 10 seconds there is a clear drop-off in volume, followed by a build towards the end, just as in the target. Also, the frequency content is closer than in the random sample: you hear the last chord in the first half being represented by higher-pitched samples.
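The post only says the fitness functions use FFTs and RMS values. Here is a minimal pure-Python sketch of one way to combine them, assuming non-overlapping frames of a power-of-two size, a squared difference of magnitude spectra per frame, and a squared difference of per-frame RMS loudness weighted by a hypothetical `rms_weight`. A real implementation would use an optimized FFT library and probably a window function.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frames(x, size):
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def rms(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def fitness(candidate, target, frame_size=256, rms_weight=1.0):
    """Higher is better; 0 means spectra and loudness match in every frame."""
    half = frame_size // 2 + 1  # non-redundant bins of a real signal
    spec_err = rms_err = 0.0
    for c, t in zip(frames(candidate, frame_size), frames(target, frame_size)):
        C, T = fft(c), fft(t)
        spec_err += sum((abs(C[k]) - abs(T[k])) ** 2 for k in range(half)) / half
        rms_err += (rms(c) - rms(t)) ** 2
    return -(spec_err + rms_weight * rms_err)
```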
Monday, May 8th
Two major components have been implemented: the mutation and crossover of chromosomes. There was a lot of bug hunting, and probably more of the critters will be revealed once logging is implemented, but right now it *sounds* like they are working. Mutation and crossover for the chromosomes we use for sound synthesis are more difficult than in the easy-peasy classic genetic algorithm, because our chromosome structure is complicated in several ways: it is of variable length, it is self-referential, and it can recursively nest parts of itself. Now, some computer scientist might look at these words and scoff the back of their throat out into the open, because it seems like we may be taking a perfectly good simple algorithm and making it all complicated for no purpose except to confuse, and to be able to use smart words, with our arbitrary ooh-that’s-neat-self-referential-structure. And that would be fine, because we really believe not enough people scoff energetically these days. But really, we are doing it for a reason we believe is correct. Of course, we won’t know for a while. Faith is not just for Intelligent Designers, but also for the ones who aren’t.
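The post doesn’t give the chromosome format, so here is a hypothetical Python sketch of why crossover gets harder with such a structure. We assume definitions that are either leaves (`("sin", freq_hz)`) or groups that nest earlier definitions, and cells that reference definitions by index so one definition can be reused many times. Cut-and-splice crossover with independent cut points in each parent produces children of different lengths, and every inherited reference must be shifted or reattached so nothing dangles. All function names are ours.

```python
import random

# Hypothetical layout, assumed for illustration only:
#   defs:  ("sin", freq_hz) or ("group", [indices of EARLIER defs]);
#          groups only point backwards, so nesting can never form a cycle
#   cells: (start_seconds, def_index); many cells may share one def


def splice(head, tail, i, j, k, m):
    """Child = head defs[:i] + tail defs[j:], head cells[:k] + tail cells[m:].
    Surviving references are shifted to keep pointing at the same definition;
    references to definitions lost in the cut are reattached to a valid one."""
    (hd, hc), (td, tc) = head, tail
    shift = i - j
    defs = list(hd[:i])  # head groups only point backwards, so still valid
    for pos, d in enumerate(td[j:], start=i):
        if d[0] == "group":
            d = ("group", [x + shift if x >= j else x % pos for x in d[1]])
        defs.append(d)
    n = len(defs)
    cells = [(t, x if x < i else x % n) for t, x in hc[:k]]
    cells += [(t, x + shift if x >= j else x % n) for t, x in tc[m:]]
    return defs, cells


def crossover(a, b, rng=random):
    """Cut-and-splice with independent cut points, so each child can be
    longer or shorter than either parent (each keeps at least one def)."""
    i, j = rng.randint(1, len(a[0])), rng.randint(1, len(b[0]))
    k, m = rng.randint(0, len(a[1])), rng.randint(0, len(b[1]))
    return splice(a, b, i, j, k, m), splice(b, a, j, i, m, k)


def is_valid(chromosome):
    defs, cells = chromosome
    groups_ok = all(x < p for p, d in enumerate(defs) if d[0] == "group" for x in d[1])
    return len(defs) > 0 and groups_ok and all(0 <= x < len(defs) for _, x in cells)


def random_chromosome(rng, max_defs=8, max_cells=12):
    defs = [("sin", rng.uniform(50.0, 2000.0))]
    for p in range(1, rng.randint(1, max_defs)):
        if rng.random() < 0.3:
            defs.append(("group", rng.sample(range(p), rng.randint(1, p))))
        else:
            defs.append(("sin", rng.uniform(50.0, 2000.0)))
    cells = [(rng.uniform(0.0, 3.0), rng.randrange(len(defs)))
             for _ in range(rng.randint(0, max_cells))]
    return defs, cells
```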
Mutation: MP3 format, 3 MB, four minutes of audio.
First Mutation Chain is a chromosome that is mutated fifty times and played each time, showing the gradual mutation process. You can hear the pauses about every four seconds which signal the next mutation.
Crossover: WAV format, about 350k and 3 seconds each clip.
Since crossover is the namesake of this website, this is a pretty big deal. The crossover examples below use very small chromosomes so that you can hear more clearly how the next of kin inherit traits. Jane and John are the consummators, and Jack and Jill are the little bastards, as we’ve yet to implement a Marriage() function for chromosomes.
Jane and John (John is softer and has less events than Jane)
Jack and Jill
The results are easy to hear. The next step is to implement an FFT and RMS fitness evaluation function, along with the selection function, and then we will be able to hear some sounds that aren’t just random sin and random Morrissey. Tune in next time to see what Jack and Jill did.
Monday, April 24th
The first few chromosomes have been synthesized. They are just noise, created by random initialization, but these will be the dollar bills in plastic sleeves that are hung up long enough to dry out until stained purple blobs appear from the one time the new dishwasher forgot to close the window when it was raining oh so hard.
Three-second clips: WAV format, about 300k. Both have under 100 cell definitions and 200 cells per second.
Jane Number 1 is created with genes of file sound sources
Jane Number 2 is created with genes of sinusoidal synths
The ones based on file sound sources are pleasant because there aren’t many files (only two) to choose from in my library yet, so you get a lot of echo/reverb-like effect.
I like the sine-based ones. They’re more pleasant than I expected from random initialization, I guess because each cell definition can be used many times to make a drone/chorus effect.
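Some quick arithmetic behind the drone/chorus effect: a three-second clip at 200 cells per second has 600 cells, and with fewer than 100 cell definitions each definition is used more than six times on average. A Python sketch of that kind of random initialization; the `(kind, freq_hz, duration_s)` definition layout, the value ranges, and the name `random_init` are assumptions.

```python
import random
from collections import Counter


def random_init(rng, n_defs=99, cells_per_second=200, seconds=3.0):
    """Random chromosome: a small pool of sinusoidal cell definitions plus many
    cells, each pointing at a randomly chosen definition (hypothetical layout)."""
    # definition: ("sin", freq_hz, duration_s)
    defs = [("sin", rng.uniform(50.0, 2000.0), rng.uniform(0.05, 0.5))
            for _ in range(n_defs)]
    n_cells = int(cells_per_second * seconds)  # 200 * 3.0 -> 600 cells
    # cell: (start_s, definition_index); indices repeat, so definitions get reused
    cells = [(rng.uniform(0.0, seconds), rng.randrange(n_defs)) for _ in range(n_cells)]
    return defs, cells


defs, cells = random_init(random.Random(3))
uses = Counter(index for _, index in cells)
print(len(cells), len(cells) / len(defs))  # cells, average uses per definition
```

Each reuse of a definition is the same sinusoid starting at a different time, which is why random sine chromosomes drone and chorus rather than sounding like white noise.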