Riffusion takes a clever approach to generative music. Using a fine-tuned version of Stable Diffusion, trained specifically on spectrograms of music, its creators made it possible to generate music from just a text prompt.
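To make the spectrogram idea a bit more concrete, here is a minimal sketch of how a piece of audio turns into a time-frequency grid that can be treated as an image. This is not Riffusion's code: the `spectrogram` function, its parameters, and the test tone are made up for illustration, and a naive DFT stands in for the optimized FFT a real pipeline would use.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: a pure tone that sits exactly on frequency bin 8.
sample_rate = 8000
frame_size = 64
tone_hz = 8 * sample_rate / frame_size  # 1000 Hz
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(1024)]

spec = spectrogram(signal, frame_size=frame_size, hop=32)

# In every column of the "image", the brightest pixel is the tone's bin.
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames x", len(spec[0]), "bins, loudest bin:", loudest_bin)
```

Each frame is one column of the picture and each bin is one row, so a steady tone shows up as a horizontal line. A model that can draw plausible spectrogram images is, in that sense, drawing sound.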
Getting started with Riffusion is very easy. You can:
I ran Riffusion locally on my M1 Ultra Mac Studio, where generating a 5-second clip took about 10 seconds (2.48 it/s on average).
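For a rough sense of what those numbers mean, here is a back-of-the-envelope calculation. It assumes the full 10 seconds went into sampler iterations, which overstates things a little since model loading and audio conversion also take time, so the step count is an upper bound rather than a measured setting.

```python
generation_seconds = 10.0     # wall-clock time per clip, from the measurement above
clip_seconds = 5.0            # length of the generated audio
iterations_per_second = 2.48  # average sampler speed reported above

# How many seconds of compute each second of audio costs.
real_time_factor = generation_seconds / clip_seconds

# Upper bound on sampler steps, assuming all 10 s were spent iterating.
implied_steps = generation_seconds * iterations_per_second

print(f"{real_time_factor:.1f}x slower than real time, "
      f"at most about {implied_steps:.0f} steps per clip")
```

In other words, on this machine generation runs at roughly half of real-time speed.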
Using Text to Audio, you can start riffing away. Take a look at this guitar example:
Prompt: a classic guitar song, fingerpicked, no chords
Negative prompt: chords
Taking it a step further
Like img2img transformations with Stable Diffusion, we can do audio2audio transformations with Riffusion.
In the example below, I created a small tune on my OP-1 synthesizer, and accompanied it with the following prompt:
Prompt: a jazz song, light drum in background, saxophone
Negative prompt: electronic music
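If audio2audio follows the usual img2img recipe, the source clip's spectrogram is partially noised and then only the tail end of the denoising schedule is run, with a "strength" value between 0 and 1 deciding how much of the original survives. The sketch below shows that bookkeeping the way common Stable Diffusion img2img implementations do it. The function name is hypothetical, and whether Riffusion exposes the setting in exactly this form is an assumption.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (first_step_index, steps_actually_run) for an img2img-style run."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # The source is only noised part-way, so only the tail of the schedule runs.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    first_step = num_inference_steps - steps_to_run
    return first_step, steps_to_run

for strength in (0.25, 0.5, 0.75, 1.0):
    first, run = img2img_schedule(50, strength)
    print(f"strength={strength}: skip {first} steps, run {run}")
```

A low strength keeps the tune close to what was played on the synthesizer, while a strength near 1 lets the prompt take over almost completely.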
In short, Riffusion took 'thinking outside the box' to the next level by using spectrogram images as the basis for generating audio. The resulting audio clips offer a small glimpse into the future of generative music!