Riffusion is a music generator that creates short audio clips from text prompts, letting users experiment with different styles, instruments, genres, and sounds to explore the latent space of sound. The platform conditions generation on a seed image through img2img diffusion, which helps keep riffs on beat and shapes their melodic patterns. Users can adjust the settings to explore different musical outcomes: a higher denoising setting lets the output move further from the seed image, producing more creative results at the risk of the music drifting off-beat.
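
To make the denoising trade-off concrete, here is a minimal sketch of how a denoising strength is commonly turned into a step count in Stable Diffusion style img2img pipelines. It is an illustration under assumptions, not Riffusion's actual code: the function name `img2img_schedule` and its parameters are hypothetical, and the rule shown (run roughly `strength * num_inference_steps` denoising steps on a partially noised seed image) is the convention used by common open-source implementations.

```python
from typing import Tuple


def img2img_schedule(num_inference_steps: int, strength: float) -> Tuple[int, int]:
    """Return (steps_skipped, steps_run) for a given denoising strength.

    Hypothetical helper for illustration only. A strength of 0.0 keeps the
    seed image untouched; a strength of 1.0 ignores it and denoises from
    (almost) pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    if num_inference_steps < 0:
        raise ValueError("num_inference_steps must be non-negative")

    # The seed image is noised up to an intermediate point of the schedule,
    # and only the remaining steps are actually denoised.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run


# Low strength: few steps run, so the result stays close to the seed (on beat).
low = img2img_schedule(50, 0.25)
# High strength: most steps run, so the result can drift far from the seed.
high = img2img_schedule(50, 1.0)
```

With fewer steps run, more of the seed image's rhythmic structure survives; with more steps run, the prompt dominates and the timing of the seed can be lost.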


  • Built on Stable Diffusion, the open-source AI model that generates images from text
  • Fine-tuned on images of spectrograms paired with text to generate audio clips
  • Short-time Fourier transform (STFT) used to compute the spectrogram from audio
  • Griffin-Lim algorithm used to approximate the phase when reconstructing the audio clip
  • Torchaudio used for efficient audio processing on the GPU


