Meet Riffusion, an AI model that

While the idea of AI-generated music is already novel in itself, Riffusion ups the ante with a wild, creative technique that generates strange and intriguing music using images of audio instead of actual audio.

Oddly enough, that description is spot on.

But if it works, it works.

Plus, it’s effective.

In any case, it is what it is, and it is here to stay.

What is Riffusion?

Over the past 12 months, the field of artificial intelligence has received a tremendous boost thanks to a machine learning method called Diffusion, which is used to generate images.

The best-known models, DALL-E 2 and Stable Diffusion, work by gradually replacing visual noise with what the AI thinks an image should look like.

The technique has been effective in a wide variety of settings and lends itself well to fine-tuning, in which a large amount of a single content type is fed to the mostly-trained model so that it can learn to generate additional examples of that content type.


The model can be trained on specific types of artwork, such as watercolors or vehicle photography, and then use that knowledge to reproduce that type of artwork more accurately.

For their side project, Riffusion, Seth Forsgren and Hayk Martiros fine-tuned Stable Diffusion on spectrograms.

Forsgren explained that he and his bandmate Hayk started the project “just because we love music” and were curious whether Stable Diffusion could generate a spectrogram image with enough fidelity to convert into audio.

As we have progressed, the scope of what is feasible has never ceased to amaze us, and each new idea seems to lead naturally to an even better one.

What are spectrograms?

A spectrogram is a way of visualizing audio by representing the relative intensity of various frequencies over time.

You’re probably familiar with waveforms, which represent changes in volume over time and make music look like a series of hills and valleys; now imagine that the volume of each frequency, from lowest to highest, were also displayed.
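To make that concrete, here is a minimal sketch of how a spectrogram is computed, using SciPy’s short-time Fourier transform; the sample rate, tone, and window size are illustrative choices of mine, not Riffusion’s actual settings:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthesize one second of a 440 Hz tone at a 22,050 Hz sample rate.
sr = 22050
t = np.linspace(0.0, 1.0, sr, endpoint=False)
audio = np.sin(2 * np.pi * 440.0 * t)

# STFT magnitude: rows are frequency bins, columns are time frames.
freqs, times, power = spectrogram(audio, fs=sr, nperseg=512)

# The brightest row sits at the bin closest to 440 Hz.
peak_hz = freqs[power.mean(axis=1).argmax()]
```

Plotting `power` (usually on a log scale) gives exactly the kind of image Riffusion trains on: frequency on the vertical axis, time on the horizontal axis, and brightness for intensity.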

Here’s a spectrogram of a snippet of a song (“Marconi’s Radio” by Secret Machines, in case you were wondering):

As the song progresses, the volume increases across the board, and you can follow the level of each instrument and even trace the progression of the melody.

It is not a lossless method, but it does provide a detailed and organized representation of the sound.

And if the process is reversed, the original sound can be reconstructed.
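That round trip can be sanity-checked with SciPy: the complex STFT inverts almost perfectly, whereas a spectrogram image keeps only the magnitude, which is why the process is lossy and the phase must be estimated (for example with the Griffin-Lim algorithm) before inversion. A minimal sketch, with illustrative parameters:

```python
import numpy as np
from scipy.signal import stft, istft

sr = 22050
t = np.linspace(0.0, 1.0, sr, endpoint=False)
audio = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone

# The forward STFT keeps the complex phase; a spectrogram image keeps only |Z|.
_, _, Z = stft(audio, fs=sr, nperseg=512)

# With the phase intact, the inverse transform recovers the waveform.
_, recovered = istft(Z, fs=sr, nperseg=512)

error = np.max(np.abs(audio - recovered[: len(audio)]))
```

Dropping the phase (keeping only `np.abs(Z)`) and inverting would no longer give a near-zero error, which is the gap Riffusion has to bridge when turning generated images back into sound.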

Forsgren and Martiros created spectrograms of various musical compositions and annotated the resulting images with genre-specific tags, such as “blues guitar”, “jazz piano”, “afrobeat”, and so on.

These data helped the model learn what different sounds “look like” and how they can be recreated or combined.

Take a look at what the diffusion process looks like in action as it refines the image:


When prompted with musical genres and instruments such as “funky piano”, “jazzy saxophone”, etc., the model was able to generate spectrograms that matched the acoustics well.

Here’s an example:

A three-minute song would require a much larger rectangle than the square spectrograms the model can represent (512 by 512 pixels, the usual Stable Diffusion resolution).

They couldn’t just make a spectrogram 512 pixels high and 10,000 pixels wide due to the limitations of the system they had built, but no one wants to listen to only five seconds of music either.
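That five-second figure follows directly from the image geometry: the width of the spectrogram times the hop between STFT frames gives the clip duration. With illustrative settings (my assumptions, not Riffusion’s published parameters):

```python
sample_rate = 44100   # audio samples per second
hop_length = 441      # samples between adjacent spectrogram columns (10 ms)
image_width = 512     # spectrogram columns in a 512x512 image

clip_seconds = image_width * hop_length / sample_rate
print(clip_seconds)  # 5.12
```

A three-minute song at the same hop would need an image over 18,000 pixels wide, which is exactly the rectangle the model cannot produce.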

They tried various approaches before settling on exploiting the underlying structure of huge models like Stable Diffusion, which have a lot of “latent space”.

This zone resembles the gap between two clearly defined points.

If one part of the model represented cats and another represented dogs, the zone in between would be latent space which, if instructed, would lead the AI to draw a cat-dog hybrid, even though no such animal exists in the real world.

However, the Riffusion project does not create nightmare scenarios.

Instead, they have found that if you give it two prompts, such as “church bells” and “electronic beats”, it will transition organically and gradually between the two, right on beat:

It’s an unusual and intriguing sound, though not particularly complex or high-fidelity; remember, they weren’t even sure diffusion models could do this at all, so the ease with which this one turns bells into beats or typewriter sounds into piano and bass is truly impressive.
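A common way to get that kind of smooth transition is spherical interpolation (slerp) between the two prompts’ latent vectors; whether Riffusion uses exactly this scheme isn’t stated here, so treat the following as an illustrative sketch with made-up stand-in latents:

```python
import numpy as np

def slerp(v0, v1, alpha):
    """Spherical interpolation between two latent vectors, alpha in [0, 1]."""
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return v0  # vectors are (nearly) parallel; nothing to interpolate
    so = np.sin(omega)
    return (np.sin((1 - alpha) * omega) / so) * v0 + (np.sin(alpha * omega) / so) * v1

rng = np.random.default_rng(0)
bells = rng.standard_normal(16)   # stand-in for the "church bells" latent
beats = rng.standard_normal(16)   # stand-in for the "electronic beats" latent

# Five evenly spaced steps morph one latent into the other.
steps = [slerp(bells, beats, a) for a in np.linspace(0.0, 1.0, 5)]
```

Decoding each intermediate latent to a spectrogram, then to audio, is what produces a gradual bells-to-beats morph rather than an abrupt cut.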

Producing longer files is theoretically feasible, but has not yet been tested:

Forsgren has said that the group hasn’t tried to write a three-minute traditional rock song with a hooky chorus and a few lyrics.

I think it’s possible with some clever methods, like modeling the overall structure of a song at a higher level and then using that model to analyze individual portions.

If you want, you can also train our model intensively on high-resolution images of full songs.

How to take advantage and earn money with Riffusion?

Mix and beat sites like Epidemic Sound and AudioJungle from Envato are very popular among content creators and anyone who needs a piece of music for a particular project. If you have browsed these sites, you will have noticed that these songs are short and repetitive, designed especially for background music.

Well, now that AI is entering many fields that were believed to be dominated solely and exclusively by human talent, similar sounds could be created, with a certain rhythm, and marketed in the same way as images.

What is the next step?

Other groups are attempting to generate music with AI using a variety of methods, including voice-synthesis modeling and diffusion models trained directly on audio, like Dance Diffusion.

Forsgren and Martiros say they are delighted to see how people connect with their work, have fun, and iterate on it, and that Riffusion is more of a “look at this” demonstration than a grand strategy to change music.

There’s a lot to learn along the way, and we’re excited about the potential avenues we’ve identified.

It’s been exciting this morning to watch others extend our code to implement their own ideas.

The speed with which the Stable Diffusion community builds on existing work in ways the original writers could never have imagined is truly remarkable.

If you’re interested in trying it out, there’s a live demo; however, you may have to wait for your images to render, as it has received more attention than the developers anticipated.

All the code is on the page, so if you have the processing power, you can also run your own.
