MAGNeT: Gen AI-Powered Music Generation from Meta

Discover MAGNeT, Meta AI's text-to-music model that generates original audio clips from natural language prompts.

Music has long been an integral part of human culture, evoking emotions and bringing people together like no other art form. From classical compositions to modern hits, music can transport us to different eras, evoke memories, and even shape our moods. But what if you could create your own original songs or melodies simply by typing a few words? Maybe you need an original piece of music for your next Instagram reel, or a unique sound to introduce your new podcast or YouTube channel. With MAGNeT, Meta AI's text-to-music model, you can create original music clips from just a text prompt. By harnessing the power of generative AI, MAGNeT lets you generate high-quality music from natural language input. In this post, we'll show you how to get MAGNeT up and running on your local PC in just a few clicks.

What is MAGNeT?

MAGNeT, short for Masked Audio Generation using Non-Autoregressive Transformers, is a text-to-music generative model developed by researchers at Meta AI. It uses a transformer-based architecture to generate audio samples conditioned on text descriptions. MAGNeT was trained on 16K hours of licensed music, including a Meta-internal dataset of 10K high-quality music tracks as well as ShutterStock and Pond5 music data. It is part of Meta's AudioCraft code base for generative audio, which provides models for music, sound effects, and audio compression.

MAGNeT consists of an EnCodec model for audio tokenization and a non-autoregressive transformer for music modeling. The model comes in two sizes (300M and 1.5B parameters) and two variants: one trained for text-to-music generation and one trained for text-to-sound generation. To run MAGNeT locally on your PC, you'll want a GPU with at least 16GB of memory, especially for the medium-sized models.
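What makes MAGNeT "non-autoregressive" is that it doesn't predict audio tokens one at a time. Instead, it starts from a fully masked token sequence and fills it in over a small, fixed number of iterations, keeping its most confident predictions each round and re-masking the rest. Here's a toy, pure-Python sketch of that decoding schedule; random numbers stand in for the transformer's predictions and confidences, so this illustrates the loop's shape, not the real model:

```python
import math
import random

MASK = -1  # sentinel for a masked audio-token slot

def masked_decode(seq_len, iterations=4, vocab_size=1024, seed=0):
    """Toy non-autoregressive decoding loop in the spirit of MAGNeT.

    Starts fully masked and, over a fixed number of iterations, commits
    the highest-"confidence" predictions while re-masking the rest.
    The real model uses a transformer to score tokens and masks spans;
    here random numbers stand in for both.
    """
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(1, iterations + 1):
        # Cosine schedule: fraction of tokens still masked after this step
        # (reaches zero on the final iteration, so everything gets filled).
        mask_ratio = math.cos(math.pi / 2 * step / iterations)
        keep_masked = int(seq_len * mask_ratio)
        # "Predict" every currently-masked position: (token, confidence).
        proposals = {
            i: (rng.randrange(vocab_size), rng.random())
            for i, t in enumerate(tokens) if t == MASK
        }
        # Commit the most confident predictions; the least confident
        # positions stay masked for the next round.
        by_confidence = sorted(proposals, key=lambda i: proposals[i][1])
        for i in by_confidence[keep_masked:]:
            tokens[i] = proposals[i][0]
    return tokens

clip = masked_decode(seq_len=50)
print(all(t != MASK for t in clip))  # True: every slot filled after the last pass
```

Because the whole sequence is refined in a handful of passes instead of thousands of sequential token predictions, this style of decoding is what lets MAGNeT generate audio much faster than autoregressive models of similar size.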

Want to hear what MAGNeT can do? Check out this page from Meta's researchers, which explains the model design and provides a variety of MAGNeT-generated audio samples.

Try the MAGNeT Demo on your Local PC

The easiest way to get MAGNeT running on your local PC is to use Pinokio. If you haven’t read my recent post on Pinokio, go check it out. Pinokio is a Gen AI app browser that will take care of all the nerdy details of installing open-source Gen AI apps like the MAGNeT demo app on your Windows PC. Once you have Pinokio installed, you can install MAGNeT in just a few clicks and start creating music!

MAGNeT in the Pinokio "Explorer" view

Step 1. – After installing Pinokio, launch the app and select the “Explore” tab. Search for “Magnet” and click the “Magnet” app tile.

Download MAGNeT in Pinokio

Step 2. – Download and Install the MAGNeT App – Click the download button to install the MAGNeT demo app. If you are missing any system dependencies, Pinokio will alert you and ask you to download and install those first. Click through the prompts until MAGNeT completes installation.

Step 3. – Start Generating Music – Once Pinokio completes the installation, it will automatically open the MAGNeT demo app in a new window. From here you can start prompting for music samples. Note that Pinokio will download the relevant model the first time you prompt, so it may take a few minutes to respond. Once your first sample generates, the model is downloaded, and you should see faster response times.

MAGNeT Demo App in the Pinokio Browser

For my first prompt, I asked MAGNeT to create a 30-second audio clip for “a quiet, multi-instrument, acoustic song, for a spa environment”. The Small model produced a fairly repetitive, new-age-sounding clip that wasn't particularly appealing. When I chose the Medium model, however, the same prompt produced a clip with more musical variation and nuance.

The MAGNeT docs recommend prompts that name the instruments present, along with an intended use case (e.g., adding "for a spa environment"). My prompt was light on details, so I plan to keep experimenting with specific instruments and other descriptive details in my prompts.
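If you want to iterate on prompts systematically, a trivial helper like the one below can assemble them from structured parts, following that advice. This is just a convenience sketch of my own, not part of MAGNeT or AudioCraft:

```python
def build_music_prompt(style, instruments, use_case=None):
    """Assemble a MAGNeT-style text prompt from structured parts.

    Follows the docs' advice: name the instruments explicitly and,
    where possible, add an intended use case. A convenience helper
    for experimenting, not part of any official tooling.
    """
    parts = [style, ", ".join(instruments)]
    if use_case:
        parts.append(f"for {use_case}")
    return ", ".join(parts)

prompt = build_music_prompt(
    style="a quiet acoustic song",
    instruments=["piano", "soft strings", "light percussion"],
    use_case="a spa environment",
)
print(prompt)
# a quiet acoustic song, piano, soft strings, light percussion, for a spa environment
```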

I used the default settings in the UI for generating music samples. If you want more information on tuning output parameters like Top-K, Top-P, Temperature, and CFG (classifier-free guidance), there is a useful tutorial here that explains each in detail.
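Conceptually, most of those knobs shape the probability distribution the model samples each audio token from: temperature flattens or sharpens it, Top-K and Top-P truncate its tail, and CFG (not shown below) scales the text-conditioned prediction against an unconditioned one. A rough pure-Python illustration over toy logits, not MAGNeT's actual decoder:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, seed=None):
    """Sample an index from toy logits with temperature, top-k, and top-p.

    - temperature: <1 sharpens the distribution, >1 flattens it.
    - top_k: keep only the k highest-probability tokens (0 = disabled).
    - top_p: keep the smallest set of tokens whose mass reaches p.
    A toy illustration of the demo UI's knobs, not AudioCraft code.
    """
    # Softmax with temperature (max-subtraction for numerical stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    # Rank tokens by probability, then apply the top-k / top-p cutoffs.
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k:
        ranked = ranked[:top_k]
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    rng = random.Random(seed)
    return rng.choices(kept, weights=[probs[i] for i in kept])[0]

# With a very low temperature, the highest logit wins essentially every time.
print(sample_token([2.0, 0.5, 0.1], temperature=0.01, seed=0))  # 0
```

In practice, lower temperature and smaller Top-K/Top-P produce safer, more repetitive clips, while higher values add variety at the risk of incoherence, which matches the trade-off I heard between the Small and Medium model outputs.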

Conclusion

With all the hype around text and image models, it's easy to forget that generative AI research and innovation continue in other mediums, like audio. By leveraging transformer-based models and advanced conditioning techniques, MAGNeT has demonstrated the ability to generate high-quality, coherent, and diverse musical compositions, pointing to a future where the technology may one day rival human musical creativity. Text-to-music models like MAGNeT have the potential to revolutionize the way we produce and consume music, and it will be fascinating to see their applications across different domains.
