AI Music

Suno

Suno is impressing me.

Not in the “Wow, you could really go far, kid” way – it is generating some genuinely good shit. The workflow is similar to MuseTree’s: you provide a prompt and lyrics (or mark it instrumental – more on this later), though requiring credits to work through a song makes it less amenable to experimentation. I would love for them to add shorter generation lengths for fewer credits – I’m fine with working through a song 10 seconds at a time if it means I get to build it all. Compositing the clips appears to take significantly more time than the generations themselves, which makes me wonder about their backend.

Built by an independent startup of the same name, and probably a profit engine for them, it has overcome the primary barrier that made e.g. Jukebox useless for musicians: generation appears to occur in real time and is streamable (usually). I don’t know if it’s better or worse for musicians that you cannot get stems of the different parts – the generations are presented fully formed and uneditable. This probably makes musicians relying on Splice for revenue less nervous, but stem export is a commonly requested feature on their Discord, one they say they are investigating.

The Good

Suno has given me a window into lyric writing, something I have felt very closed off from as a friendless loser bedroom producer with no money. I have never been invested in learning how songwriting works because I did not have a pet vocalist to bounce ideas off of, nor a recording environment that didn’t sound like Harleys and drunk idiots screaming at each other in the park. The rapid prototyping of electronic music is what appeals to me here – I only had to invest time to improve.

Suno will fuck up if your lyrics are bad.

This is in the “good” section because I’m happy to learn. Stumbling over “Insulating barrier, guarding our fragile souls” before making the drop “Butter” was hilarious to me, since the song was otherwise pretty good. It also completely ignored two lines I provided – it does that a lot, skipping to different sections at will.

The parts where it becomes unintelligible were generated as “instrumental,” but it will pull in the vocal timbre anyway and start speaking Simlish, or sometimes clip whole lyrics into musical accents, especially in genres that do that a lot.


It has lyric generation; the lyrics it produces are serviceable but cliché. It works well for small cute songs about doggos.

I had it write about my old dog Riku and me camping. Me and Squeak are going on a trip for the first time next week. Yes, I teared up. AI has officially made a grown man cry.


It will happily ‘hallucinate’ (yes, I know) lyrics for you if what you’ve written is totally broken. I present to you “Email in All Fields”, from the Christian rock duo Suno and Float Overblow:

Lyrics: “Email”, Style of music: “Email”. I think it picked up on email sounding kind of like ‘amen’ and went from there. Email so full 😔 forever in my face


You can sometimes crush genres together – “tropical cyberpunk” worked great:


While ‘Sillygaze’ – which is, of course, shoegaze but with clown noises – is apparently the opening song for a shonen anime:


I liked slowly morphing longer songs into entirely different genres.

With each generation you can change the song style and lyrics completely, which creates some fascinating transitions. This one started as random Unicode; I pushed it into pop country with a (literally) random list of words, and finally Suno had a go at writing funk. That’s not what pornogrind is, Suno. But it’s ok, you’re only what, a month old?


It’s also not limited to English.

(this is a song about fish, it sounds like a song about a girl, I can only imagine the connotations Russian listeners get to enjoy here)


https://suno.com/@floatoverblow

For the rest, rather than hammer on my host’s bandwidth further, I’ll just link to my profile there. I sprang for a subscription, so there will be more. Overall I’m incredibly impressed, and dare I say hopeful about how this could boost involvement from people who have given up on music. Suno’s stance here is my own: you can’t limit these tools to what makes musicians money if you expect music to grow to its full potential.

The Bad

Take a peek at Suno’s Discord and you’ll realize the people piling onto this are definitely looking to monetize it. I won’t say it’ll be good for professional musicians. They’re definitely screwed.

Rather, I predict that the endpoint of all this is both a total collapse of the current commercial music monopsonies and a solid cultural understanding that art and music are about communication. Without a person at the other end of it, we are the ones hallucinating. As listeners we should want a person making our music, because it’s all about people and always has been.

We are going to start feeling indifference and detachment toward effortless things. We are going to be bombarded with countless ephemeral masterpieces in the coming years, and it’s going to get boring. The primary use for generative AI will be memes, and people using it for serious works will be viciously mocked. Your email to Susan will always carry some tell from Copilot’s current stylistic choices (because that is a thing humans have, and Copilot is worse at it); you’ll lose respect++, and you’ll have to do the apology dance and endure the public hazing that anyone caught using Microsoft to do their writing for them gets. My hope is that out of all that we will build fresh support for a music culture without a gate to keep.

The Ugly

There is no ugly. Listen to this absolute goddamn 55-second banger I only gave ‘meat’ as the lyrics for:


(For reference, this was my impression of MuseTree, back before the drones)

MuseTree

Wandering through an AI’s Brainspace

MuseNet, a product of OpenAI, is a massive neural network trained on MIDI data from a large number of composers. MuseTree and the MuseNet MIDI Tool are useful interfaces for composing with it. The former has a nicer interface, but its samples leave something to be desired – most of what’s here was made with the latter.

Much has been made of it replacing musicians, composers, etc. but the process of composition with this is not typically ‘push button, receive quality.’ It’s more of a thick swamp of barely compatible musical ideas that can be pushed, with great effort, into something regular musicians still scoff at and tell you is soulless and not music after all. So, of course, being the contrary bastard I am, I’ve used it a bunch anyway.

First is a piece using the model of a single composer, Erik Satie.

Each iteration lets you choose the length of the next generated section. Sometimes, to avoid bad clichés and move the piece in a new direction, you have to limit this as the tree progresses, or you’re stuck regenerating a segment much longer than you’d like. Temperature settings allow for more typical or atypical writing – the lower the temperature, the more ‘stuck’ in the model it becomes, to the point where it can become obsessed with a single chord progression as it slowly descends into madness. Temperatures higher than 1 seem to draw from ideas the selected composer might use rarely, or might avoid for being not very good; but if the song as a whole becomes stuck, it’s very useful to raise the temperature briefly to kick it out of that spot.
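That temperature knob is, as far as I can tell, the standard temperature-scaled softmax used by most autoregressive generators. I haven’t seen MuseNet’s actual sampling code, so take this Python sketch as an illustration of the general mechanism, not their implementation:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token (e.g. next-note) index from raw model scores.

    Lower temperature sharpens the distribution, so the model clings to
    its most familiar ideas; higher temperature flattens it, surfacing
    notes the model would normally consider unlikely.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]       # softmax over the scaled scores
    # Draw one index proportionally to its probability.
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1                   # guard against float rounding
```

At a temperature like 0.1 the highest-scoring note wins nearly every draw; past 1.0 the long tail of weird choices starts winning often enough to matter, which matches the “kick it out of the rut” behavior described above.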

I found it really easy to retain Satie’s sense of humor. It quotes Beethoven’s 5th rather excessively and does not give up on making that a whole thing. At the end of the song I just want to pet its stupid fuzzy head. It ends on a single note that it really thinks is the best possible note to end on, and to hell with your opinion, this is a MASTERPIECE.

Next is one showing what can be done by alternating models, in this case Chopin and Rachmaninov.

You can swap ‘who’ is doing your composing any time you want. When you do this, it’s very difficult to know if the direction you’re going has a reasonable ending even available. These two seemed unusually compatible even with a rigid 1/1 ratio.

Try to get too cheeky, though, and bring Björk to meet Beethoven, and things can get a little, uh… well.

One of the most challenging things about composing with this is that since you’re only ever hearing continuations, there’s no indication of where the piece will end up. That makes it very easy to lose good song structure. Contour is totally absent until it occurs – you’ll want it to follow what’s clearly a prime opportunity for a build-up, and it will decide to bridge to an entirely new idea instead, or vice versa.

Compromise is inevitable – breaking measures to follow brilliance can sometimes be for the best. If it generates something genuinely good but only on a microscopic level, tearing yourself away from it because you know doing so will preserve the whole song from disaster is frustrating. Listening to the whole thing over again takes up much of the time, but it’s the only way to re-anchor yourself in how the piece is actually going and get an idea of where it could or should go the next cycle.

I’m sure this process could be improved with some application of theory – the AI needs more guard rails to follow for fast production. I’ve seen times where MuseNet will abruptly decide to end a song 20 seconds in.

Here’s an example using three models – Tchaikovsky, Debussy, and Satie (picked because they apparently influenced one another) – switched without any predetermined pattern, whenever I felt it would work best.

I have honestly learned more about some composers this way than I ever would have casually enjoying classical music. It’s not at all a replacement for learning to write music, but it does let you explore a massive set of brand-new idea spaces. It’s funny and entertaining, but it’s hard work to create anything of quality.

People anxious about this somehow ruining music should really go ahead and try to use it. They should remember that their music comes from musicians, no matter how convoluted the process gets. Pop music is already written by committee; if you still find enjoyment in it, it’s only because those songs still have people in them making choices.