(Comments)

Original link: https://news.ycombinator.com/item?id=39930463

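A user shares their admiration for Suno.ai, a generative AI model that can create music and poetry, highlighting its ability to produce enjoyable and unique content. However, they question whether it might one day become indistinguishable from human creations. The user then cites the documentary "Somm" and discusses the expertise of sommeliers in identifying wines, suggesting that someone able to classify the complex elements of a wine is unlikely to lose the ability to tell white from red. Although initially skeptical of AI-generated music because of concerns about authenticity and the skill involved, the user admits that they took part in a blind wine tasting and found it challenging to identify the type of wine correctly. They argue that, despite advances in AI technology, there is still a significant difference between human intelligence and AI capabilities, and note that sound synthesis is easier for AI to master than visual imagery. Throughout the text, the user explores themes such as artistry, accessibility, and the role of technology in artistic expression. They discuss AI's potential to democratize art and let more people participate and contribute. The user closes by sharing their excitement about Suno.ai, viewing it as a promising development for generative AI models and the arts.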

Ha. Voice synthesizers and TTS systems (and NLP in general - dead electronics imitating this very intimately human thing, speech and language) always fascinated me, so much so that this was a significant reason for me to study CS and computational linguistics.

This is literally some of the impossible sci-fi tech I dreamt of as an undergrad. Crazy. I'm still a bit in disbelief how fast things currently move on this front.

Interestingly, suno.ai is also able to imitate the very robotic and staccato-like intonation of Vocaloids: https://app.suno.ai/song/f43e9c46-92d3-4171-bdd9-026213d6772... - everything comes around. :)

It reminded me of Bad Apple - I'm not really familiar with all of this weird, nerdy Japanese culture, but I agree it feels very enjoyable to listen to what Suno created here.

Some strange and funny vocal aberrations here:

* sublicence - "sublissence"

* fitness - "fisted"

* infringement - "infring-ment"

* liable - "liar-ful"

It's also obviously not a pure human voice recording as the pitch transitions sound heavily auto-tuned or electrified (think Cher's "Believe").

I anticipate people becoming experts in detecting AI-generated vocalists in much the same way that we can currently detect AI-generated images due to abnormalities especially in details like ears or fingers.

And I also expect that, very soon, we won't be able to tell them apart anymore (like those wine experts that fail to detect the good wines if blindfolded).

I've heard this, and I would have been inclined to believe it. But then I watched the documentary Somm about the journey of a couple of friends reaching for the highest rankings of sommeliers. They could identify grapes, regions and year with striking accuracy. I just don't see how you could do that and then not be able to tell white and red wine apart.

I barely know what I’m doing with wine but am 100% sure I could at least tell you which are whites and which reds if you lined up a typical Chardonnay, a typical Pinot Grigio, a typical cab sauv, and a typical Pinot noir.

I am certain there exist weird wines that could fool me (I’ve had a few really weird wines) but typical shit from the grocery store, I’m gonna be able to tell at least that much. I might even ID them more precisely than red or white. It’s not exactly subtle…

Then again I don’t have a clue how someone could fail to tell which is coke and which Pepsi in the “Pepsi challenge”. They’re wildly different flavors. I can tell by smell alone.

I vaguely remember looking into this before, and it turned out that the tasters were being told (incorrectly) that it was a red wine, and asked to describe the flavour profile. They then used tasting terms more frequently associated with reds than with whites, and didn't question what they were told.

So it's less a case of "they cannot distinguish red from white" and more a case of "they went along with a suggested classification". I feel like this is a weaker result, although it's still a little surprising.

I'm from the UK and would probably have fallen for this for several minutes as well. I hope that I'd eventually realise from the number of states down the West coast.

My feeling is there is the high level classification which is quite difficult to fuck up. After that it’s all adjectives and analogues, which is the fluffed up phoniness that inherently presents itself in the process of converting our subjective experiences of physical reality into abstract symbols.

Yeah, but that still shows people's perception of wine is barely above noise level, if it can be so easily misled.

For comparison, imagine someone showing a piece by Picasso to art critics and saying "Could you please describe the artistic significance of this painting by da Vinci?" The critics won't start using terms commonly reserved for the Renaissance era; they'll say "What the fuck are you talking about, this isn't da Vinci."

A lot of the biggest perceived differences come from temperature, since red wines are usually served at room-temperature. If you ever decide to do a blind test, make sure to control for temperature. I did it, and I had a very hard time picking out which varietals were red and which were white.

I rarely drink wine (less than 1x every few years) and I can tell the difference between a red wine and a white wine, and subcategories of red wines (and I do specifically mean the difference, so that means only when compared to another wine).

The hard part is identifying the type of wine, but many of my wine-drinking friends can do it with ease. We've tried the "test," having me or someone else randomly purchase wines from the closest store and then serving random samples to them while they're blindfolded. They're able to identify the specific variety more than 4/5 of the time.

Yeah, I'm sure a lot of these tasters are overly pretentious. But some people are willing to go the opposite extreme and think people can't taste anything. Can anyone tell the difference between Coke and Sprite? Between Coke and Pepsi? Coke and Diet Coke? Of course we can. The difference between a typical pinot noir, syrah, or cabernet sauvignon is not something it takes magic powers to differentiate. Now specific years, wineries, etc, now that raises questions.

This myth is based on a fundamental misunderstanding of the experiment conducted. The conclusion of the experiment was that the vocabulary used to describe wine is subjective, and that the chosen descriptors are most heavily influenced by the color of wine, the perceived cost of the wine, and the taster's opinion of whether it was a good wine or not. I've participated in a blind wine tasting, and it was trivially easy for even complete amateurs to guess the right color of wine 100% of the time.

More than anything, this one only shows poor “expertise”, as it is a standard exercise while training to become a wine expert in France (they also give students white wines that have been colored red or otherwise tampered with), so I wouldn't expect any legit expert to be fooled this way. Though it's true that with some wines it can be tough initially for enlightened amateurs.

Source: my wife's godfather did the studies for that[1] two years ago.

[1]: https://www.isvv.u-bordeaux.fr/fr/diplome-universitaire-dapt...

As far as I can tell, AI image generation still struggles with some things after many years of research and is often detectable. Perhaps vocals are easier, though.

It's like cgi, you only recognize bad examples of it while the good ones go right past you. I've got plenty of ai generations that fool professional photo retouchers - it just takes more time and some custom tooling.

> I've got plenty of ai generations that fool professional photo retouchers - it just takes more time and some custom tooling.

What’s a good place to find out the SOTA of the custom tooling and workflow?

Generative text-to-image models based on neural networks have been developing since around 2015. Dall-E was the first to gain widespread attention in 2021. Then later models like Stable Diffusion and Midjourney.

"Quadrupled" is a very specific and quantitative word. What measure are you basing that on?

The non-singing TTS are barely discernible now. I watch a lot of narration-heavy edu-tainment on YouTube and often the only way I can detect TTS is the consistent monotone and uniform syllable cadence. There can be 15 minutes before a single mispronounced word is spoken. That could be a preview of what's to come with AI video.

Many human vocalists have similar aberrations. Remember Jimi Hendrix's "Excuse me while I kiss this guy", or the notorious autotune on a number of contemporary pop artists (you gave an example yourself)?

IMHO many of the successes of "artificial intelligence" come from "natural stupidity". Humans have many glitches in our perceptual mechanisms. The AIs that end up going viral and become commercially viable tend to exploit those perceptual glitches, simply because that's what makes them appeal to people.

The difference between this and Hendrix's "kiss this guy" is that you can listen to it and plausibly believe that Hendrix is actually saying "the sky". In the linked track you know the actual words but it still doesn't sound like them.

You can fix most misspoken words by tweaking the lyrics. E.g. in my most recent song it pronounced "pasting" from "copying and pasting" as "past-ing".

I just rewrote the lyrics as "paste-ing" and it sang it perfectly afterwards.

> It's also obviously not a pure human voice recording as the pitch transitions sound heavily auto-tuned or electrified ...

    The cake is a lie, but the music is real

    It’s all fake when the truth is revealed

> (think Cher's "Believe")

Or think GLaDOS. Pretty sure that's not a coincidence.

At that point, is it AI generated?? That feels like an entirely different category to me

(like it's sort of no different from paying someone to voice something and share it)

I think the stuff that is completely generated with no human in the loop is a different category for me because it can be used for things at scale like, bots on social media, or ads in a podcast generated just for you, etc. As long as there is still a human in the loop making the editing decisions, it feels not categorically different from the world we have today.

That's a fair point, but "ai does the work and humans clean up the mistakes" is generally a lot faster than humans doing all the work. Singing well takes skill (even when you have autotune), splicing together multiple "takes" into one good recording, less so.

I would say it's still categorically different, just because we're automating one piece of labor that was kind of thought until about ~12 years ago to be un-automatable.

Like, there've been computer-singing voices for a while, but they always sounded pretty robotic and goofy (e.g. Microsoft Sam), and I think for a long time people just assumed that to get mostly-realistic voices, you need an actual singer. Yes, it still requires a bit of human tweaking to make it perfect, but I suspect that if put to the test it would reduce the cost of making a song substantially.

I'm never going to have the voice to sing this, but I can easily imagine learning how to edit it.

AI/Human combos can still be valuable. More broadly I'd argue that that's how almost all tech works. E.g. there are still textile workers, just many fewer of them producing much more clothing.

> I anticipate people becoming experts in detecting AI-generated vocalists in much the same way that we can currently detect AI-generated images due to abnormalities especially in details like ears or fingers.

People fail to identify even the most basic and obvious fakes, but somehow there’s a group of people who think that as fakes become harder to distinguish from reality, we’ll all magically become experts at it. We won’t. People’s ability to detect fakes will get worse, not better, as a consequence of more prevalent and better fakes.

https://www.youtube.com/watch?v=1U1HMqtam90

I'm always surprised by this idea too. I can easily see an outcome of it becoming increasingly hard to convince people that real things are actually real.

I'm going to go out on a limb and say you don't listen to much "modern pop" - the production quality of the biggest mainstream pop is _extremely_ high at this point, and while "worse voice" is obviously subjective, this really wouldn't be anything stand-out in that regard even if it didn't sound like a robot.

I was very impressed by the mini bridge and then sudden addition of harmony between the license and the ALL CAPS statement part. Is that all AI deciding that? This made it a true song in my opinion.

My mind hasn’t been this blown by AI since GPT4. You owe it to yourself to check out Suno.ai. As a non-pro musician I’m excited by this. Some version of this could become a _starting point_ for me, rather than an unreachable end goal. I can see how pros would be horrified by this, too. For quite a few people some future version of this could be an adequate replacement for a music subscription, but of course not for a show.

With the pro subscription it usually takes less than thirty seconds for the songs to be playable. It keeps generating while you play though, so the whole audio file isn’t available for a few minutes.

Free accounts are queued so it depends on load and I don't think the v3 model is available to them.

I was thinking it would impact places like bars and streams and tv the most rather than actual consumers, or wherever licensing is concerned. I don't believe people would listen to AI generated music for the same reason AI isn't impacting fine art. People aren't going to hang AI paintings in their houses or listen to AI music.

> I don't believe people would listen to AI generated music

Counter-point, we've been listening to a rock song about the moon given the words from one of my kids books all morning.

People will 100% listen to (edit - I never finished the sentence) things that sound fun. It might not bring me to tears or stop me in my tracks but lots of things are just fun.

> People aren't going to hang AI paintings in their houses

People will absolutely do this. If AI systems can make nice pictures people will hang them in their houses. And they can make nice pictures.

> People aren't going to hang AI paintings in their houses or listen to AI music.

A lot of people are very confident about this and I don't understand why. The same was said for jazz and comic books. But I am listening to jazz with comic book posters on my wall. There were different reasons to give the same statement, but it almost always turns out to be wrong. Humans like what they like and seldom judge an artwork for its process (outside of a very small niche community).

This is something different entirely. We're outside of the "human sphere" so to speak.

>Humans like what they like and seldom judge an artwork for its process (outside of a very small niche community).

That's true, but how do you zoom out of process? This is beyond process. I would just say most people don't like inhuman things.

It's a non-human algorithmic mishmash of a bunch of stuff; there is no human quality to it or years of effort to reach new heights. AI will not make "new" music in the sense that it will make a trumpet song that escapes our current understanding of a trumpet's limits, like how a once-in-a-generation player will come along and move the ceiling up.

It's an omelette. There is no Dolly Parton behind an AI Jolene or a Michael Jackson turning a 4 track tape into a musical masterpiece. The journey and personalities are what contextualize the sound; with AI that context is gone. That's why I think it will just be used for cafes and things like that where they want to escape licensing fees.

As for consumers - I believe people will see AI music consumption as a way of supporting the new technological powers that be, and the act of listening to human-made music will have an element of counter-culture baked into it. I'm a professional musician and I have a very physical reaction to sound. Once I know it's AI my goosebumps fade.

Another lame incarnation of a tech that will also fade like crypto and everything else. The types of personalities who will leverage this tech are not the same personalities that make the greats.

I'm not worried.

In this post, I can summarize two points you are trying to make. One, it takes less effort, and two, it doesn't fit into our current understanding of how art creation narratives work. I don't see how that precludes a piece from being good/bad. I feel like you are arguing for your personal opinion (if not your image of what the world should be) as if it's some kind of objective truth. Your goosebumps might have faded but when I heard this post in a half sleepy state, I got goosebumps when my sleepy mind figured out it's fully AI generated. But that doesn't add to the argument either way.

AI Art is second-order human art. From this viewpoint it's still human by proxy.

And anyway, is it measurably different from art produced while tripping on LSD or in similar states of altered consciousness, such as schizophrenia, dementia, or even depression, which often produce things many people would not describe as regular?

I love art made by non-human intelligences. I especially love how it can transcend and redefine loved mediums by combining them in surreal ways that are otherwise quite difficult to obtain. Algorithmic exploration of mediums outpaces mere mortal "effort" in its efficiency and in doing so raises the bar for what constitutes media worth giving our attention to.

I see this take often, but I don't buy it. Mixtapes and playlists are quintessential gifts of affection based on art that the giver did not make and by artists the receiver often does not know. In the same way, lots of people hang Costco paintings by anonymous sweatshop workers on their walls, and kids love cool posters about whatever interests them with no regard to who made them. I believe consumers are likely to enjoy lots of this generated art.

Why not? Have you seen the top 10? It couldn’t be any worse than what it is now. People who reach the top 10 are rarely there for the “art”. A lot of them don’t even write their own songs or music.

All music at this point is largely ambient music and Muzak.

The future is obviously a form of custom AI Muzak/Ambient music with a few pop stars for people to focus on.

I am a big fan of more art type music and guess what? No one listens to it. My fav album of 2023 has 6.4k views on youtube. At least 100 of those are mine. No one listens to this stuff. People watch video critic reviews of more art type music than the actual music itself.

Lol, same. A lot of the stuff I listen to is completely unknown to a “normal” person. And guess what? AI is not replacing those folks for their audiences in the foreseeable future, because they don’t just regurgitate the same chord progression as everyone else

Honestly I don't even know what the "top 10" is or how it's measured and have never met anyone in my life who listened to top 10 stuff. It's always HR office radio, mechanic radio, the bar, club, etc. Even the most normal people find stuff they like on youtube and listen to that.

Even if the AI music is extremely good, it's just missing the fact that it was made by a person, which changes the experience entirely. I think we're more likely to see musicians and those top 10 artists leverage AI without explicitly saying so.

I expect we will have a daft punk moment where someone is using exclusively AI and later unmasks that it was all AI, and as soon as that happens the music is disconnected.

Same with AI art. I can see something and be duped and go "oh wow!!!" and as soon as I know it's AI the caring leaves my body completely and reverence and interest is lost.

I love this sentiment about "top 10" radio. If only it was so. That's the stuff that's on everywhere, all the time. Grocery stores, cafes, etc etc. hell, I listen to it on YouTube. It's like junk food. It's bad, it's good.

It's better than AI, even this incredible mindblowing suno thing. Production value counts.

Quality isn’t the only factor though. Music made by people has copyright which means grocery stores and coffee shops have to pay a license fee.

There’s certainly a point where this synthetic music gets good enough to replace the elevator music Muzak crap that they have to pay $2000 to license.

I think it's a lot better at classical, orchestral, and instrumental music than it is at anything requiring vocalization. I created this in less than 20 minutes: https://app.suno.ai/song/eb93c25b-bdbe-4c9f-8e03-66e9479c869...

I need to stem it, fix it up a bit, and remix for stereo in a DAW but it's much better than I expected for my first ever piece of music. Obviously it'd take a lot of work to create a Hans Zimmer level OST from the tool but IMO it wouldn't feel out of place on a Ludovico Einaudi album or on some Spotify or Pandora classical radio.

That's actually a very good piece. Like something I'd hear on late night Paradise Radio. If I was creating an indie movie on no budget I'd be all over this technology for the soundtrack.

I don't think musicians and composers are going to disappear as a consequence of this technology, in the same way that theatre actors were not made obsolete by film. What I do think is that a whole new category of professionals will be created - musicians and composers who get paid to train AI models. I bet it will pay better than the laughable amounts that are streaming royalties.

At some point in the future, wanting no part in AI-generated content is going to be like that old Onion headline. "Area Man Constantly Mentioning That He Doesn't Even Own A Television".

Someone is writing it. There are a lot more than 10 people that want to be in the top 10. It's hard to get into the top 10. You might not appreciate it as art, but the songs that are there are good at something. You could call it being catchy. AI is not even close on this metric.

I think it will get there, wherever "there" is. I think it's very impressive now, as a technical marvel. But it's really not competing with the best humans yet. I don't say this to dismiss it. I say this as an appreciator of music who is neutral on AI. Probably one day I'll listen to mostly AI generated music. But it won't be this month.

Lol, nice!

When they allow longer text inputs, and faster rapping, I can really see this kinda thing taking off with L1s and med students.

Like the Animaniacs song about the state capitals.

Or like a Homeric epic that is meant for remembering and singing.

The method of loci may have a new competitor as a way to remember things here.

I suppose the focus was on voice synthesis here. I won't add anything about it since other commenters have already said significant things about this wonderful feat.

Musically, however, I can't help but notice that these models are still very far from being able to generate something interesting: from harmony, to tempo, to musical structure, to dynamics, everything is muddled and without structure. I guess there is still very much to work on, and I am not sure that purely generative models can attain higher levels. Maybe a mixed rule-based and generative approach would do?

The progress is really fast in this field, I really do not know.

To my knowledge, the model being used for this is "chirp" which is 'based on' bark[1], an AI text to speech model.

The github page for bark links to a page about chirp, which returns a 404 page for me [2]. My guess is that the model used for suno.ai's song generator isn't too much different than the text to speech model.

I also have a hunch that it was more of a coincidence than intentional that the bark model was capable of producing music, and that was spun off into this product.

Unfortunately, there still seem to be issues with bark when generating long (like book length) spoken audio. Which is too bad, as someone who's worked jobs that require lots of driving, it would be awesome to be able to have any text read to me in a natural sounding voice.

[1] https://github.com/suno-ai/bark [2] https://www.suno.ai/examples/chirp-v1

I'll try to give a serious answer, even if I suppose yours was a nice joke :)

Music is a language, even if one with no semantics. It has conventions, dialects, a syntax, a grammar. There are multiple dimensions a musician uses to convey what he wants/feels: just as an actor has to control his voice, posture, and interplay with other actors all at the same time, a good musician is aware of the structure of the piece he is composing/executing, the relations between the various subparts, and how the musical discourse progresses in time, as well as agogics, dynamics, and sound color.

All of those aspects are continually compared against the conventions of the genre, mixed, evolved, strictly followed or blatantly negated.

This is something that normally a professional musician takes decades to master (apart from musical geniuses).

A listener takes less time to educate himself to appreciate those nuances (but not too little: let's say ~years). Once you develop a taste, it becomes very obvious to see through the spectrum that goes from bad quality tunes to musical artistry.

I see nothing musically interesting in this (wonderful) PoC of speech synthesis.

Just to be clear: I did not see anything particularly stunning even in Google's Bach Doodle from some years ago https://doodles.google/doodle/celebrating-johann-sebastian-b...

All the AI generated music just sounds like someone jamming without any hint of any real melody, original or a cover. It's very strange to listen to. It sounds exactly the way an AI generated photo of a person looks. Looks/sounds kinda real until you look/listen closer.

Reminds me a little bit of Catholic mass when the priest "sings" some of the sections. There is no consistency, no cadence, but their voice goes up and down. It's high-effort talking.

I wonder if these models would do something better if the text were poetic or punctuated differently.

In the great web tradition of harvesting the vast body of other people's work in the large[1] and shoving it through huge amounts of computation to wring out a nickel's worth of value that will eventually manifest in some good-paying SWE jobs, a rich executive class, and a whole lot of shareholder value and inevitably mutate into another goddamn ad-serving platform.

[1] Ha, the poor millions of dumb minions who put their work on the web thinking it might be fun for others or garner themselves a small following, they didn't check the terms of the EULA!

These kinds of discussions always leave me wondering if people consider how actual humans learn their craft, constantly studying and mimicking others. Inspiration is to use existing experiences however mixed together, while originality comes from an input or an experience that others have yet to use.

"Write a sad song about the MIT license" is certainly such new input, and if I was commissioned to write the song it would be based on inspiration (i.e., "use training on") music I have heard or studied. And yes, none of the musicians I have listened to or have studied will benefit from the endless money fountain I'd acquire from composing such a song.

In the case of a human studying, a person puts in effort and gets rewarded for their efforts.

In the case of AI, a person puts in minimal effort to generate something that devalues the work of all the people who did put in effort.

> In the case of a human studying, a person puts in effort and gets rewarded for their efforts.

When someone needs something composed, they don't learn how to write music. They pay someone else the bare minimum, e.g. a few bucks on fiverr. The person will spend the least possible amount of effort to try to make their life go around with the little money they got.

When you then use an AI model, the work done for those five bucks is replaced by work done for almost free.

Neither the person you would hire nor the AI credited those who created the material they trained on.

> When someone needs something composed, they don’t learn to write music…

Speak for yourself! There is only one thing that scares me more than composing music, and that’s paying somebody a few bucks on fiverr to do it for me.

Despite your personal fears I believe I spoke for the vast majority of cases rather than just for myself.

Although I suppose royalty-free stock music is the norm nowadays for most commercial uses, which takes it a step further, anonymizing the composer entirely...

> In the case of AI, a person puts in minimal effort to generate something that devalues the work of all the people who did put in effort.

Worded differently: people who couldn't otherwise produce skill-based works of value have had the barrier of entry lowered for that specific medium of expression, allowing for more works across a wider spectrum of skill.

It’s so bizarre when people say stuff like this. There is absolutely nothing preventing the unpracticed or untalented people from any form of creative expression. What instead people who use AI seem to want is for unpracticed or untalented people to perform at the level of the practiced and talented, but this is no net gain to anyone. Why? Because only a rare subset of people who ARE practiced and talented create anything of interest or value in the first place. What this tells you is that skill or level of performance is not the barrier, but a means through which great things CAN be achieved (i.e. necessary, but not sufficient)

Flooding the world with unpolished, unpracticed works, AI-tuned to the level of being mediocre, is a creative and intellectual dead end.

> for unpracticed or untalented people to perform at the level of the practiced and talented

This is what tools are.

Cheap digital tablets have done away with the need for expensive consumables. You can just download a different brush style instead of learning a physical technique. No waiting for paint to dry or smudged pencils. The barrier to entry for painting has dropped to a one time investment of like a hundred bucks. Almost nobody mixes their own paint, nor stretches their own canvas. Those skills aren't needed anymore.

It's possible to build very precise machine parts by hand. It's very difficult and requires great skill, so nobody does that. Some do and are admired for it, but everybody else uses precise machines to make precise parts with nearly no effort.

It's just a tool. Only difference is that we had assumed art would never be automatable.

Objectively, I don't think this is a bad thing. It doesn't change the subjective value of art any more than the average cartoonist devalues the Mona Lisa. It's just a new form of art, there will always be people mixing their own paints and stretching their own canvas, just as there always has been.

It's only a problem because in our society you either have a job or you starve. No one can afford to be an artist. Those that do tend to grind out as many pieces as fast as they can so they can pay the goddamn rent. If not for that, these AI tools would be pretty cool.

I think the bizarrity arises from the following differences in beliefs:

* That "_any_ form of creative expression" is a viable creative substitute for people wanting to create in a _specific_ medium of creative expression -- especially those that had a high barrier of technical skills required to be seen as "good enough" to share.

* That a person who has an idea for art will put in the necessary time to become proficient enough to create that "good enough" art through traditional means (IMO demonstrably incorrect), and that is preferred over that person just not expressing a lower-quality version of that idea at all.

* That those who use AI primarily want or expect to "perform at the level of the practiced and talented" (i.e. top-tier art) rather than using it to produce art they otherwise couldn't have, even at low- and mid-level qualities.

* That there is no skill or talent in using AI tools to produce art (or that the skill or talent using AI tools is meant to be a full replacement for traditional artistic skills or talents).

FWIW, I'm a long-time sketch artist and acrylics painter (~20 years). There are many mediums, subjects, and styles that I'm not good at -- and I enjoy using AI to express myself in those areas (and have also liked using AI to create songs to show to my more musically adept wife...). But even in my own wheelhouse (landscapes and still life), I also often use AI to brainstorm composition, perspective, colors, textures, lighting, etc. It's a great tool for experts to lean on, but an even better tool for non-artists who couldn't or wouldn't otherwise share their art.

Indeed. As an amateur guitarist, but a professional virtual machinist, I have a ton of respect for people who have dedicated their whole lives to mastery in any one particular area. To have a machine gulp down untold eons of human exertion and then barf out soulless mimicry, no matter how jaw-dropping the feat of engineering behind it, and then mint no-talent ass clowns by the million because viral videos make an awesome advertising platform--it's just some kind of dystopian peak tech, except the dystopia is mildly amusing rather than a disappointing and jarring marginalization and flippant dismissal of all of us.

This feels like weird gatekeeping.

Why is this the line? Where are the complaints about people using pianos to achieve rather precise notes instead of using their own voices? They are just untalented at singing and their use of any tool to create sound is of no net gain to anyone.

This person: https://www.youtube.com/watch?v=IbUE-LxhUR8 ? They're recording and playing back on a loop! They should record full repeated playings, any use of the recording is of no net gain because it could be achieved otherwise.

Songwriters? If they write lyrics and someone else sings them the result should be cast into the sea - it's of no net gain to anyone because they did not create the sounds themselves.

Composers? Frankly pointless.

> Flooding the world with unpolished, unpracticed works

I hate to break it to you but there are a vast number of terrible works of art out there already.

> What this tells you is that skill or level of performance is not the barrier, but a means through which great things CAN be achieved (i.e. necessary, but not sufficient)

If it's a necessary thing, of course it's a barrier. That there are two barriers doesn't change that.

Using the skills they presumably developed listening to and copying other singers and studying music, with an instrument built from roughly the same instructions as everyone else.

That a person can't sound like the weighted average is a human limitation (although with modern pop people do get quite close!), not because new singers aren't trying to. That of course adds variation that we appreciate, but doesn't change the underlying similarity in how acquired skill is mimicry of those who acquired it before us - with very rare exceptions.

No, sounding like the genre-weighted average of Spotify simply isn't what singers try to do. They haven't listened to that much music, they have actual preferences, they have natural qualities to their voice which they're complimented on or asked to mask, and they're trying to hit notes based on their aural perception of harmony and related theoretical principles not based on the waveforms of other songs involving singer songwriters. The fact that they literally couldn't do what NNs do even if they wanted to also seems quite relevant to the fact that they don't do what NNs do.

What next, are we going to argue that what programmers creating new programs are really trying to do is generate a prompt-weighted average of the bytecode of every program they've ever downloaded, and all that business analysis and functional spec and use of high level programming languages and expressed preferences for coding standards is irrelevant?

> they have actual preferences

That's just a bias.

> natural qualities to their voice

Those are the physical limitations I referred to, which aren't something humans tend to be happy about but can sometimes end up being a differentiating benefit.

> What next, are we going to argue that what programmers creating new programs are really trying to do is generate a prompt-weighted average of the bytecode of every program they've ever downloaded

That's a horrible strawman. Do you as a programmer often read and write bytecode directly?

> That's just a bias.

I'm beginning to assume you're an LLM, because I'm not convinced a human would honestly try to argue that their emotional reaction to their favourite songs is basically equivalent to flipping the values of some bits to ensure that they generate music more similarly to them.

> That's a horrible strawman. Do you as a programmer often read and write bytecode directly?

As an improvising guitarist (even a very mediocre one) my creative process is even further removed from an LLM parsing and generating sound files directly....

I suspect the issue here is just the assumption that LLMs are "just flipping some bits", while simultaneously putting humanity on some unreachable pedestal.

We are all nothing but a horde of molecular machines. Your "you" is just individual neurons reacting to input in accordance with their current chemical state and active connections. All your experiences, unique personality traits, and the creativity you add to the process are solely the result of the current state of your network of neurons in response to a particular input.

But while an LLM is trained once and then has its state fixed in place regardless of input, we "train" continuously, and while an LLM might have experience of an inhuman corpus for a certain subject, we have many "irrelevant" experiences to mix things up.

Your "prompt" is also messy, including the current sound of your own heartbeat, the remaining taste in your mouth from your last meal, the feeling of a breeze through your hair as it tickles your neck, while the LLM has just one, maybe two half-assed sentences. This mix of messy experiences and noisy input fuels "creativity". You don't think "I need to copy XYZ", but neither does the AI. You both just react.

In some regards our chaos is better, in others it is worse. But while the machinery of an LLM still does not even remotely approach a brain, we should not forget that we are nothing more than a cluster of small machines, assembled from roughly 750 MB worth of blueprint.
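
For what it's worth, the "roughly 750 MB" figure holds up as back-of-envelope arithmetic. A minimal sketch, assuming about 3.1 billion base pairs in one copy of the human genome and 2 bits per base (both numbers are assumptions made for illustration, not taken from the comment above):

```python
# Back-of-envelope estimate of the raw, uncompressed size of one genome copy.
BASE_PAIRS = 3_100_000_000  # assumed length of the haploid human genome
BITS_PER_BASE = 2           # four possible bases (A, C, G, T) -> log2(4) = 2 bits


def genome_megabytes(base_pairs: int = BASE_PAIRS,
                     bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the size in megabytes, using 1 MB = 10**6 bytes."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 1_000_000


print(f"{genome_megabytes():.0f} MB")  # prints "775 MB"
```

Counted in binary megabytes (2**20 bytes) the same estimate comes to about 739, so "roughly 750 MB" sits between the two conventions.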

I wonder if this won't drive a resurgence of demand for live performances - as recording becomes more and more artificial, live performance will mean more. (Or maybe, as a live performer, I'm just wishful thinking here...)

Generally speaking, people create internet content so that it is shared.

All of the creators and subjects of meme formats... Should they receive a royalty every time you post some inane mashup?

People also differentiate heavily on the basis of scale and profit. Artists are often fine with people sharing their posts and may even tolerate someone asking for permission to make printouts or whatever else for their circle of friends, but will expect some sort of royalty if you're asking to be able to sell prints of their artwork in a store.

Hell, even with viral videos it's relatively common that normal people can share away while entertainment companies and influencers are expected to pay for a license.

With memes it isn't clear exactly who made the first template, and the creation of them doesn't revolve around specific people in the same way, nor are they meaningfully tied to profits.

When creators post their content online to be shared, they do it with the focus being on reaching individuals, not for it to be sucked up by soulless companies to extract all value without the intention of giving back.

> With memes it isn't clear exactly who made the first template.

The Office, The Matrix, Lord of the Rings, Django Unchained, Game of Thrones, etc

These works have identifiable creators.

The conversation is quickly devolving into a vacuum of ignorance where things like royalties, fair use policies, revenue-sharing agreements, parodies, sampling, etc, have apparently never been thought about.

We're not talking about any of those things. We're talking about wholesale digestion of the entirety of human knowledge by automated means, which is now not just theoretically possible, but routine.

This is not that. We're not talking about some inane mashup, but a wholesale digestion of every creative thing any person ever did by a monster computer cluster whose scale dwarfs imagination, which then promptly uses it to maximize "engagement" to gather eyeballs to feed them advertising. It's profoundly messed up.

> which then promptly uses it to maximize "engagement" to gather eyeballs to feed them advertising.

This is the real problem, right? People don't dislike generative AI, they dislike the attention economy. Yet I see more disgust towards AI than the company policies which suck. I don't understand why.

I think it is more that art, film and music have largely been replaced with complaining online about various subjects as the major form of entertainment in America.

Oh, haha, yeah. I guess I'm the opposite--I actually like AI more than the attention economy! At least one of them is not actively trying to drain my brainpower and skill set and get me to buy stuff and do stuff I wouldn't otherwise buy or do.

It isn't. If a serial killer spent a week digging mass graves by hand, they don't get years taken off their sentence. You don't get points just for working hard or spending money, particularly when it cheapens or just appropriates other people's work.

AI isn't influenced. It doesn't have restrictions. It doesn't have to work within confines. AI can always remember the word it wants to use. It always can hit the note it intends. And it can hit every note. Etc. It uses the corpus of training data and mashes it into a new form.

Stephen King won't be able to remember every word of every story he's ever read. And if he wants to make something "Lovecraftian", it'll be what Stephen King thinks is Lovecraftian. And there will be something to that. Some bit he believes is more or less important than other people do. And those bits are what make Stephen King, Stephen King.

Everyone has had access to the same material King read. Access to the same tools he used to create. Everyone had the chance to effectively be Stephen King. But there is just one. Because there is some unique bit of observation or recall or combination of such things that is unique to King.

And from what I've seen so far, these LLMs can't do that. There is a missing element of pure imagination.

How am I doing that? I am claiming that LLMs lack imagination. They are incapable of creating out of whole cloth or interpretation.

Saying they cannot create based off of a vague suggestion is very much in line with that claim. I consider it a vital difference between Stephen King being inspired and LLMs mashing training inputs together.

This is scratching a rare itch for me because I am a heavy subvocalizer when I read just about anything, and when I have a song stuck in my head, I end up wondering what it would sound like if someone sang the words I’m reading to the tune of the song.

for college i would convert my physical textbooks to wordfile text, then convert the wordfile text to computer voice mp3s and play those in the background to help me study.

break up chapters or sections of the college textbook into suno songs instead - itd be maad interesting how much better that wouldve helped my studies. monotone computer voices of 10+ years ago will put you to sleep.

I'm impressed how it managed to extract rhyme from that license.

    The software is provided (as is)
    without warranty, of any kind
    express, or implied.

Reminds me of Regina Spektor's style.

And some of the generated phonemes actually just sound like stylistic auto-tuning. I kind of like it.

I'm sure many have already observed this, but I think the thing that most artists fear from AI is not that AI will be able to produce works on par with or superior to human works, but that most people won't care enough to value the difference.

It's a satire generator. Take any text you want to make fun of, turn it into music.

I'm not sure whether I've just run out of credit, or Suno actually knows what the political sensitivities of the text might be, but I can't generate a second amendment song.

This is impressive, but part of what makes it so is that we are not used to it. As these kinds of AI-generated music/images/videos become ubiquitous, it will be the new normal and they will become less impressive.

Maybe, but I think there is something innately funny about making computers say silly things. As a small child it was peak comedy to me making a Macintosh say “fart” and it’s still funny to me when a computer sings the MIT license.

I was wondering if she would sing really loud at the ALL CAPS sections, but fortunately she did not. Still better than most Eurovision Contest songs :)

My approach to generating a music video was to generate scenes using DALL-E 3, and then animate those using Stable Video Diffusion (SVD).

SVD doesn't have well-controllable motion and is utterly blown out of the water by Sora, but it's what we have right now.

Here's the resulting vid, "a death metal song about a macro photographer":

https://www.youtube.com/watch?v=kNVRQ1Zg-a0

If you only want a video file from Suno to share with the default static lyrics screen on it, hit Download Video from the three-dots menu.

Within 30 seconds of getting on the site, I had generated a song that is crazy relatable, funny and sounds good. Made the whole family smile. This is going places.

This is so much better than stable.audio released yesterday!?

I've dabbled in music production and this is just unbelievable.

Both amazing and a bit sad because this is already so much better than what I would have anticipated.

First illustrators, copywriters, then VFX guys, and now music. We're going to lose so many jobs in the creative sector, right?

Going to make those boring textbooks sound more tolerable. Interesting implications for education. If this was a foreign language I didn't understand, I don't think I would have been able to tell it was generated.

Wow! I hadn't kept up with music generation for the past few years. It's come a long way!

Long-term coherence, reasonable-ish melody, all on top of very unmusical text. Very impressive.

Suno.AI is very fun. I find that asking ChatGPT to create lyrics and then feeding it gives some great results, although half the generations tend to have a bit too much static, so you have to keep generating.

And does the requirement that "this permission notice shall be included in all copies or substantial portions of the Software" mean that the mpeg specifically must be included?

Yes:

    The cake is a lie, but the music is real

    It’s all fake when the truth is revealed

The autotune electronic voice seems likely styled on GLaDOS.

Suno is pretty cool. If I had to guess, this uses Suno's Bark and Facebook's MusicGen? The output of the latter is used as conditional layers for the former, similar to ControlNet?

Anyway, what will be interesting is when this can be done locally on consumer hardware with open-source AI, a nice UI and Vulkan/DirectML GPU inference.

I remember when Solaria came out there were a ton of people making emotional spiritual music with it. It felt so odd, robot voices singing to God and about the wonder of experiencing life. Sounded pretty though.

Soon we will have 'preachers in a box' that will sing to lift you up, mentor you, guide you through life. Most will even be 'non-religious' but will basically become your religion, your guide through life.

most impressive ive heard on suno was a live performance. all the live performance cliches including the crowd singing along a cappella. it was unfuckingbelievable - and at the same time i can see how that can get burned out real quick by others replicating the same idea over and over.

Is there any information on how such songs are made? It is probably way more complicated to get a decent result than one might expect.

Here I was thinking "game with music that talks about your game". Think something like skyrim where you have nords singing ballads based on what you've done, and what your character is named (built with an llm generating lyrics from a json config the game spits out, which is in turn fed into this).

Your version unfortunately sounds much more plausible and profitable.
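
To make the game idea a bit more concrete, here is a minimal sketch of the first half of that pipeline: turning the JSON the game spits out into a lyrics prompt for an LLM. The field names (`character_name`, `deeds`) and the function `build_lyrics_prompt` are hypothetical, made up for illustration. No LLM or Suno call is shown, since the thread does not describe either API.

```python
import json


def build_lyrics_prompt(game_state_json: str,
                        style: str = "Nordic tavern ballad") -> str:
    """Build a lyrics prompt from a (hypothetical) game-state JSON string."""
    state = json.loads(game_state_json)
    name = state["character_name"]   # assumed field: the player's character name
    deeds = state.get("deeds", [])   # assumed field: list of short deed descriptions
    # One bullet per deed, with a fallback line for a brand-new character.
    deed_lines = "\n".join(f"- {deed}" for deed in deeds) or "- (no notable deeds yet)"
    return (
        f"Write lyrics for a {style} about a hero named {name}.\n"
        f"Mention these deeds, in order:\n{deed_lines}\n"
        "Use two verses and a chorus that repeats word for word."
    )


example = json.dumps({
    "character_name": "Hilda",
    "deeds": ["slew the frost troll", "stole a sweetroll"],
})
print(build_lyrics_prompt(example))
```

The resulting text would then go to whatever model writes the lyrics, and those lyrics in turn to the music generator.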

Twitter is stupid now, so I can only see the linked post. But are there instructions to replicate this, and has anyone done so? Just kind of skeptical of videos of demos in general.

If this is legit, the Spotify spam is going to become atrocious and probably unmanageable.

Yeah, it works, and doesn't need any technical instructions. Just go make a song on suno.ai.

I had done a folk song version of my resume. It wasn't going to become a hit or anything, so I don't see this replacing any real musicians, but it absolutely worked to create a passable performance as a song.

The Spotify spam _is_ already atrocious and unmanageable. If anything, it might get a little bit more creative instead of people just publishing the same samples from Splice everywhere.

I’m not sure LLM-type tech can get the chorus right, that is, accurately repeat a chorus it made up earlier. Many songs have a heavily repeated chorus. An example is U2’s One: “Is it getting better..” and then another chorus, “did I disappoint you..”

Current generated songs are made like sentences, where you hear the entire song without much structure.

Suno.ai and the underlying technologies are really quite amazing. I've done a few things like:

* Put a poem my late mother wrote to music for her memorial

* versions of 80's new wave songs

and they came out so lovely compared to what I'd be capable of as a musician. It puts me in the role of a "producer" of sorts, tuning the sound and vibe. Really well worth the money.
