Original link: https://news.ycombinator.com/item?id=39992817

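AI music generation company SonAuto has launched its first product, version 1.0. Competitors turn music into tokens with a vector-quantized variational autoencoder and then train a language model on those tokens. SonAuto instead replaced tokenization with a normal variational autoencoder bottleneck, plus other significant changes that enable very high compression ratios. On the resulting latent space they trained a diffusion transformer, which they describe as the first audio diffusion model able to generate coherent lyrics. Users can steer their creations with features such as Rhythm Control, which accepts an uploaded percussion line or a set BPM. An upcoming update will add variations of uploaded or previously generated songs. Although diffusion models are expensive to train, SonAuto says they are cheap to serve thanks to its self-built inference infrastructure, which is why generation on the site is free and unlimited for as long as possible. The team's stated goal is to enable musicians, not replace them. Examples are at sonauto.ai/songs. Have questions? Reach out to SonAuto.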

Hey HN,

My cofounder and I trained an AI music generation model and after a month of testing we're launching 1.0 today. Ours is interesting because it's a latent diffusion model instead of a language model, which makes it more controllable: https://sonauto.ai/
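For readers unfamiliar with latent diffusion, here is a minimal sketch of the standard forward (noising) process that such a model learns to reverse. It assumes the cosine noise schedule and epsilon-prediction objective common in the DDPM literature. The post does not say which schedule or objective SonAuto uses, the function names are hypothetical, and plain lists stand in for latent tensors.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Fraction of signal variance kept at step t under a cosine schedule (assumed)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def add_noise(z0, t, num_steps, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar) * z_0, (1 - abar) * I)."""
    abar = cosine_alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(z0, eps)]
    return zt, eps


def epsilon_loss(pred_eps, true_eps):
    """Mean squared error between a denoiser's noise estimate and the true noise."""
    return sum((p - e) ** 2 for p, e in zip(pred_eps, true_eps)) / len(true_eps)


rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]  # stand-in for one frame of VAE latents
zt, eps = add_noise(z0, 700, 1000, rng)
# A denoiser (e.g. a diffusion transformer) would take (zt, t, conditioning) and predict eps;
# training minimizes epsilon_loss(prediction, eps).
```

Unlike a language model, nothing here is a discrete token: the model sees a whole noisy latent sequence at once, which is what makes conditioning on things like rhythm comparatively direct.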

Others do music generation by training a Vector Quantized Variational Autoencoder like Descript Audio Codec (https://github.com/descriptinc/descript-audio-codec) to turn music into tokens, then training an LLM on those tokens. Instead, we ripped the tokenization part off and replaced it with a normal variational autoencoder bottleneck (along with some other important changes to enable insane compression ratios). This gave us a nice, normally distributed latent space on which to train a diffusion transformer (like Sora). Our diffusion model is also particularly interesting because it is the first audio diffusion model to generate coherent lyrics!
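The difference between the two bottlenecks can be sketched in a few lines. The first function is a generic vector-quantization step: it snaps a latent vector to its nearest codebook entry and emits that entry's index as a token. The second is a textbook Gaussian VAE bottleneck: it samples with the reparameterization trick and computes the KL penalty toward N(0, I), which is what yields a normally distributed latent space. This is not SonAuto's code. The post does not describe its "other important changes", and the function names here are hypothetical.

```python
import math
import random


def vq_bottleneck(z, codebook):
    """Vector quantization: return (token index, codebook vector) nearest to z."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(z, codebook[i]))
    idx = min(range(len(codebook)), key=sq_dist)
    return idx, codebook[idx]


def vae_bottleneck(mu, log_var, rng):
    """Gaussian VAE bottleneck: z = mu + sigma * eps, plus KL(N(mu, sigma^2) || N(0, 1))."""
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
    return z, kl


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
token, quantized = vq_bottleneck([0.9, 0.8], codebook)  # discrete token for an LLM
z, kl = vae_bottleneck([0.9, 0.8], [-2.0, -2.0], random.Random(0))  # continuous latent for diffusion
```

The VQ path discards everything except a codebook index, while the VAE path keeps a continuous vector whose distribution the KL term pushes toward a standard normal, the natural starting point for a diffusion model's noise.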

We like diffusion models for music generation because they have some interesting properties that make controlling them easier (so you can make your own music instead of just taking what the machine gives you). For example, we have a rhythm control mode where you can upload your own percussion line or set a BPM. Very soon you'll also be able to generate proper variations of an uploaded or previously generated song (e.g., you could even sing into Voice Memos for a minute and upload that!). @Musicians of HN, try uploading your songs and using Rhythm Control, and let us know what you think! Our goal is to enable more of you, not replace you.
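One well-known way diffusion models produce variations of an existing clip is SDEdit-style partial noising. You encode the reference to latents, noise it only part of the way, and run the sampler from that intermediate point, so the output keeps the reference's coarse structure. The post does not say SonAuto does it this way. The sketch below assumes a variance-preserving process, and both the strength-to-noise mapping and the function name are hypothetical.

```python
import math
import random


def variation_start(z_ref, strength, num_steps, rng):
    """Partially noise reference latents for an SDEdit-style variation.

    strength = 0 keeps the reference unchanged; strength = 1 is pure noise,
    which is the same as generating from scratch.
    Returns the noised latents and how many sampler steps to run from there.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Hypothetical mapping: strength is the noise variance fraction (1 - alpha_bar).
    signal, noise = math.sqrt(1.0 - strength), math.sqrt(strength)
    z_start = [signal * x + noise * rng.gauss(0.0, 1.0) for x in z_ref]
    # Simplification: a real sampler would start at the timestep whose alpha_bar
    # equals 1 - strength; counting steps linearly is only an approximation.
    steps_remaining = round(strength * num_steps)
    return z_start, steps_remaining


z_start, steps = variation_start([0.3, -0.7, 1.1], strength=0.4, num_steps=50, rng=random.Random(0))
# A sampler would now denoise z_start for `steps` steps instead of all 50.
```

Low strength stays close to the uploaded song (for example, a hummed Voice Memo), and high strength gives the model more freedom.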

For example, we turned this drum line (https://sonauto.ai/songs/uoTKycBghUBv7wA2YfNz) into this full song (https://sonauto.ai/songs/KSK7WM1PJuz1euhq6lS7; skip to 1:05 if impatient) or this other song I like better (https://sonauto.ai/songs/qkn3KYv0ICT9kjWTmins; we accidentally compressed it with AAC instead of Opus, though, which hurt quality).
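For the "set a BPM" option in Rhythm Control, the arithmetic is simple: one beat every 60 / BPM seconds. The following minimal sketch turns a tempo into a per-frame conditioning track. The 0/1 click-track representation and the latent frame rate are assumptions for illustration, since the post does not describe how SonAuto encodes rhythm conditioning.

```python
def beat_grid(bpm, duration_s, frame_rate_hz):
    """Return a 0/1 list with a 1 at the frame nearest each beat.

    frame_rate_hz is a hypothetical latent frame rate (frames per second).
    """
    if bpm <= 0 or frame_rate_hz <= 0:
        raise ValueError("bpm and frame_rate_hz must be positive")
    n_frames = int(duration_s * frame_rate_hz)
    track = [0] * n_frames
    seconds_per_beat = 60.0 / bpm
    beat = 0
    while True:
        frame = round(beat * seconds_per_beat * frame_rate_hz)
        if frame >= n_frames:
            break
        track[frame] = 1
        beat += 1
    return track


grid = beat_grid(bpm=120, duration_s=2.0, frame_rate_hz=10)
print([i for i, v in enumerate(grid) if v])  # [0, 5, 10, 15]
```

At 120 BPM a beat falls every 0.5 s, so at 10 frames per second the marks land every 5 frames.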

We also like diffusion models because, while they're expensive to train, they're cheap to serve. We built our own efficient inference infrastructure instead of using the expensive inference-as-a-service startups that are all the rage. That's why we're making generations on our site free and unlimited for as long as possible.

We'd love to answer your questions. Let us know what you think of our first model! https://sonauto.ai/