(comments)

Original link: https://news.ycombinator.com/item?id=41312225

By implementing chatbot functionality inside a general-purpose communication platform, user-friendly interaction with the AI can be achieved without requiring users to access the system through a separate app or website. This approach lets customer service representatives (CSRs) monitor interactions between users and the bot and provide help when needed, similar to a physical demo environment like an Apple Store. Users can observe each other's behavior and help one another, reducing the need for individual account-creation flows. Avoiding complex authentication systems, such as remembering unique emails and passwords or managing multiple logins, significantly improves overall user satisfaction and efficiency. In addition, leveraging chatbot technology frees up the resources that would otherwise go to handling account-creation support issues, saving valuable time and streamlining operations.



Ideogram 2.0 was also released today, and it's nerfed (anatomy is a lot worse than 1.0 now) just like the StableDiffusion versions after 1.5, which is very disappointing.

Well, good thing we have Flux out in the open now; neither Midjourney releasing a web version nor Ideogram releasing their 2.0 on the same day, two weeks after Flux, will redeem them much. Flux Dev is amazing; check what the SD community is doing with it on https://www.reddit.com/r/StableDiffusion/ . It can do fine-tuning, there are LoRAs now, even ControlNet. It can gen casual photos like no other tool out there; you won't be able to tell they're AI without looking way too deep.



Just to clarify for other readers, Draw Things has support and provides download links to quants but no iOS device can run the full precision model which means you will get slightly different and usually lower quality output than stuff you may see elsewhere, even if you use the same settings. It's still damn impressive though.



The quality issue should be mainly due to using FP16 accumulators for GEMM on M1/M2 and A14-A16 devices (it is not a problem for SD v1 / SDXL models due to their smaller channel count). This was changed to FP32 accumulators for GEMM on these devices with the 1.20240820.1 release. q8p should have quality comparable to non-quantized models (in Draw Things, it is called FLUX.1 [dev] (Exact)).
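
You can see the effect with a toy numpy experiment (illustrative only - the real kernels are Metal GEMM, not numpy): the wider the channel dimension being reduced over, the worse an FP16 running sum gets.

    import numpy as np

    # Toy illustration (not the actual Metal GEMM kernel): accumulate the
    # same fp16 products with an fp16 vs an fp32 running sum. The fp16
    # accumulator's error grows with the channel count being reduced over,
    # which is why wide models like Flux suffer more than SD v1 / SDXL.
    rng = np.random.default_rng(0)

    for channels in (320, 1280, 3072):  # SD v1-ish, SDXL-ish, Flux-ish widths
        a = rng.standard_normal(channels).astype(np.float16)
        b = rng.standard_normal(channels).astype(np.float16)
        p16 = (a.astype(np.float32) * b.astype(np.float32)).astype(np.float16)

        acc16 = np.float16(0.0)
        for p in p16:
            acc16 = np.float16(acc16 + p)      # rounded to fp16 every step

        acc32 = p16.astype(np.float32).sum(dtype=np.float32)
        exact = p16.astype(np.float64).sum()

        print(f"{channels:5d} ch  fp16-acc err {abs(float(acc16) - exact):.4f}"
              f"  fp32-acc err {abs(float(acc32) - exact):.6f}")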



Claims that quantization doesn’t hurt models are made all the time but rely on the fact that almost all evaluations today of LLMs hardly scratch their surface. If we evaluated LLMs properly, even large quants would be detectably worse, and by a significant amount.



A model trained in BF16 whose values stay within FP16 range has an effective bit rate of at most 13 bits (e5m7: a sign bit, 5 exponent bits, 7 mantissa bits). Reasonable quantization (at 8-bit) gets the weight error (i.e. L2 distance on weights) down to ~0.03%.

I think there is a line somewhere between 4-bit and 8-bit where it will hurt performance (for both diffusion models and LLMs). But I doubt the line is between 8-bit and 13-bit.

(Another case in point: you can use generic lossless compression to get model weights from 13 bits down to 11 bits just by zipping the exponent and mantissa separately, which suggests the effective bit rate of the full-precision model is lower than 13 bits.)
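
If you want to sanity-check the compression claim yourself, here's a rough numpy + zlib sketch (synthetic gaussian "weights", so the exact savings will differ from a real checkpoint):

    import zlib
    import numpy as np

    # My sketch of the split-compression argument, on synthetic gaussian
    # "weights". Split each bf16 weight into its high byte (sign + most of
    # the exponent) and low byte (rest of exponent + mantissa) and compress
    # the two streams separately: the exponent byte is highly redundant for
    # bell-shaped weights, so the total lands well under the nominal 16 bits.
    rng = np.random.default_rng(0)
    w = (rng.standard_normal(1_000_000) * 0.02).astype(np.float32)

    bf16 = (w.view(np.uint32) >> 16).astype(np.uint16)  # truncate fp32 -> bf16
    high = (bf16 >> 8).astype(np.uint8)                 # sign + exponent[7:1]
    low = (bf16 & 0xFF).astype(np.uint8)                # exponent[0] + mantissa

    packed = len(zlib.compress(high.tobytes(), 9)) + \
             len(zlib.compress(low.tobytes(), 9))
    print(f"{packed * 8 / len(w):.2f} bits/weight after lossless compression")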



I buy these kinds of arguments but the moment a company or NeurIPS researcher claims even a "tiny" bit of loss happens, I become suspicious. I don't buy most claims of "we get 99%+ of the performance" made in practice.

But yes, I do believe that we will find proper lossless quants, and eventually (for real this time) get "only a little bit of loss" quants, but I don't think that the current 8 bits are there yet.

Also, quantized models often have worse GPU utilization, which hurts tokens/s if you have hardware capable of running the unquantized types. It seems to depend on the quant: SD models seem to get faster when quantized, but LLMs are often slower. Very weird.



I wonder if there is a case for thinking about it in terms of the technical definitions:

If we start by peering into quantization, we can show it is by definition lossy, unless every term had no significant bits past the quantization amount.

So our lower bound must be that 0.03% error mentioned above.
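
As a concrete toy example of that lower bound (per-tensor symmetric int8, my own sketch - finer-grained per-channel or per-block scaling gets the number down further):

    import numpy as np

    # Per-tensor symmetric int8 round trip on synthetic weights. The
    # round-trip error is the floor being discussed here; the exact figure
    # depends heavily on scaling granularity and the weight distribution.
    rng = np.random.default_rng(0)
    w = (rng.standard_normal(1_000_000) * 0.02).astype(np.float32)

    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    w_hat = q.astype(np.float32) * scale

    print(f"relative L2 error: {np.linalg.norm(w - w_hat) / np.linalg.norm(w):.3%}")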



I don't think this is true; llama.cpp hobbyists think about this a lot and there have been multiple independent community experiments, including blind tests by a crowd. I doubt it holds across models and modalities, but in llama.cpp's context, Q5 is inarguably indistinguishable from F32.

However, this seems to be model-size dependent; for example, Llama 3.1 405B is reported to degrade much more quickly under quantization.



While I won't say that realism is a solved problem, SD has been able to produce unbelievably realistic photo-level images using "Realism Engine"/NightVisionXL/etc for a while now.

Flux's power isn't necessarily in its ability to produce realistic images, so much as its increased size gives it a FAR superior ability to more closely follow the prompt.



Also to 'understand' complicated prompts (however far that goes when talking image models). For example, describing two subjects in detail with a relationship between them ("a cat and a man and the cat is standing on the man's head and the cat has red hair and the man is wearing purple glasses") reliably works with Flux, even the 'light' models, but is a crapshoot with SD.



Which honestly is rather silly for the companies making the models. The cat's already out of the bag, and if a competitor manages to make a similar/good-enough (uncensored) model there's nothing midjourney etc will have going for them.

Or as someone once said, if they took porn off the internet, there'd only be one website left, and it'd be called "Bring Back the Porn!".



Was heavily using StableDiffusion a handful of months ago. Seemed like new achievements were being made all the time.

What happened with 1.5 and newer? I’m out of the loop.



SD 1.5 was not censored. Later versions were trained on censored data, and a side effect of that was bad anatomy. Search for "stable diffusion woman lying on grass". I think that was SD3; it generates monstrosities when you ask for a woman on grass.

I liked Ideogram's approach. It looked like their training data was not censored as much (it still didn't render privates). The generated images are tested for nudity before being presented to the user; if any is found, the image is replaced by a placeholder.

If you check the SD reddit, the community seems to have jumped ship to Flux. It has great prompt adherence; you don't have to employ silly tricks to get what you want. Ideogram also has amazing prompt adherence.



We did have SDXL in-between, and while that one was also censored, the censoring has been long since removed in fine-tunes.

Flux is also censored, but in a way that doesn't usually break anatomy. It took about a week before people figured out how to start decensoring it.



This is an interesting company to watch in the Gen AI space, since they don't have all the same restrictions as the bigger companies.

Crazy this took them so long, and also crazy that they got so far through a very confusing Discord experience.



Haha, that was the craziest part about signing up for Midjourney: having to do it via Discord.

I don't use Discord on my office laptop, and it was a very odd experience.



Yep - they self-censor according to China's whims just so they can have China access now, only to be banned by the Great Firewall a year from now after Chinese startups scrape all their image outputs for training.



> space since they don't have all the same restrictions as the bigger companies.

Can you give an example? Midjourney is heavily censored so it seems like it has a lot of restrictions.



try generating with the prompt "a spoony bard and a licentious howler", if you're willing to catch a ban for using ted woolsey's horrible, offensive language that was acceptable to 1990s nintendo of america



Maybe they got so far thanks to the Discord approach.

When you went to the Discord, you immediately got an endless stream of great-looking stuff other people were generating. That was quite a powerful way to show what is possible.

I bet one challenge for getting new users engaged with these tools is that you try to generate something, get bad results, get disappointed and never come back.



That's exactly true, and having built a similar bot for friends and acquaintances to use, the effect in question is huge. It makes no sense to go for a webapp first, second or even third.



Google, Facebook, Microsoft etc all have to shoehorn these things into their products to stay on top, but it's not their core business and they don't want it to take away from their ads or licenses they sell. Midjourney as a company is much freer to innovate without the burden of the restrictions of an established company.



They did a clever thing: they used Discord and its servers as a CDN.

No hosting of the generated pictures; just send them via a Discord message and forget them. No S3 or big cloud Lambda functions.

An easy way to start on a minimal working prototype.
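
Sketching the pattern with the real discord.py library (just an illustration of the idea, not Midjourney's actual bot):

    import io
    import discord  # the real discord.py library

    intents = discord.Intents.default()
    intents.message_content = True
    client = discord.Client(intents=intents)

    def generate_image(prompt: str) -> bytes:
        """Placeholder for whatever diffusion backend you run."""
        raise NotImplementedError

    @client.event
    async def on_message(message: discord.Message):
        if message.author.bot or not message.content.startswith("/imagine "):
            return
        png = generate_image(message.content.removeprefix("/imagine "))
        sent = await message.channel.send(
            file=discord.File(io.BytesIO(png), filename="result.png"))
        # Discord hosts the attachment on its CDN; keep the URL, drop the
        # bytes -- no S3 bucket, no image server.
        print(sent.attachments[0].url)

    client.run("YOUR_BOT_TOKEN")  # placeholder token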



That and you get user observability for free, and support injection in a way that to this day there’s no good way to do in an “app” experience.

Presuming your bot requires people to interact with it in public channels, your CSRs can then just sit in those channels watching people using the bot, and step in if they’re struggling. It’s a far more impactful way to leverage support staff than sticking a support interface on your website and hoping people will reach out through it.

It’s actually akin to the benefit of a physically-situated product demo experience, e.g. the product tables at an Apple Store.

And, also like an Apple Store, customers can watch what one another are doing/attempting, and so can both learn from one another and act as informal support for one another.



What was the confusing Discord experience? Was it that Discord was the main way to access Midjourney, and it was chaotic? I vaguely remember this, but didn't spend much time there.



Can you explain the difference between creating an anonymous google account with fake information compared to an anonymous midjourney account with fake information?



And a phone number in the case of both google and discord.

It is deranged that requiring a phone to access a website is seen as a non-issue, I agree.



Touché. But it's certainly leagues more "everyone" today than it was yesterday. It used to require arcane knowledge of deep Discord server bot navigation. I gave up after 20 minutes and never figured it out. Today I tried it in seconds.

I have 40+ google accounts for reasons, so I don't begin to understand the aversion some have to registering a burner google/discord/facebook/etc account under their hamster's name, but many of my closest friends are just like you, so I respect it anyway, whatever principle it is.



> I don't begin to understand the aversion some have to registering a burner google/discord/Facebook/etc

Maybe because several of those tend to ask for phone verification nowadays, and phone burner services tend to either not work or look so shady it looks like a major gamble to give them any payment information?



Do you use all 20 regularly? Because if you don’t, there’s a chance that next time you try to login to one of them on the web they’ll ask you to add a phone number and not let you continue until you do. But if you have them setup in an email client, it should still work.



I feel like this is an old man yells at cloud moment but I refuse to link social media or google accounts with other services.

Why should Google's or Discord's policy dictate my participation in the web?



They probably don’t want to deal with email registration workflows and want to limit themselves to OAuth, but that’s not user/customer-friendly.



Why not just make another google account that you solely use for registration in services like these? Use a virtual machine with a VPN if you really do not want it linked with your real account.

It is a bit of extra work but that's just how it is nowadays.



Doesn't Google require a telephone number to create accounts nowadays? In some regions there are no anonymous SIM cards, by law. (Temporary number services may not work well.)



The issue is you have to disclose your phone number to Google, not (just) to Midjourney, AND Google will know you use Midjourney.

There's too much unnecessary connected PII data generated by such mechanisms.



It is very much possible to create temporary credit cards linked to your real bank account for one-off purchases. Apple Card provides that as a service (US only) and other countries have similar systems that every bank adheres to.

Plus, you may not mind not being anonymous to Midjourney but mind not being anonymous to some other service (like Google).



There do exist anonymous credit cards, paid for with cash, for fixed, relatively small amounts (e.g. ~100u; in Europe there are restrictions on these kinds of payment methods - cards must be below, I believe, 150€).



I think the only way to make a google account without a phone number these days is to factory reset an android phone and take the sim out beforehand.

If that even still works...



Because the alternative is pretty provably worse for you, and for them.

* You have to save your (hopefully unique!) email/password in a password manager, which effectively contradicts your "I won't use a cloud service" argument.

* The company needs to build out a whole email/password authentication flow, including forgotten-password recovery, password resets, hints, rate limiting, etc., all things that Google/Apple have entire dedicated engineering teams tackling; alternatively, there are solid drop-in OAuth libraries for every major language out there (see the sketch at the end of this comment).

* Most people do not want to manage passwords and so take the absolute lazy route of reusing their passwords across all kinds of services. This winds up with breached accounts because Joe Smith decided to use his LinkedIn email/password on Midjourney.

* People have multiple email addresses and as a result wind up forgetting which email address/password they used for a given site.

Auth is the number one customer service problem on almost any service out there. When you look at the sheer number of tickets, auth failures and handholding always dominate time spent helping customers, and it isn't close. If Midjourney alienates 1 potential customer out of 100, but the other 99 have an easier sign-in experience and don't have to worry about any of the above, that is an absolute win.
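
To the drop-in point: here's roughly the entire Google sign-in flow with Flask + Authlib (both real libraries; the client ID/secret and route names are placeholders, not anyone's production config):

    from authlib.integrations.flask_client import OAuth
    from flask import Flask, redirect, session, url_for

    app = Flask(__name__)
    app.secret_key = "replace-me"

    oauth = OAuth(app)
    oauth.register(
        name="google",
        client_id="YOUR_CLIENT_ID",          # placeholder
        client_secret="YOUR_CLIENT_SECRET",  # placeholder
        server_metadata_url="https://accounts.google.com/.well-known/openid-configuration",
        client_kwargs={"scope": "openid email"},
    )

    @app.route("/login")
    def login():
        return oauth.google.authorize_redirect(url_for("auth", _external=True))

    @app.route("/auth")
    def auth():
        token = oauth.google.authorize_access_token()
        session["user"] = token["userinfo"]["email"]  # identity verified by Google
        return redirect("/")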



All very thoughtful arguments but I think this solution to these problems is flawed. I don't believe we should be solving authentication management problems by handing over all authentication capabilities and responsibilities to one or two mega companies.

Especially since those companies can wield this enormous power by removing my access to this service because I may or may not have violated a policy unrelated to this service.

There has to be a better way.



While we are all waiting for the world to sort these problems out, companies that are not interested in solving them for the world will continue to use SSO techniques.

I’m very not impressed by this deep, extended critique of machine learning researchers using common security best practices on the grounds that those practices involve an imperfect user experience for those requiring perfect anonymity…



There's web3, where users have a private key and the public key is on a cryptocurrency chain, but adoption there has been slow. There are also a number of problems with that approach, but that's the other option on the table.

I want to believe, but sadly there's no market for it, unless someone wants to start a privacy-minded alternative to Auth0 and figure out a business model that works. Which is to say: are you willing to pay for this better way? Are there enough other people willing to pay a company for privacy to make it a lucrative, worthwhile business? Because users are trained to think that software should be free-as-in-beer, but unfortunately, developing software is expensive and those costs have to be recouped somehow. People say they want to pay, but their revealed preference is that they don't.



All of what you say may well be true, and yet: everyone else on the internet manages to make it work.

Password managers don't have to be cloud services. The user gets to choose.

Some of us like to use a different email address for each account on purpose.



You make it sound untenable to support email/password auth, but given that the vast majority of low tech and high tech online services manage it just fine, I think you might be exaggerating a bit. Since Midjourney is already outsourcing their auth flow, they could just as easily use a third party that supports the most common form of account creation.



> People have multiple email addresses and as a result wind up forgetting which email address/password they used for a given site.

Effectively you mean that people have multiple Google accounts?



While I understand that they might need some account or token to stop abuse, having to sign in is a big no for anything I just want to try out. After the whole social media trend more or less collapsed into a privacy-invading data-collection nightmare, I've been extremely reluctant to sign up for anything.



You honestly think giving full, non-rate-limited free access is viable in the LLM space, where each execution is actually pretty expensive?

Because that's what you're asking for: without accounts it'd be trivial to reset the rate limit.
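
Toy sketch of why the key matters (mine, not any company's actual limiter): if the bucket is keyed by something the caller controls, like an IP, resetting the quota is as easy as changing the key.

    import time
    from collections import defaultdict

    WINDOW, LIMIT = 3600.0, 25  # e.g. 25 free generations per hour

    hits = defaultdict(list)  # key -> timestamps of recent requests

    def allow(key: str) -> bool:
        now = time.monotonic()
        hits[key] = [t for t in hits[key] if now - t < WINDOW]
        if len(hits[key]) >= LIMIT:
            return False
        hits[key].append(now)
        return True

    # Keyed by account id, the quota follows the user. Keyed by IP, a VPN
    # hop or proxy rotation hands the abuser a fresh bucket every time.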



No no, I understand that it's not a viable financial option; I just don't understand why anyone would trust any of these companies with any sort of information.

Most of the current AI companies are going to fail or get bought up, so you have no idea where that account information, all your prompts and all the answers will eventually go. After social media, I don't really see any reason why anyone would trust any tech company with any sort of information unless you really, really have to. If I want to run an LLM, I'll get one that can run on my own computer, even if that means forgoing certain features.



go try it right now lol

i assume they still have ip/browser fingerprint based rate limiting, but you can type in a prompt and get an answer with zero login or anything else.



Are you sure that's true?

https://github.com/black-forest-labs/flux/blob/main/model_li...

From the license: "We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model."

This license seems to indicate that the images from the dev model CAN be used for commercial purposes outside of using those images to train derivative models. It would be a little weird to me that they'd allow you to use FluxDev images for commercial purposes IF AND ONLY IF the model host was Replicate.



Flux seems to give the backgrounds even more detail and coherence compared to MJ, it's a surprisingly (or maybe not given its size) great model.



That is the base style. You can get very different image qualities by adding different types of film, cameras, etc. to the prompt.

With that said, I loved Midjourney a year ago but I am at the point I have seen enough AI art to last several lives.

AI art reminds me of eating wasabi. It is absolutely amazing at first but I quickly get totally sick of it.



For a period I checked its best-outputs page daily: the "pixar" style was frequent, but far from the only one. "Typical" like 10 is the type (mode) in 10+9+9+8+8+7+7+6+6+5+5+4+4+3+3+2+2+1+1 - but that is still only 10% of the whole, and just one possibility of many.

Midlibrary.io recognizes 5500 styles Midjourney knows.



Yup, it's an arbitrary metric, but I tried cajoling various image models into generating spork pictures from highly detailed descriptions (I have ComfyUI & AUTOMATIC1111 locally, and many models), which led to me creating the site.

I'd say a better test for adherence is how well a model does when the detailed description falls in between two very well-known concepts - it's kind of like those pictures from the 1500s of exotic animals seen by explorers, drawn by people using only the field notes after a long voyage back.



Here's a test against the full Flux Dev model where we try to describe the physical appearance of a spork without referencing it by name.

Prompt: A hybrid kitchen utensil that looks like a spoon with small tine prongs of a fork made of metal, realistic photo

https://gondolaprime.pw/pictures/flux-spork-like.jpg

The combination of T5/CLIP coupled with a much larger model means there's less need to rely on custom LoRAs for unfamiliar concepts, which is awesome.

EDIT: If you've got the GPU for it, I'd recommend downloading the latest version of the SD-WEBUI Forge repo along with the Dev checkpoint of Flux (not Schnell). It's super impressive, and I get an iteration speed of roughly 15 seconds per 1024x1024 image.



Welll... there are a hundred ways we could measure prompt adherence, everything from:

- Descriptive -> describing a difficult concept that is most certainly NOT in the training data

- Hybrids -> fusions of familiar concepts

- Platonic overrides -> this is my phrase for attempting to see how well you can OVERRIDE very emphasized training data. For example, a zebra with horizontal stripes.

etc. etc.



Haha, I thought it was just me. I was trying to generate a spork for a silly card game and was really frustrated/confused about what the issue was.



I have to say I'm very impressed. I've previously used free generators for scenery for my D&D campaign, and a prompt that previously took me dozens of tweaks to get something reasonable instantly returned a set of high-quality images, at least one of which was much closer to my mind's eye than anything in those dozens of previous attempts.

( Prompting: An iron-gated door, set into a light stone arch, all deep set into the side of a gentle hill, as if the entrance to a forgotten crypt. The hill is lightly wooded, there is foliage in season. It is early evening. --v 6.1 )

And result: https://cdn.midjourney.com/5b56f713-3d64-471f-8c3c-08a0247e6...

The style matches exactly what I'd want too, it's captured "Fantasy RP book illustration" extremely well, despite that not being in the prompt!



Access blocked: Midjourney’s request does not comply with Google’s policies

Midjourney’s request does not comply with Google’s ‘Use secure browsers’ policy. If this app has a website, you can open a web browser and try signing in from there. If you are attempting to access a wireless network, please follow these instructions.

You can also contact the developer to let them know that their app must comply with Google’s ‘Use secure browsers’ policy.

Learn more about this error

If you are a developer of Midjourney, see error details.

Error 403: disallowed_useragent



It's just astonishing to me how difficult it still seems to be for the Midjourney team to develop a web site that amounts to little more than a textbox, a button, and an img tag.

I tried their new web experience, and... it's just broken. It doesn't work. There's a showcase of other people's work, and that's it. I can't click the text box; it's greyed out. It says "subscribe to start creating", but there is no "subscribe" button!

Mindblowing.



There's a theory that Midjourney didn't want to pay for image hosting (which they were getting for free from Discord).



In iOS Safari I get
    Unable to process request due to missing initial state. This may happen if browser sessionStorage is inaccessible or accidentally cleared. Some specific scenarios are - 1) Using IDP-Initiated SAML SSO. 2) Using signInWithRedirect in a storage-partitioned browser environment.

EDIT: I tried again from scratch in a new tab and this time it worked. So, temporary hiccup.

EDIT2: I have all the images I created on Discord in the web app - very nice!



Ever since DALL-E 3 completely eclipsed Midjourney in terms of prompt ADHERENCE (albeit not quality), I've had very little reason to use it. However, in my testing of Flux Dev, I can gen images in roughly 15 seconds in Forge, throw those at an SDXL model such as Dreamshaper via a ControlNet, and get the best of both worlds: high detail and adherence.
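
For anyone wanting to try the two-stage workflow outside Forge, here's an approximate diffusers sketch (the repo ids are real Hugging Face models, but treat this as a rough reconstruction of the idea, not an exact recipe):

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import (ControlNetModel, FluxPipeline,
                           StableDiffusionXLControlNetPipeline)

    prompt = "a knight in ornate armor in a rainy neon-lit city, photo"

    # Stage 1: Flux Dev for composition and prompt adherence.
    flux = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
    base = flux(prompt, height=1024, width=1024).images[0]
    del flux
    torch.cuda.empty_cache()  # both pipelines rarely fit on one card

    # Stage 2: feed a canny map of the Flux image to an SDXL checkpoint.
    edges = cv2.Canny(np.array(base), 100, 200)
    control = Image.fromarray(np.stack([edges] * 3, axis=-1))

    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
    sdxl = StableDiffusionXLControlNetPipeline.from_pretrained(
        "Lykon/dreamshaper-xl-1-0",  # any SDXL checkpoint works here
        controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
    final = sdxl(prompt, image=control,
                 controlnet_conditioning_scale=0.6).images[0]
    final.save("flux_then_sdxl.png")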



DALL-E 3 (intentionally) leans away from realism, but in doing so it leans into a very tacky, aesthetically naive, though competently executed type of image. It gives every image the feeling that you're seeing a bootleg version of a genuine thing, and therefore makes everything else it touches feel tacky.

Same feeling you get looking at the airbrushed art on a state fairground ride.



>Same feeling you get looking at the airbrushed art on a state fairground ride.

That's a great way to describe it. A lot of articles and YouTube pics are using these images lately, and they all give off that sort of vibe.



In 15 seconds? Really? On my machine with good specs and a 4090, flux-dev takes around 400 seconds for 512x512! And flux-schnell at least half that. Can you recommend a tutorial for optimization?



For a company workshop I wanted to pay for Midjourney to invite a bot into a private Discord with the workshop participants. We couldn't find a way to use it under a company account, and ultimately every participant had to get a subscription, which was less than ideal. If it were today, I would have them use Flux in some hosted version.



I only see a form field for /imagine. How can I /describe?

The docs say to hover over the uploaded image or use the test tube icon from the sidebar, neither of which seem to be available on mobile.



Yes, I'm finding the "meh" here extends to most of the companies, and that's a great thing (e.g. I just got a $500 card and OpenHermes feels comparable to anything, running fully locally). I know there's often not a lot to be optimistic about, but the fact that so-called "AI" is absolutely de facto free/open source is perhaps just about the best way this stuff could roll out.



Flux, the latest Midjourney, or even DALL-E... I'm still disappointed that three of my sci-fi prompts never work (interior of an O'Neill cylinder, a space elevator, a rotating space station). I hope someone makes LoRAs of those one day.



I am also still struggling with this. I tried various prompts along the lines of "draw me the inside of a cylindrical space habitat rotating along its axis to create artificial gravity", like this more detailed one:

"Create a highly detailed image of the interior of a massive cylindrical space habitat, rotating along its axis to generate artificial gravity. The habitat's inner surface is divided into multiple sections featuring lush forests, serene lakes, and small, picturesque villages. The central axis of the cylinder emits a soft, ambient light, mimicking natural sunlight, casting gentle shadows and creating a sense of day and night. The habitat's curvature should be evident, with the landscape bending upwards on both sides. Include a sense of depth and scale, highlighting the vastness of this self-contained world."

Flux kind of got the idea, but the gravity is still off.



"Halo, Interior of a cylindrical space station habitat, upside down, fields, lakes, rivers, housing, rivers running around cylinder roof, Rick Guidice artwork, in space, stars visible through windows"

https://cdn.midjourney.com/ed234782-ac5c-41d7-99e9-20ce122cd...

They definitely struggle with getting the land to bend up the wall, and it often treats the cylinder like a window onto Earth at ground level, but I think with tweaking the weighting or order of terms, and enough rolls of the dice, you could get it.



I'm so glad that Midjourney and Flux is making creativity accessible to everyone, I don't need to have a subscription to MJ anymore now that Flux is getting better.

Everybody can now be an artist and have creativity for free now.

What a time to be alive.



Creativity has always been accessible to everyone. Creativity is a concept which only requires imagination. Those ideas can take many forms, including writing, cooking, sewing, doing skateboard tricks… It doesn’t mean rearranging pixels on screen by vaguely describing something.

联系我们 contact @ memedata.com