(Comments)

原始链接: https://news.ycombinator.com/item?id=44159166

Cloudflare built an OAuth library with the help of Anthropic's Claude AI model and published the prompts used. Cloudflare engineer Kenton Varda was initially skeptical, but the quality of the code Claude generated surprised him. Although the code was rigorously reviewed by security experts and cross-checked against the RFCs, the overall process still highlights AI's potential in coding. While some question whether AI tools can truly innovate beyond their existing training data, others argue that AI can significantly speed up development, especially when paired with experienced engineers. Kenton Varda stresses the importance of human oversight and expertise, particularly for critical systems such as OAuth implementations, but also points to AI's potential to empower non-experts in safe, sandboxed environments. The project serves as a case study of the capabilities and limitations of current AI models in software development.


Original text
Cloudlflare builds OAuth with Claude and publishes all the prompts (github.com/cloudflare)
97 points by gregorywegory 1 hour ago | 49 comments

From the readme: This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results). Check out the commit history to see how Claude was prompted and what code it produced.

"NOOOOOOOO!!!! You can't just use an LLM to write an auth library!"

"haha gpus go brrr"

In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed. I was an AI skeptic. I thought LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel. I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.

To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs. I was trying to validate my skepticism. I ended up proving myself wrong.

Again, please check out the commit history -- especially early commits -- to understand how this went.
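
(For context, the library under discussion is a provider framework for implementing OAuth 2.1 on Cloudflare Workers. Below is a minimal usage sketch in TypeScript; the `OAuthProvider` export and the option names are recalled from the repository README and should be treated as assumptions rather than a verified copy of the current API.)

    // Sketch only: option names follow the repository README from memory
    // and are not verified against the current workers-oauth-provider API.
    import { OAuthProvider } from "@cloudflare/workers-oauth-provider";

    export default new OAuthProvider({
      // Requests under this route must carry a valid access token; the
      // library checks the token before forwarding to apiHandler.
      apiRoute: "/api/",
      apiHandler: {
        fetch(request: Request) {
          return new Response("authenticated API response");
        },
      },
      // Everything else (e.g. the human-facing authorization UI) falls
      // through to the default handler.
      defaultHandler: {
        fetch(request: Request) {
          return new Response("public content");
        },
      },
      // Endpoints the library itself implements:
      authorizeEndpoint: "/authorize",
      tokenEndpoint: "/token",
      clientRegistrationEndpoint: "/register",
    });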



On the one hand, I would expect LLMs to be able to crank out such code when prompted by skilled engineers who also understand how to prompt these tools correctly. OAuth isn't new, has tons of working examples to steal as training data from public projects, and exists in a variety of languages to suit most use cases or needs.

On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc - because that requires learning “in real time” from information sources you’re literally generating in that moment, not decades of Stack Overflow responses without context. That has been bandied about for years, with no evidence to show for it beyond specifically cherry-picked examples, often from highly-controlled environments.

I never doubted that, with competent engineers, these tools could be used to generate “new” code from past datasets. What I continue to doubt is the utility of these tools given their immense costs, both environmentally and socially.



>In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed.

I'm confused by what "I (@kentonv)" means here, because kentonv is a different user.[0] Are you saying this is your alt? Or is this a typo/misunderstanding?

Edit: Figured out that most of your post is quoting the README. Consider using > and * characters to clarify.

[0] https://news.ycombinator.com/user?id=kentonv



He is quoting from the project readme. I wrote all this text.


Thanks for weighing in here

If I might make a suggestion: based on how fast things change, even within a model family, you may benefit from saying which Claude. I was especially cognizant of this given the recent v4 release, which (of course) was hailed as the second coming. Regardless, you may want to update your readme to say which version you used.

It may also be wildly out of scope for including in a project's readme, but knowing which of the bazillions of coding tools you used would also help a tiny bit with the reproduction crisis found in every single one of these style threads.



> It may also be wildly out of scope for including in a project's readme

The entire point of the repository seems to be to validate or invalidate the thesis that LLMs are good enough to be pair programmers right now. Removing it from the README makes no sense in that context.



I did consider that, but the repo isn't called "kentonv does a yolo"; it's straight-up labeled as a provider library for CF workers under Cloudflare's brand.

Some hair-splitting about whether including the Claude stanza is "full disclosure," or "AI advocacy," or just because it's cool.

Anyway, I mentioned the out-of-scope point because if half the readme is about correct usage of the library, and half is about the sausage-making, I'd be confused as a reader about whether this was designed to be for real or for funzies.



This library is a core component of our MCP framework; it's not just an experiment.


I believe it's important to say when AI was used so heavily in building a library -- it would feel dishonest to me to claim I wrote it all myself. I also think it's just a pretty interesting thing to know about. So I think it belongs in the readme. (But I'm not making a moral judgment on what anyone else does.)

It was almost entirely Claude Sonnet 3.7. I agree I should add the version to the readme.



It's a literal copy-paste from the README; I think it was supposed to be quoted, but the parent messed up the formatting somehow.

https://github.com/cloudflare/workers-oauth-provider/blob/fe...



Yup. I'm more of a skeptic than pro-AI these days, but nonetheless I'm still trying to use AI in my workflows.

I don't actually enjoy it; I generally find it difficult to use, as I have more trouble explaining what I want than just doing it myself. However, it seems clear that this is not going away and that to some degree it's "the future". I suspect it's better to learn the new tools of my craft than to be caught unaware.

With that said, I still think we're in the infancy of actual tooling around this stuff. I'm always interested to see novel UXs on this front.



Probably unrelated to the broader discussion, but I don't think the "skeptic vs pro-AI" distinction even makes that much sense.

For example, I usually come off as relatively skeptical within the HN crowd, but I'm actually pushing for more usage at work. This kind of "opinion arbitrage" is common with new technologies.



> but I don't think the "skeptic vs pro-AI" distinction even makes that much sense

Tends to be like that with subjects once feelings get involved. Make any skepticism public, even if you don't feel strongly either way, and you get one side of extremists yelling at you about X. At the same time, say anything positive and you get the zealots from the other side yelling at you about Y.

Those of us who tend not to be so extreme get pushback from both sides, either in the same conversations or in different places, while both see us as belonging to "the other side" when in reality we're just trying to take a somewhat balanced approach.

These "us vs them" never made sense to me, for (almost) any topic. Truth usually sits somewhere around the middle, and a balanced approach seems to usually result in more benefits overall, at least personally for me.



I guess for me the question is: at what point do you feel it would be reasonable to do this without the experts involved, as in your case?

As an edit, after reading some of the prompts: what is the likelihood that a non-expert could even come up with those prompts?

The really really interesting thing would be if an AI could actually generate the prompts.



(I'm the author of this library -- or, the guy who prompted the AI at least.)

I absolutely would not vibe code an OAuth implementation! Or any other production code at Cloudflare. We've been using more AI internally, but made this rule very clear: the human engineer directing the AI must fully understand and take responsibility for any code which the AI has written.

I do think vibe coding can be really useful in low-stakes environments, though. I vibe-coded an Android app to use as a baby monitor (it just streams audio from a Unifi camera in the kid's room). I had no previous Android experience, and it would have taken me weeks to learn without AI, but it only took a few hours with AI.

I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up. That would enable a whole lot of people to vibe-code personal apps for personal use cases. It happens I have some background building such platforms...

But those guardrails only really make sense at the application level. At the systems level, I don't think this is possible. AI is not smart enough yet to build systems without serious bugs and security issues. So human experts are still going to be necessary for a while there.
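
(To make the guardrail idea concrete, here is a minimal TypeScript sketch of a capability-style, deny-by-default policy. All names are hypothetical and this is not a description of any Cloudflare product; a real boundary would also run the untrusted code in a separate isolate or Worker, since code sharing a realm could still reach the global fetch.)

    // Hypothetical sketch: the vibe-coded app receives only the
    // capabilities we hand it, instead of ambient authority.
    type Capabilities = {
      fetch: (url: string, init?: RequestInit) => Promise<Response>;
    };

    function makeCapabilities(allowedHosts: string[]): Capabilities {
      return {
        fetch: async (url, init) => {
          const host = new URL(url).host;
          if (!allowedHosts.includes(host)) {
            // Deny by default: anything not explicitly allowed fails closed.
            throw new Error(`network access to ${host} is not permitted`);
          }
          return fetch(url, init);
        },
      };
    }

    // The untrusted app is invoked with its capabilities as an argument.
    // In a real platform it would run in its own isolate; this sketch only
    // shows the policy shape.
    async function runUntrustedApp(
      app: (caps: Capabilities) => Promise<void>,
      caps: Capabilities,
    ): Promise<void> {
      await app(caps);
    }

    // Example: a baby-monitor-style app allowed to reach one camera host.
    runUntrustedApp(
      async (caps) => {
        const res = await caps.fetch("https://camera.local.example/stream");
        console.log("stream status:", res.status);
      },
      makeCapabilities(["camera.local.example"]),
    );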



> I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up.

OpenAI's new Rust version of Codex might be of interest. I haven't dived deeper into the codebase, but it seems they're thinking about sandboxing from the get-go: https://github.com/openai/codex/blob/7896b1089dbf702dd079299...



Why do you need a non-expert? We build on layers of abstractions; AI will help you at whichever layer you're the "expert" at. Of course you'll need to understand low-level stuff to work on low-level code.

i.e. I might not use AI to build an OAuth library, but I might use AI to build a web app (which I am an expert at) that may use an OAuth library Cloudflare developed (which they are experts at). Trying to make "anyone" code "anything" doesn't seem like the point to me.



> I guess for me the question is: at what point do you feel it would be reasonable to do this without the experts involved, as in your case?

I don't know if it was the intent, but these kinds of questions bother me; they seem to hint at an agenda: "when can I have a farm of idiots with keyboards, paid minimum wage, churn out products indistinguishable from expertly designed applications?"

To me that's the danger of AI: not its purported intelligence, but our manifested greed.



GP is just quoting the readme, they aren't the author.

My 2 cents:

>I guess for me the question is: at what point do you feel it would be reasonable to do this without the experts involved, as in your case?

No sooner and no later than we could say the same thing about a junior developer. In essence, if you can't validate the code produced by an LLM, then you shouldn't really have been writing that code to begin with.

>The really really interesting thing would be if an AI could actually generate the prompts.

I think you've hit on something that is going underexplored right now, in my opinion: orchestration of AI agents, where we have a high-level planning agent delegating subtasks to more specialized agents that perform them and report back. I think an approach like that could help avoid context saturation for longer tasks. Cline / Aider / Roo Code / etc. do something like this with architect mode vs coding mode, but I think it can be generalized.
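
(A minimal sketch of that planner/worker split in TypeScript; `complete` is a hypothetical stand-in for any chat-completion client, not a real API.)

    // Hypothetical stand-in for a chat-completion client.
    async function complete(system: string, user: string): Promise<string> {
      // ...call your model provider here...
      return "";
    }

    // A planning agent breaks the task into subtasks; each subtask is then
    // handled by a worker agent that starts from a fresh, small context,
    // which is what helps avoid saturating one long conversation.
    async function orchestrate(task: string): Promise<string[]> {
      const plan = await complete(
        "You are a planning agent. Break the task into numbered, independent subtasks.",
        task,
      );
      const subtasks = plan
        .split("\n")
        .map((line) => line.replace(/^\d+[.)]\s*/, "").trim())
        .filter((line) => line.length > 0);

      const results: string[] = [];
      for (const subtask of subtasks) {
        results.push(
          await complete("You are a coding agent. Complete exactly one subtask.", subtask),
        );
      }
      return results;
    }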



Same argument from me, but only for Claude.

Other models feel like shit to use, but Claude is good.



From this commit: https://github.com/cloudflare/workers-oauth-provider/commit/...

===

"Fix Claude's bug manually. Claude had a bug in the previous commit. I prompted it multiple times to fix the bug but it kept doing the wrong thing.

So this change is manually written by a human.

I also extended the README to discuss the OAuth 2.1 spec problem."

===

This is super relatable to my experience trying to use these AI tools. They can get halfway there and then struggle immensely.



> They can get halfway there and then struggle immensely.

Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.

It seems to me like any mistake in a message chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.

So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.
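
(In chat-API terms, the advice amounts to editing the first prompt and regenerating from an empty history, rather than appending corrections to a transcript that still contains the bad output. A TypeScript sketch with a hypothetical `chat` client:)

    type Message = { role: "system" | "user" | "assistant"; content: string };

    // Hypothetical stand-in for a chat-completion client.
    async function chat(messages: Message[]): Promise<string> {
      // ...call your model provider here...
      return "";
    }

    // Fold what you learned from failed attempts into a clarified initial
    // prompt, then regenerate from a clean history instead of appending
    // "no, fix it" messages that keep the mistake in context.
    async function retryFromScratch(
      basePrompt: string,
      clarifications: string[],
    ): Promise<string> {
      const prompt =
        basePrompt +
        (clarifications.length > 0
          ? "\n\nConstraints learned from earlier attempts:\n- " +
            clarifications.join("\n- ")
          : "");
      return chat([{ role: "user", content: prompt }]);
    }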



This to me is why I think these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data.


> these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data

I mean, bypassing the fact that there's no consensus on what "actual understanding" even is, does it matter if it's "actual understanding" or "kind of understanding", or even "barely understanding", as long as it produces the results you expect?



Same. But I personally find it a lot easier to do those bits at the end than to begin from a blank file/function, so it's a good match for me.


This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?



> The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

I *think* the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and given that we are still collectively learning how to use them effectively, there's no way it won't be faster (with effective use) to fully code new solutions with AI in another 3-6 months. I think it requires a lot of work: well-documented, well-structured codebases with fast built-in feedback loops (good linting/unit tests etc.), but we're heading there now.



> Not software engineers being kicked out ... but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks to occupy the other 18? I think that's the question.

And the author is implementing a fairly technical project in this case. How about routine LoB app development?



Increased productivity means increased opportunity. There isn't going to be a time (at least not anytime soon) when we can all sit back and say "yup, we have accomplished everything there is to do with software and don't need more engineers".


> but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

My problem is that (in my experience anyway) this is slower than just writing the code myself. That's why AI is not a useful tool for me right now. It only gets it right sometimes, so it winds up being easier to just do it yourself in the first place. As the saying goes: bad help is worse than no help at all, and AI is bad help right now.



The million-dollar question is not whether you can review at the speed the model is coding. It is whether you can trust review alone to catch everything.

If a robot assembles cars at lightning speed... but occasionally misaligns a bolt, and your only safeguard is a visual inspection afterward, some defects will roll off the assembly line. Human coders prevent many bugs by thinking during assembly.



I’ve tried building a web app with LLMs before. Two of them went in circles—I'd ask them to fix an infinite loop, they’d remove the code for a feature; I’d ask them to add the feature back, they’d bring back the infinite loop, and so on. The third one kept losing context—after just 2–3 messages, it would rebuild the whole thing differently.

They’ll probably get better, but for now I can safely say I’ve spent more time building and tweaking prompts than getting helpful results.



Congrats and thanks for sharing, both the code and the story.

Which Claude plan did you use? Was it enough or did you feel limited by the quotas?



This was mostly Claude Code, which runs on API credits. I think I spent a two-digit number of dollars. The model was Sonnet 3.7 (this was all a couple months ago, before Claude 4).


I’ve been using Claude (via Cursor) on a greenfield project for the last couple months and my observation is:

1. I am much more productive/effective

2. It’s way more cognitively demanding than writing code the old-fashioned way

3. Even over this short timespan, the tools have improved significantly, amplifying both of the points above



> It’s way more cognitively demanding than writing code the old-fashioned way

How are you using it?

I've been mainly doing "pair programming" with my own agent (using Devstral as of late) and find the reviewing much easier than it would have been to literally type all of the code it produces, at least time-wise.

I've also tried vibe coding for a bit, and for that I'd agree with you, as you don't have any context if you end up wanting to review something. Basically, if the project was vibe coded from the beginning, it's much harder to get into the codebase.

But when pair programming with the LLM, I already have a built up context, and understand how I want things to be and so on, so reviewing pair programmed code goes a lot faster than reviewing vibe coded code.



I’ve tried a bunch of things, but now I’m mostly using Cursor in agent mode with Claude Sonnet 4, doing small-ish pull-request-sized prompts. I don’t have to review code as carefully as I did with Claude 3.7, but I’m finding the bottleneck now is architecture design. I end up having these long discussions with ChatGPT o3 about design patterns, sometimes days of thinking, and then relatively quick implementation sessions with Cursor.



Getting a "Too Many Requests" error is kind of hilarious given the company involved.


Oh hey, looks like it's mostly Kenton Varda, who you may recognize from his LAN party house: https://news.ycombinator.com/item?id=42156977


I think this is pretty cool, but it doesn't really move my priors that much. Looking at the commit history shows a lot of handholding even in pretty basic situations, but on the other hand they probably saved a lot of time vs. doing everything manually.


Looking at the commit history, there’s a fair bit of manual intervention to fix bugs and remove unused code.


mods: typo in title "CloudLflare"


There is no "@" system here; you are welcome to email [email protected] or hope that we're still within the edit window for the title.


fuck cloudflare, the entire internet has captchas now


Just put a form on a website and you will see why... CloudFlare provides the solution; it's not causing the problem.


> CloudFlare provides the solution; it's not causing the problem

Isn't CloudFlare infamously known for refusing to take down websites of people who are causing the problems (e.g. DDoS services that make use of CloudFlare's services)?



> Just put a form on a website and you will see why...

I host my local community graveyard website and I've had no issue with forms. These forms are for tour bookings and contact.

And yes, they are causing the problems. They restrict me because I use my own self-hosted, colocated VPN in the same country on a proper dedicated IP with rDNS, and that too is CloudFlare's doing.



Until you make your own website; then you love them.


I've made plenty of websites, still don't love them, and still get served 5+ captchas sometimes, straight after each other. Perhaps I have to give them money; then I'll love them?

