(Comments)

原始链接: https://news.ycombinator.com/item?id=44159166

Cloudflare built an OAuth library with the help of Anthropic's Claude AI model and published the prompts used. Cloudflare engineer Kenton Varda was initially skeptical, but the quality of the code Claude generated surprised him. Although the code was rigorously reviewed by security experts and cross-checked against the RFCs, the overall process still highlights AI's potential in coding. While some question whether AI tools can truly innovate beyond their existing training data, others argue that AI can significantly speed up development, especially when paired with experienced engineers. Kenton Varda stresses the importance of human oversight and expertise, particularly for critical systems such as OAuth implementations, but also points to AI's potential to empower non-experts in safe, sandboxed environments. The project serves as a case study of the capabilities and limitations of current AI models in software development.


Original text
Cloudlflare builds OAuth with Claude and publishes all the prompts (github.com/cloudflare)
97 points by gregorywegory 1 hour ago | 49 comments

From the readme: This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results). Check out the commit history to see how Claude was prompted and what code it produced.

"NOOOOOOOO!!!! You can't just use an LLM to write an auth library!"

"haha gpus go brrr"

In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed. I was an AI skeptic. I thought LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel. I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.

To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs. I was trying to validate my skepticism. I ended up proving myself wrong.

Again, please check out the commit history -- especially early commits -- to understand how this went.
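
(For context, the library under discussion is a provider framework for implementing OAuth 2.1 on Cloudflare Workers. Below is a minimal usage sketch in TypeScript; the `OAuthProvider` export and the option names are recalled from the repository README and should be treated as assumptions rather than a verified copy of the current API.)

    // Sketch only: option names follow the repository README from memory
    // and are not verified against the current workers-oauth-provider API.
    import { OAuthProvider } from "@cloudflare/workers-oauth-provider";

    export default new OAuthProvider({
      // Requests under this route must carry a valid access token; the
      // library checks the token before forwarding to apiHandler.
      apiRoute: "/api/",
      apiHandler: {
        fetch(request: Request) {
          return new Response("authenticated API response");
        },
      },
      // Everything else (e.g. the human-facing authorization UI) falls
      // through to the default handler.
      defaultHandler: {
        fetch(request: Request) {
          return new Response("public content");
        },
      },
      // Endpoints the library itself implements:
      authorizeEndpoint: "/authorize",
      tokenEndpoint: "/token",
      clientRegistrationEndpoint: "/register",
    });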



On the one hand, I would expect LLMs to be able to crank out such code when prompted by skilled engineers who also understand how to prompt these tools correctly. OAuth isn't new, has tons of working examples to steal as training data from public projects, and exists in a variety of languages to suit most use cases or needs.

On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc - because that requires learning “in real time” from information sources you’re literally generating in that moment, not decades of Stack Overflow responses without context. That has been bandied about for years, with no evidence to show for it beyond specifically cherry-picked examples, often from highly-controlled environments.

I never doubted that, with competent engineers, these tools could be used to generate “new” code from past datasets. What I continue to doubt is the utility of these tools given their immense costs, both environmentally and socially.



>In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed.

I'm confused by what "I (@kentonv)" means here, because kentonv is a different user.[0] Are you saying this is your alt? Or is this a typo/misunderstanding?

Edit: Figured out that most of your post is quoting the README. Consider using > and * characters to clarify.

[0] https://news.ycombinator.com/user?id=kentonv



He is quoting from the project readme. I wrote all this text.


Thanks for weighing in here

If I might make a suggestion: based on how fast things change, even within a model family, you may benefit from saying which Claude. I was especially cognizant of this given the recent v4 release, which (of course) was hailed as the second coming. Regardless, you may want to update your readme to say which version you used.

It may also be wildly out of scope for including in a project's readme, but knowing which of the bazillions of coding tools you used would also help a tiny bit with the reproduction crisis found in every single one of these style threads.



> It may also be wildly out of scope for including in a project's readme

The entire point of the repository seems to be to validate or invalidate the thesis that LLMs are good enough to be pair programmers right now. Removing it from the README makes no sense in that context.



I did consider that, but the repo isn't called "kentonv does a yolo"; it's straight-up labeled as a provider library for CF workers under Cloudflare's brand.

Some hair-splitting about whether including the Claude stanza is "full disclosure," or "AI advocacy," or just because it's cool.

Anyway, I mentioned the out-of-scope point because if half the readme is about correct usage of the library, and half is about the sausage-making, I'd be confused as a reader about whether this was designed to be for real or for funzies.



This library is a core component of our MCP framework; it's not just an experiment.


I believe it's important to say when AI was used so heavily in building a library -- it would feel dishonest to me to claim I wrote it all myself. I also think it's just a pretty interesting thing to know about. So I think it belongs in the readme. (But I'm not making a moral judgment on what anyone else does.)

It was almost entirely Claude Sonnet 3.7. I agree I should add the version to the readme.



It's a literal copy-paste from the README; I think it was supposed to be quoted, but the parent messed up the formatting somehow.

https://github.com/cloudflare/workers-oauth-provider/blob/fe...



Yup. I'm more of a skeptic than pro-AI these days, but nonetheless I'm still trying to use AI in my workflows.

I don't actually enjoy it; I generally find it difficult to use, as I have more trouble explaining what I want than just doing it myself. However, it seems clear that this is not going away and that to some degree it's "the future". I suspect it's better to learn the new tools of my craft than to be caught unaware.

With that said, I still think we're in the infancy of actual tooling around this stuff. I'm always interested to see novel UXs on this front.



Probably unrelated to the broader discussion, but I don't think the "skeptic vs pro-AI" distinction even makes that much sense.

For example, I usually come off as relatively skeptical within the HN crowd, but I'm actually pushing for more usage at work. This kind of "opinion arbitrage" is common with new technologies.



> but I don't think the "skeptic vs pro-AI" distinction even makes that much sense

Tends to be like that with subjects once feelings get involved. Make any skepticism public, even if you don't feel strongly either way, and you get one side of extremists yelling at you about X. At the same time, say anything positive and you get the zealots from the other side yelling at you about Y.

Those of us who tend not to be so extreme get pushback from both sides, either in the same conversations or in different places, while both see us as belonging to "the other side" when in reality we're just trying to take a somewhat balanced approach.

These "us vs them" never made sense to me, for (almost) any topic. Truth usually sits somewhere around the middle, and a balanced approach seems to usually result in more benefits overall, at least personally for me.



I guess for me the question is: at what point do you feel it would be reasonable to do this without the experts involved, as in your case?

As an edit, after reading some of the prompts: what is the likelihood that a non-expert could even come up with those prompts?

The really really interesting thing would be if an AI could actually generate the prompts.



(I'm the author of this library -- or, the guy who prompted the AI at least.)

I absolutely would not vibe code an OAuth implementation! Or any other production code at Cloudflare. We've been using more AI internally, but made this rule very clear: the human engineer directing the AI must fully understand and take responsibility for any code which the AI has written.

I do think vibe coding can be really useful in low-stakes environments, though. I vibe-coded an Android app to use as a baby monitor (it just streams audio from a Unifi camera in the kid's room). I had no previous Android experience, and it would have taken me weeks to learn without AI, but it only took a few hours with AI.

I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up. That would enable a whole lot of people to vibe-code personal apps for personal use cases. It happens I have some background building such platforms...

But those guardrails only really make sense at the application level. At the systems level, I don't think this is possible. AI is not smart enough yet to build systems without serious bugs and security issues. So human experts are still going to be necessary for a while there.
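
(To make the guardrail idea concrete, here is a minimal TypeScript sketch of a capability-style, deny-by-default policy. All names are hypothetical and this is not a description of any Cloudflare product; a real boundary would also run the untrusted code in a separate isolate or Worker, since code sharing a realm could still reach the global fetch.)

    // Hypothetical sketch: the vibe-coded app receives only the
    // capabilities we hand it, instead of ambient authority.
    type Capabilities = {
      fetch: (url: string, init?: RequestInit) => Promise<Response>;
    };

    function makeCapabilities(allowedHosts: string[]): Capabilities {
      return {
        fetch: async (url, init) => {
          const host = new URL(url).host;
          if (!allowedHosts.includes(host)) {
            // Deny by default: anything not explicitly allowed fails closed.
            throw new Error(`network access to ${host} is not permitted`);
          }
          return fetch(url, init);
        },
      };
    }

    // The untrusted app is invoked with its capabilities as an argument.
    // In a real platform it would run in its own isolate; this sketch only
    // shows the policy shape.
    async function runUntrustedApp(
      app: (caps: Capabilities) => Promise<void>,
      caps: Capabilities,
    ): Promise<void> {
      await app(caps);
    }

    // Example: a baby-monitor-style app allowed to reach one camera host.
    runUntrustedApp(
      async (caps) => {
        const res = await caps.fetch("https://camera.local.example/stream");
        console.log("stream status:", res.status);
      },
      makeCapabilities(["camera.local.example"]),
    );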



> I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up.

OpenAI's new Rust version of Codex might be of interest. I haven't dived deeper into the codebase, but it seems they're thinking about sandboxing from the get-go: https://github.com/openai/codex/blob/7896b1089dbf702dd079299...



Why do you need a non-expert? We build on layers of abstractions; AI will help you at whichever layer you're the "expert" at. Of course you'll need to understand low-level stuff to work on low-level code.

i.e. I might not use AI to build an OAuth library, but I might use AI to build a web app (which I am an expert at) that may use an OAuth library Cloudflare developed (which they are experts at). Trying to make "anyone" code "anything" doesn't seem like the point to me.



> I guess for me the question is: at what point do you feel it would be reasonable to do this without the experts involved, as in your case?

I don't know if it was the intent, but these kinds of questions bother me; they seem to hint at an agenda: "when can I have a farm of idiots with keyboards, paid minimum wage, churn out products indistinguishable from expertly designed applications?"

To me that's the danger of AI: not its purported intelligence, but our manifested greed.



GP is just quoting the readme, they aren't the author.

My 2 cents:

>I guess for me the question is: at what point do you feel it would be reasonable to do this without the experts involved, as in your case?

No sooner and no later than we could say the same thing about a junior developer. In essence, if you can't validate the code produced by an LLM, then you shouldn't really have been writing that code to begin with.

>The really really interesting thing would be if an AI could actually generate the prompts.

I think you've hit on something that is going underexplored right now, in my opinion: orchestration of AI agents, where we have a high-level planning agent delegating subtasks to more specialized agents that perform them and report back. I think an approach like that could help avoid context saturation for longer tasks. Cline / Aider / Roo Code / etc. do something like this with architect mode vs coding mode, but I think it can be generalized.
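
(A minimal sketch of that planner/worker split in TypeScript; `complete` is a hypothetical stand-in for any chat-completion client, not a real API.)

    // Hypothetical stand-in for a chat-completion client.
    async function complete(system: string, user: string): Promise<string> {
      // ...call your model provider here...
      return "";
    }

    // A planning agent breaks the task into subtasks; each subtask is then
    // handled by a worker agent that starts from a fresh, small context,
    // which is what helps avoid saturating one long conversation.
    async function orchestrate(task: string): Promise<string[]> {
      const plan = await complete(
        "You are a planning agent. Break the task into numbered, independent subtasks.",
        task,
      );
      const subtasks = plan
        .split("\n")
        .map((line) => line.replace(/^\d+[.)]\s*/, "").trim())
        .filter((line) => line.length > 0);

      const results: string[] = [];
      for (const subtask of subtasks) {
        results.push(
          await complete("You are a coding agent. Complete exactly one subtask.", subtask),
        );
      }
      return results;
    }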



Same argument from me, but only for Claude.

Other models feel like shit to use, but Claude is good.



From this commit: https://github.com/cloudflare/workers-oauth-provider/commit/...

===

"Fix Claude's bug manually. Claude had a bug in the previous commit. I prompted it multiple times to fix the bug but it kept doing the wrong thing.

So this change is manually written by a human.

I also extended the README to discuss the OAuth 2.1 spec problem."

===

This is super relatable to my experience trying to use these AI tools. They can get halfway there and then struggle immensely.



> They can get halfway there and then struggle immensely.

Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.

It seems to me like any mistake in a message chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.

So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.
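
(In chat-API terms, the advice amounts to editing the first prompt and regenerating from an empty history, rather than appending corrections to a transcript that still contains the bad output. A TypeScript sketch with a hypothetical `chat` client:)

    type Message = { role: "system" | "user" | "assistant"; content: string };

    // Hypothetical stand-in for a chat-completion client.
    async function chat(messages: Message[]): Promise<string> {
      // ...call your model provider here...
      return "";
    }

    // Fold what you learned from failed attempts into a clarified initial
    // prompt, then regenerate from a clean history instead of appending
    // "no, fix it" messages that keep the mistake in context.
    async function retryFromScratch(
      basePrompt: string,
      clarifications: string[],
    ): Promise<string> {
      const prompt =
        basePrompt +
        (clarifications.length > 0
          ? "\n\nConstraints learned from earlier attempts:\n- " +
            clarifications.join("\n- ")
          : "");
      return chat([{ role: "user", content: prompt }]);
    }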



This to me is why I think these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data.


> these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data

I mean, bypassing the fact that there's no consensus on what "actual understanding" even is, does it matter if it's "actual understanding" or "kind of understanding", or even "barely understanding", as long as it produces the results you expect?



Same. But I personally find it a lot easier to do those bits at the end than to begin from a blank file/function, so it's a good match for me.


This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?



> The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

I *think* the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and given that we are still collectively learning how to use them effectively, there's no way it won't be faster (with effective use) to fully code new solutions with AI in another 3-6 months. I think it requires a lot of work: well-documented, well-structured codebases with fast built-in feedback loops (good linting/unit tests etc.), but we're heading there now.



> Not software engineers being kicked out ... but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks to occupy the other 18? I think that's the question.

And the author is implementing a fairly technical project in this case. How about routine LoB app development?



Increased productivity means increased opportunity. There isn't going to be a time (at least not anytime soon) when we can all sit back and say "yup, we have accomplished everything there is to do with software and don't need more engineers".


> but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

My problem is that (in my experience anyway) this is slower than just writing the code myself. That's why AI is not a useful tool for me right now. It only gets it right sometimes, so it winds up being easier to just do it yourself in the first place. As the saying goes: bad help is worse than no help at all, and AI is bad help right now.



The million-dollar question is not whether you can review at the speed the model is coding. It is whether you can trust review alone to catch everything.

If a robot assembles cars at lightning speed... but occasionally misaligns a bolt, and your only safeguard is a visual inspection afterward, some defects will roll off the assembly line. Human coders prevent many bugs by thinking during assembly.



I’ve tried building a web app with LLMs before. Two of them went in circles—I'd ask them to fix an infinite loop, they’d remove the code for a feature; I’d ask them to add the feature back, they’d bring back the infinite loop, and so on. The third one kept losing context—after just 2–3 messages, it would rebuild the whole thing differently.

They’ll probably get better, but for now I can safely say I’ve spent more time building and tweaking prompts than getting helpful results.



Congrats and thanks for sharing, both the code and the story.

Which Claude plan did you use? Was it enough or did you feel limited by the quotas?



This was mostly Claude Code, which runs on API credits. I think I spent a two-digit number of dollars. The model was Sonnet 3.7 (this was all a couple months ago, before Claude 4).


I’ve been using Claude (via Cursor) on a greenfield project for the last couple months and my observation is:

1. I am much more productive/effective

2. It’s way more cognitively demanding than writing code the old-fashioned way

3. Even over this short timespan, the tools have improved significantly, amplifying both of the points above



> It’s way more cognitively demanding than writing code the old-fashioned way

How are you using it?

I've been mainly doing "pair programming" with my own agent (using Devstral as of late) and find the reviewing much easier than it would have been to literally type all of the code it produces, at least time-wise.

I've also tried vibe coding for a bit, and for that I'd agree with you, as you don't have any context if you end up wanting to review something. Basically, if the project was vibe coded from the beginning, it's much harder to get into the codebase.

But when pair programming with the LLM, I already have a built up context, and understand how I want things to be and so on, so reviewing pair programmed code goes a lot faster than reviewing vibe coded code.



I’ve tried a bunch of things, but now I’m mostly using Cursor in agent mode with Claude Sonnet 4, doing small-ish pull-request-sized prompts. I don’t have to review code as carefully as I did with Claude 3.7, but I’m finding the bottleneck now is architecture design. I end up having these long discussions with ChatGPT o3 about design patterns, sometimes days of thinking, and then relatively quick implementation sessions with Cursor.



Getting a "Too Many Requests" error is kind of hilarious given the company involved.


Oh hey, looks like it's mostly Kenton Varda, who you may recognize from his LAN party house: https://news.ycombinator.com/item?id=42156977


I think this is pretty cool, but it doesn't really move my priors that much. Looking at the commit history shows a lot of handholding even in pretty basic situations, but on the other hand they probably saved a lot of time vs. doing everything manually.


Looking at the commit history, there’s a fair bit of manual intervention to fix bugs and remove unused code.


mods: typo in title "CloudLflare"


There is no "@" system here; you are welcome to email [email protected] or hope that we're still within the edit window for the title.


fuck cloudflare, the entire internet has captchas now


Just put a form on a website and you will see why... CloudFlare provides the solution; it's not causing the problem.


> CloudFlare provides the solution; it's not causing the problem

Isn't CloudFlare infamously known for refusing to take down websites of people who are causing the problems (e.g. DDoS services that make use of CloudFlare's services)?



> Just put a form on a website and you will see why...

I host my local community graveyard website and I've had no issue with forms. These forms are for tour bookings and contact.

And yes, they are causing the problems. They restrict me because I use my own self-hosted, colocated VPN in the same country on a proper dedicated IP with rDNS, and that too is CloudFlare's doing.



Until you make your own website; then you love them.


I've made plenty of websites, still don't love them, and still get served 5+ captchas sometimes, straight after each other. Perhaps I have to give them money; then I'll love them?

