(comments)

Original link: https://news.ycombinator.com/item?id=40818809

This thread describes commenters' experiences working on various software development projects over their careers. They highlight examples of effective collaboration between team members, particularly engineers with virtually no ego and excellent skills. Those teams produced high-quality codebases that junior engineers joining the team could navigate easily. They contrast this with personal frustration over a large Python project characterized by chaotic code and a lack of proper structure or documentation. They stress, however, that the key to success does not necessarily lie in the tools or languages used, but in the principles and mindset the team adopts. They recall more than a decade of experience working on large services in Python, C++, Java, and Go. They mention that while C++ allows the creation of incomprehensible code, it takes real expertise to write such code and have it understood by others. Python, on the other hand, can let junior developers introduce chaos into a codebase when the team does not enforce stricter typing. They also note that, compared with Python, JavaScript with type checking turned out to be surprisingly more productive and straightforward. Finally, they recount their astonishment at the Cocotron project, a one-person achievement demonstrating the power of a focused individual to produce impressive results in cross-platform development. Although they have since left that position, they still value the lessons learned from the experience.


Original text


The one in my previous job, which was an admin board for a market intelligence application. Ultimately, the reason it was good was because the engineers had zero ego on top of having excellent skills. The team that set up the codebase was basically 4 seniors and 3 principals (the client actually did pay for top talent in this case), so not only was everything based on industry standards, written elegantly and organized perfectly, but every time some new requirement came up, these senior / principal engineers would discuss it in the most civilized manner I have ever seen.

E.g., "we need to come up with a way to implement X". Person A gives their idea, person B gives another idea and so on until everybody has shared their thoughts. Then someone would say "I think what person C said makes the most sense" and everybody would agree and that was it. 30 minutes to hear everybody out, 3 minutes to discuss who will do it and when, and the meeting was over.

I think the biggest testament to this code base was that when junior members joined the team, they were able to follow the existing code for adding new features. It was that easy to navigate and understand the big picture of.



This is the approach I tried to take as an IC frontend on a building management system dashboard about a year ago. It was basically me and one other frontend, one full-stack, and a QA or 2, plus a manager. My manager, I guess, had written a lot of the code we'd be interacting with, and was somewhat protective and reluctant to delegate authority over decisions around it. It was a big refactoring project, and I just encouraged my colleagues to take the things they wanted, like rebuilding the state management system. We'd discuss the requirements and why it was necessary, and I'd look for reasons to agree rather than disagree, then we'd be off. Something burnout has taught me is that the marginal differences between one implementation detail or another are not worth getting hung up on unless they pose a real problem, especially when someone else gets to decide that doing it fast is priority (sprints).



> Something burnout has taught me is that the marginal differences between one implementation detail or another are not worth getting hung up on unless they pose a real problem, especially when someone else gets to decide that doing it fast is priority (sprints).

I think this is true. And a self-imposed problem (should a problem arise) is much less frustrating to fix than one that came from a decision imposed by someone else, even if the latter avoided loads of other problems. Sometimes it's better to let people make mistakes (as you believe them to be) and correct them later.



AOL's server code was excellent. It was written by some people that had done similar things for BITNET and had a deep understanding of Unix programming and how to use event loops for scalability.

It was in C, but the process was expected to run for months without crashing or running out of memory; if it had an abnormal exit, you'd (the owner as set in some code thing) get an email with the core backtrace. If it had a memory leak, ops would be on you quickly. Ops was considered a primary driver of requirements to dev, and absolutely everything could be reloaded into a running server without restarting it. There was a TCP control port where a TCL interpreter inside the service was exposed and generally you wrote the (very simple, CS 101 style) TCL commands to manage the server. It was a "No Threads Kernel", scaled to dozens or hundreds of physical machines communicating over a very well managed network, mostly 1 process per core and 1 core for the OS. The 200 or so unix developers (as we were called) had a common understanding of how the framework worked, and if you were just writing app code it was basically impossible to write slow services. We had technical writers who would interview the developers and write books that could be handed to outside developers and lead to a successful integration with no developer time spent.

The NTK was primarily for sending msgs over the network - we had a principle to never write to disk (disks were pretty slow in the 1990s), so everything was just a server to get network messages, send out other messages in response, then assemble the replies/timeouts and send back a msg to the caller. All done over persistent connections established by the infrastructure; the applications just registered callbacks for msg type 'X', which would present one with the caller information, the msg as a buffer, and potentially a "user word" which would be a way to keep server state around between calls.
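
For anyone who never saw a setup like this, here is a minimal sketch of the pattern in Python -- emphatically not AOL's actual NTK API, just an illustration of the idea: a single-threaded select loop that dispatches incoming messages to callbacks registered per message type, handing each callback the caller, the raw buffer, and a per-connection "user word".

    import selectors
    import socket

    handlers = {}                      # message type (int) -> callback
    sel = selectors.DefaultSelector()

    def register(msg_type, callback):
        """Register a callback for one message type."""
        handlers[msg_type] = callback

    def on_echo(caller, buf, user_word):
        """Example handler: echo the payload back to the caller."""
        caller.sendall(b"R" + buf)

    register(ord("E"), on_echo)        # hypothetical message type 'E'

    def serve(port=9000):
        srv = socket.socket()
        srv.bind(("127.0.0.1", port))
        srv.listen()
        srv.setblocking(False)
        sel.register(srv, selectors.EVENT_READ, data=None)
        while True:                    # the single select loop
            for key, _ in sel.select():
                if key.data is None:   # new connection on the listening socket
                    conn, _ = key.fileobj.accept()
                    conn.setblocking(False)
                    # The "user word": per-connection state handed back to handlers.
                    sel.register(conn, selectors.EVENT_READ, data={"user_word": None})
                else:                  # data on an established connection
                    buf = key.fileobj.recv(4096)
                    if not buf:
                        sel.unregister(key.fileobj)
                        key.fileobj.close()
                        continue
                    # First byte is the message type; the rest is the payload.
                    cb = handlers.get(buf[0])
                    if cb:
                        cb(key.fileobj, buf[1:], key.data["user_word"])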

The layering, from the main select loop thru different layers of TCP connection, SSL handling (if used, not 100% was SSL in the 90s), thru persistent link handling, application msg handling, timers, memory allocation, etc. was done with such art that I felt it to be a thing of beauty.



Started with Tandem and Stratus pre-Unix, then migrated to HPUX, then Solaris, then Linux on stock hardware. Also Digital for big machines and early usage of 64 bit. I ported to Mac OS X but only for local dev.

Funny thing about the HPUX to Solaris migration is we found a ton of null pointer dereferences which didn't cause SEGV on HPUX but did on Solaris. And the Linux migration was to a different endianness, so we had to fix any copying of ints to/from a buffer.
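
Illustrative aside (the original code was C; this is the same idea sketched in Python): endianness only bites when byte order is left implicit. Packing an int with the host's native order produces different bytes on little- vs big-endian machines, while spelling the order out keeps the buffer format stable across such a migration.

    import struct

    value = 0x12345678

    native = struct.pack("=I", value)   # byte order depends on the host CPU
    big    = struct.pack(">I", value)   # always b'\x12\x34\x56\x78'
    little = struct.pack("<I", value)   # always b'\x78\x56\x34\x12'

    # Reading back with an explicit order round-trips on any machine.
    assert struct.unpack(">I", big)[0] == value
    assert struct.unpack("<I", little)[0] == value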



I wouldn't say it was a failure. They had a good run in a very quickly evolving time. That they got a slice of business for years during that period isn't bad!



When did they fail? They acquired Time Warner, creating a company that after continued consolidation is now ranked #13 on the Fortune 500 list.



NetBSD.

The docco, the culture, the clarity and simplicity of design from first principles. The coherency of code across kernel, user-space, accompanying material. The vibe.

You may know NetBSD (if at all) as the BSD that wants to be "portable" and may dismiss it against FreeBSD's focus-claim of "performance" and OpenBSD's focus-claim of "security". Interestingly, trying to remain wildly portable requires a cleanliness inside that is very soothing.



In my case, it was the integration testing framework built for a large Python service.

This was ~10y ago, so my memory might not serve me well. A bit of context:

- proprietary service, written in Python, maaany KLOC,

- hundreds of engineers worked on it,

- before this framework, writing the integration tests was difficult -- you had a base framework, but the tests had no structure, everyone rolled out their own complicated way of wiring things -- very convoluted and flaky.

The new integration tests framework was built by a recently joined senior engineer. TBF, it's wrong to say that it was a framework, if you think in the xUnit sense. This guy built a set of business components that you could connect & combine in a sound way to build your integration test. Doesn't sound like much, but it significantly simplified writing integration tests (it still had rough edges, but it was a 10x improvement). It's rare to see chaos being tamed in such an elegant way.

What this guy did:

- built on top of the existing integration tests framework (didn't roll out something from zero),

- defined clear semantics for the test components,

- built the initial set of the test components,

- held strong ownership over the code -- through code review he ensured that new components followed the semantics, and that each test component was covered by its own test (yep, tests for the test doubles, you don't see that very often).
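
To make the component idea concrete, here is a rough sketch of what "business components with clear semantics, each covered by its own test" can look like. The names are made up and this is not the proprietary framework itself -- just the shape of it: every component exposes the same setup/teardown contract, a tiny helper wires any of them into a test, and the test doubles get tests of their own.

    import contextlib
    import unittest

    class FakePaymentsBackend:
        """Test double for a payments service; stands up an in-memory instance."""
        def setup(self):
            self.charges = []
            return self

        def charge(self, user, amount):
            self.charges.append((user, amount))
            return {"ok": True}

        def teardown(self):
            self.charges.clear()

    @contextlib.contextmanager
    def component(comp):
        """Uniform wiring for any component: setup, yield the handle, teardown."""
        handle = comp.setup()
        try:
            yield handle
        finally:
            comp.teardown()

    class CheckoutIntegrationTest(unittest.TestCase):
        def test_checkout_charges_user(self):
            with component(FakePaymentsBackend()) as payments:
                payments.charge("alice", 10)
                self.assertEqual(payments.charges, [("alice", 10)])

    # And, as noted above, the test doubles themselves are covered by tests.
    class FakePaymentsBackendTest(unittest.TestCase):
        def test_charge_records_the_charge(self):
            backend = FakePaymentsBackend().setup()
            backend.charge("bob", 5)
            self.assertEqual(backend.charges, [("bob", 5)])

    if __name__ == "__main__":
        unittest.main()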

Did it work well longterm? Unfortunately, no. He stayed relatively short (<2y). His framework deteriorated under the new ownership.

Travis, if you are reading this and you recognized yourself, thank you for your work!



Without knowing anything, and before reading this comment, I had the feeling of nr. 2. Strong feeling. I think the reason is „many KLOCs of Python“. I have developed an allergy to big Python programs. I like Python for small things, probably wrapping some C code.



It's about the different ways the language allows you to shoot yourself in the foot.

I worked on large Python, C++, Java & Go services. I have 10y+ of experience with the first 3. C++ allows you to write incomprehensible code (even to experienced C++ devs) and justify its existence (because of the performance gains). But you need to be a top expert to write compilable code of that type. I'm comfortable diving into any C++ codebase except for libraries like std, boost, abseil, folly, etc. Most of the code there is absurdly difficult to comprehend.

On the other hand, Python leads in the ways a junior dev can introduce hell into the code, especially if the team doesn't rely on strict type checks. I have seen horrors.
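
A tiny example of the kind of thing strict type checks catch early (all names made up). Without annotations, the bug below only surfaces at runtime, far from its cause; with annotations, a checker such as mypy flags the unguarded access before the code ever runs.

    from typing import Optional

    class User:
        def __init__(self, name: str) -> None:
            self.name = name

    _USERS = {"alice": User("alice")}

    def find_user(name: str) -> Optional[User]:
        return _USERS.get(name)        # may be None

    def greeting(name: str) -> str:
        user = find_user(name)
        # mypy: Item "None" of "Optional[User]" has no attribute "name"
        return "hello " + user.name

    print(greeting("alice"))           # works
    print(greeting("bob"))             # AttributeError at runtime, long after the bug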

I was bewildered when I realized that working with JavaScript with type checks (Closure compiler) was insanely more productive and smooth than working with Python (before the type checks).

That's why Java won the enterprise world. It takes an effort to make a mess in Java (but people still manage). Go is in a similar place.



Because you're doing more work per line.

A 20 KLOC Python program is not more complicated than the 100 KLOC C program performing the same function. Quite the contrary. But if you're comparing 100 KLOC of C to 100 KLOC of Python, the Python program may seem unwieldy.



Not at all.

But when you come across 500 KLOC of C, then you say "wow this is big" and you're forewarned that this is going to be unwieldy. You may underestimate the 100 KLOC of Python. That's all I'm saying.



Why only those two options? It could be just "the company had other priorities and developers, while they appreciated Travis's work, didn't find it worthwhile to carry the torch themselves". Having a great testing framework is just one of the things that need devs' attention.



Fair point, it's a false dichotomy; though your example is close to "the org wasn't ready for it," which is itself vague.

Maybe the issue is that it easily comes apart without a dedicated censor, so to speak, and nobody wanted to have that role.



> Did it work well longterm? Unfortunately, no. He stayed relatively short (<2y).

I think the issue is that integ tests are not really a place that sees "development". You write the integ tests, and move on, you are not actively introducing more integ tests.

I think it's a shame that this is the common view of the people that need to fund this. Bad integ tests mean bad dev exp, which then results in increased attrition and dissatisfaction.



For me, the most eye-opening codebase of my career was Cocotron, around 2007:

https://github.com/cjwl/cocotron

I was looking for a way to port my native Mac Cocoa apps to Windows. I had been already disappointed by the aimless sprawl of GNUstep.

This one-person project implemented all the essential APIs for both Foundation and AppKit. Reading the code was a revelation: can it really be this simple and at the same time this effortlessly modular for cross-platform support?

I contributed a few missing classes, and successfully used Cocotron for some complex custom GUI apps that needed the dual-platform support.

Cocotron showed me that one person with a vision can build something that will rival or even outgun large teams at the big tech companies. But focus is essential. Architecture astronauts usually never get down from their high orbits to ship something.



+1 to this.

I remember my boss asking me to build a MacOS UI we had on Windows with this. I was very sceptical that it would go anywhere.

But no, it worked. I was just open mouth shocked to see the GUI worked pretty flawlessly in Windows.



Worked on a codebase for a large safety-critical system where everything was 100% documented, and the development guide for the project was followed so closely that you couldn't tell, across millions of lines of code, that the whole thing wasn't written by one person. Absolutely impressive levels of attention to detail everywhere, down to not even being able to find typographical errors in comments or documentation (a typo in a comment was treated just as seriously as any other bug).



Let me guess, it was very well funded and there were no fake deadlines and cross-team dependencies, am I correct or am I very correct?



Having worked in a couple of safety critical companies, there are things you definitely HAVE to do, but some companies do it better than others.

Some companies have process and development practices down pat: things go smoothly, and meeting the qualification process objectives is easy, because the work has been done right the whole way.

Other companies have less established or less consistent process. They generally meet the process objectives and deliver a working product, but the development process is often more of a struggle, and there is often a lot of "cleaning up" of missed pieces at the end of the release process.

This is just to say: companies and products in the safety critical space don't necessarily have some intrinsic quality, just a higher minimum bar.



In my experience there is exactly 0 correlation between certification needed and code quality. Right now I'm working in a multi-billion company doing SW that must pass many certifications. The code is absolute trash. The tooling is terrible. The people confuse make and cmake. Enough said. And the SW gets certified, because it is all a matter of how much it costs to certify. It is a kind of high-level corruption that is not seen as corruption.



Having worked in both kinds personally, it's different.

You spend more time on fewer topics: the variety of work is much lower, which some people love, and some people are frustrated by. There is often much less "drastic innovation", or when there is, it takes much longer than you might be used to in other industries. Favor is given to proven, predictable technologies and choices, even if that leaves some opportunities on the table.

That being said, it's something I often miss. The ability to have such solid confidence in the things you've built, and the ability to drop into nearly any piece of the system and have everything be consistent and predictable, is a quality in itself. It makes debugging (often VERY rare) issues much more tractable, both at a higher systems level, as well as digging deeper into the code itself.



> Favor is given to proven, predictable technologies and choices

I feel like this goes two ways too. Sometimes people favor technologies they’ve used before, regardless of how many problems they know it causes.

If it’s predictable that certain tech will cause you issues years down the line, do not choose it again.



Yep. Sometimes it's "known issues are avoidable issues", and other times it's "hey I wish we didn't spend xx% of our time avoiding (incl. spending time auditing syntax) or doing mitigations for the same limitations/issues over and over and over again".

The wheels do turn, just slowly.



Last year I worked for a client that gave me a lot of time, money and autonomy to lead dev on a critical software rewrite.

We got a small team of competent people, with domain experts to peer code with the devs.

It was wonderful. We could test, document and clean up. Having people who knew the trade and users at hand removed second guessing.

The result was so good we found bugs even in competitors' implementations.

We also got 5x the performance compared to the system it was replacing, plus more features.



Similar thing.

Had time and autonomy from a client, so we took our sweet time examining the domain, the existing systems et al. Spent a few months writing the basis and the framework around what would be done, based on years and years of experience I had with bad frameworks and codebases, combined with working on the same domain for their parent company years ago.

And it worked. We delivered features insanely fast, hundreds of forms were migrated, feature generators would create 90% of the boilerplate code and the code was small, readable and neatly clustered. Maintaining it was a piece of cake, leading to us not having enough work after a while so I negotiated our time to half a week for the same money.

After a while, client deemed us too expensive to pay for only 2.5 days of work - after all, how does it make sense - if we are paying them that much, they should work 5 days!

So they cut us out. Two things happened:

1. Devs that got moved to other projects in the company told me they didn't know development could be so smooth and tried to replicate it in future projects; even though they say that failed, a lot of the lessons they picked up from the framework were highly relevant in their future careers.

2. The company found a cheaper developer, he said "this is unusable and has to be rewritten" and rewrote it into "clean code", taking longer than the original project took. At least he works 5 days a week now.



We were going to do this. Get several months to set up a clean base for our new system with only the four most competent people working on it.

Then I went on vacation for a week.

Come back to a system that needs to be delivered to prod next month with 20 randos submitting PR’s…

WTF happened. Did you learn nothing from previous failures?



Good for you. Magical moments in careers are hard to find in my experience but they are so satisfying when you get there.

Glad whoever was over this didn't just drop the "don't rewrite" Joel Spolsky article and fight against making it happen.



I actually was the one telling them not to rewrite, lol.

But the original code was a mess of matlab spaghetti, they couldn't find a way to hire for that. Not to mention turning it into a web service was already a big hack of java parsing a raw dump of matlab datastructures that nobody dared to touch.

I had to read the matlab code, and it took hours to decipher a few lines. Plus the language doesn't have great debugging and error handling capabilities, and the tooling is quite terrible.

So rewriting to python won, and for once, I must say it was a good call.



Joel has since said that he doesn't really agree with that advice anymore, at least not in the same way. Super annoying that it gets parroted over and over again as though it's the word of the lord.



I agree it shouldn't be parroted as though it's the word of the lord; like any advice, it will be more or less applicable in specific situations. I've been on both sides of the "let's redesign this thing" conversation plenty of times.

The balance is somewhere in the middle. It's valuable advice about what can go wrong when you don't think through the implications of the decision-making process, understand how and why the system works the way it does, and weigh what risks exist for a big redesign. But like anything, if the risks are understood, then those risks can be accepted, mitigated, or rejected as appropriate, or they can provide guidance on why the redesign investment isn't worth it.



Google3 codebase. It's just so vast and it works. It's a miracle. I feel lucky to have seen it. Every time you change it, it reruns the dependencies. Everyone has a different view concurrently. Commits are efficient immutable snapshots. It's just incredible multiplayer. So massively beyond what can be done with GitHub. Really I feel it's peak codebase. I've not seen what the other big techs do, but it blew my mind.



+1

I can not understate how much I agree with parent comment.

The opposite of move fast, build a shitty prototype and iterate is a deliberate problem solving approach undertaken by the highest caliber of engineers. The actual challenges to be addressed are effectively addressed right at the design stage.

The result is a thing of immense beauty and elegance.

I will forever be grateful for the opportunity I had to see this magnificent piece of engineering in action.



I strongly suspect it’s not that useful for a lot of businesses.

So many have their code split between two dozen clouds, BA tools (… does Google put that in the monorepo too? Or is that, which is a lot of the code at most businesses, not "really" code at Google?), vendor platforms, low-code tools you all hate and are trying to deprecate but god damned if you aren't still spending dev hours on new features in it, et c…

I bet achieving anywhere near the full benefits would first require retooling one’s entire business, including processes, and bringing a whole shitload of farmed-out SaaS stuff in-house at enormous expense, most places.



It’s not just tooling; processes, infrastructure and dedicated teams for centralized infrastructure are what makes Google’s monorepo what it is. FWIW, most of the tools are publicly available or have good publicly available counterparts. What’s likely missing elsewhere is funding for infrastructure teams.



I think skill is also an issue here, in both directions. I have worked with a company that followed the opposite of move fast and it just turned into a 'who is the most correct' and 'what is the most elegant code' competition. We didn't push out a single new feature in the three years I'd worked there at that point. There was so much focus on code that we gave almost no time to business requirements.

The wider implication of this was that the number of tickets we got dropped dramatically, because users knew they'd never be resolved anyway.

Balance is key.



(Aside). To expand slightly, what robertsdionne is highlighting is the changing usage of this expression. In its original sense, e.g an issue is so important that it is impossible to overstate its importance. It is now increasingly used the other way around.

Old me would have said it’s used wrongly, but this happens all the time with language. Especially things being used in the opposite of their original sense, e.g. inflammable for flammable.



In my mind, "cannot overstate" always meant "impossible to overstate", but I think some people interpret/intend "cannot understate" to mean something like "must not understate". I don't know if that's really what they're thinking, but it is how I make sense of it. I have come to just avoid such constructions.

Edit: reminds me of an ancient SNL skit with Ed Asner in which he's a retiring nuclear engineer and as he heads out the door he says to his incompetent co-workers "Just remember, you can't put too much water in a nuclear reactor".



> In its original sense, e.g an issue is so important that it is impossible to overstate its importance

That's what they're saying, isn't it? I cannot overstate [because it's impossible].

> opposite of their original sense, e.g. inflammable for flammable

Inflammable was never the opposite of flammable. Those words have always been synonyms. The opposite was always non-flammable.



I agree but would add that within the codebase there are also vastly different experiences. The search ads serving stack for example is phenomenal: it is >10 years old, very complex and gets contributions from hundreds of engineers every week, yet it is quite easy to understand what is going on anywhere and it has great performance. But I also had to work with other code which had been through 3 rewrites, none of which had ever been totally completed, so there were overlapping paradigms everywhere and working with it was just a nightmare. Tests never caught anything, leading to several post-mortems.



Google3 codebase very consistently has clean code, but some of the architecture there is very much not great.

Some is great, some not so much.

Some of Verizon's code was much more elegant (though much smaller scope) from an API perspective, and really leaned into advanced type systems in a way Google has not.



Agreed. Once long (10+ years) ago, google3 may have been peak codebase. Nowadays, not so much.

Edit: I guess it also depends at what level of abstraction you work. High: can be easy breezy. Low: oh boy.



Postgres. I don't code in C if I can avoid it, since it often feels like an awful lot of extra typing while still having to worry about memory safety. But the Postgres codebase is extraordinarily well organized and respects the humans that work with it with its intelligent handling of memory and judicious use of macros.

I consider the core Postgres codebase to be the gold standard in development even though it's in a language I do not prefer to write in if given the choice.

Shout out to the pgrx folks. You're awesome! https://github.com/pgcentralfoundation/pgrx



That is nice to hear, albeit unsurprising. Their public documentation is some of the best that I have worked with. Postgres is such an impressive project overall.



Google's monorepo, and it's not even close - primarily for the tooling:

* Creating a mutable snapshot of the entire codebase takes a second or two.

* Builds are perfectly reproducible, and happen on build clusters. Entire C++ servers with hundreds of thousands of lines of code can be built from scratch in a minute or two tops.

* The build config language is really simple and concise.

* Code search across the entire codebase is instant.

* File history loads in an instant.

* Line-by-line blame loads in a few seconds.

* Nearly all files in supported languages have instant symbol lookup.

* There's a consistent style enforced by a shared culture, auto-linters, and presubmits.

* Shortcuts for deep-linking to a file/version/line make sharing code easy-peasy.

* A ton of presubmit checks ensure uniform code/test quality.

* Code reviews are required, and so is pairing tests with code changes.



I always find these comments interesting. Having worked at Facebook and Google, I never quite felt this way about Google's monorepo. Facebook had many of the features you listed, and quite performantly if not more so. Compared with working at Facebook, where there are no OWNERS files and no readability requirements, I found abstraction boundaries to be much cleaner at FB. At Google, I found there was a ton of cruft in Google's monorepo that was too challenging / too much work for any one person to address.



OWNERS files rarely get in the way - you can always send a code change to an OWNER. They are also good for finding points of contact quickly, for files where the history is in the far past and changes haven't been made recently.

Readability really does help new engineers get up to speed on the style guide, and learn of common libraries they might not have known before. It can be annoying - hell, I'll have to get on the Go queue soon - but that's ok.



I think I have heard similar things from other googlers and I think there might be two factors on why I think this:

- I worked on Google Assistant which was responsible for integrating many services. This meant I had to work with other people's code way more regularly than many at Google.

- I moved from FB to google - I'm not really sure how many people have had this experience. I think many of my colleagues at google found it surprising how many of the things they thought were unique to google actually also existed at FB.

At the end of the day, any of these processes have pros/cons but I think the cruft of having APIs that are a couple steps harder to evolve due to finding Readability/Owners for everything you touch just makes things slightly less cohesive and a trickier place to have a "good" codebase.

When I worked at FB, I would frequently rebase my code on Monday and find that, for example, the React framework authors or another smaller infra team had improved the API and had changed *every* callsite in the codebase to be improved. This type of iteration was possible in certain situations but was just much less common at google than at fb.



> I think many of my colleagues at google found it surprising how many of the things they thought were unique to google actually also existed at FB.

Google workers are groomed to believe Google is the best, and hence they are too. A corollary of that, then, is that nobody else has it that good, when in fact, others sometimes have it better.



I also made the move from FB to G and echo everything said above. Googlers have a massive superiority complex. In reality, it's naiveté.

My 2 cents: OWNERS is fairly useful, if only as a form of automating code reviewer selection. Readability is a massive drag on org-wide productivity. I have had diffs/CLs take MONTHS to be approved by every Tom, Dick and Harry whose claws were added to my code and who made me re-design whole project approaches, and they were only there because they're supposed to check if my new-lines are in the right spot for that language. I thought about quitting.



+1.

Going from FB to $REDACTED to Oculus was a pretty wild ride, there were a lot of different cultures, though I think generally speaking the best qualities filtered through.

(also, howdy former teammate)



People really underestimate how much productivity drain there is in having a bad code review culture. One of the worst things about working at Amazon was that any feedback on a merge request, no matter how small, required you to request a re-review.



Huh? Facebook has a lot of that infra because ex-Googlers built it there. It takes an insane amount of delusion to notice something common between a father and a son and say that the dad inherited it.



This isn't true at all for OWNERS files. If you try developing a small feature on google search, it will require plumbing data through at least four to five layers and there is a different set of OWNERS for each layer. You'll spend at least 3 days waiting for code reviews to go through for something as simple as adding a new field.



I agree that it could be worse! Facebook also has significant (if not more) time spent on this, and I found adding features to News Feed a heck of a lot easier than adding features that interacted with Google search. Generally a lot of this had to do with the number of people who needed to be involved to ensure that the change was safe, which always felt higher at Google.



I'm only an outside observer in this conversation but could it be that the review process (or the lack thereof) and the ease with which you can add new features has had an impact on the quality of the software?

The thing is, in my experience as a user Facebook (the product, not the former company) is absolutely riddled with bugs. I have largely stopped using it because I used to constantly run into severe UI/UX issues (text input no longer working, scrolling doing weird things, abysmal performance, …), loading errors (comments & posts disappearing and reappearing), etc. Looking at the overall application (and e.g. the quality of the news feed output), it's also quite clear that many people with many different ideas have worked on it over time.

In contrast, Google search still works reasonably well overall 25 years later.



There are pretty different uptime and stability requirements for a social product and web search (or other Google products like Gmail). When news feed is broken life moves on, when those products break many people can't get any work done at all.

One of Google's major cultural challenges is imposing the move slow and carefully culture on everything though.



I have the same background: I find the code quality at G to be quite a lot higher (and test pass-rate, and bug report-rate lower) than News Feed, which was a total shit-show of anything-goes. I still hold trauma from being oncall for Feed. 70 bugs added to my queue per day.

The flip side is of course that I could complete 4 rounds of QuickExperiment and Deltoid to get Product Market Fit, in the time it takes to get to dogfooding for any feature in Google.



Same and another vote for meta. Meta made the language fit their use case. Go into bootcamp change the search bar text to ‘this is a search bar!’ press F5 and see the change (just don’t ship that change ;D). It’s incredibly smooth and easy.

Google's a mess. There's always a migration to the latest microservices stack that has been taking years and will take many more years to come.

Like meta just changed the damn language they work in to fit their needs and moved on. Google rewrites everything to fit the language. The former method is better in a large codebase. Meta is way easier to get shit done to the point that google was left in the dust last time they competed with meta.



I think what you're saying is true for www, but not fbcode, and the latter starts to look a lot like google3. I agree though, Meta's www codebase has the best developer experience in the industry.



Huh, also having worked at both I had exactly the opposite experience. Google's tools looked ugly but just worked. At Meta there were actually multiple repos you might have to touch and tools worked unreliably across them. OWNERS files made sure there was less abandoned code, and parent owners would be found by gwsqueue bots to sign off on big changes across large parts of the repo by just reading these files.



Question I've always wondered: Does Google's monorepo provide all its engineers access to ALL its code?

If yes, given the sheer number of developers, why haven't we seen a leak of Google code in the past (disgruntled employee, accidental button, stolen laptop, etc)?

Also how do they handle "Skunkworks" style top-secret projects that need to fly under the radar until product launch?



Partial check outs are standard because the entire code base is enormous. People only check out the parts they might be changing and the rest magically appears during the build as needed.

There are sections of the code that are High Intellectual Property. Stuff that deals with spam fighting, for example. I once worked on tooling to help make that code less likely to be accidentally exposed.

Disclaimer: I used to work there, but that was a while back. They probably changed everything a few times since. The need to protect certain code will never go away, however.



Google laptops and workstations (anything that can actually access srcfs to get this data) are extremely monitored and controlled.

Very critical stuff (ranking, spam/abuse, etc) can be further protected via silos which lock down sections of the code base (but still allow limited interactions with the build).

Google poured significant engineering $$$ into its development tools and policies (generally building custom with no intent to ever monetize vs buying). I don't see a company today, in this climate, that would emulate that decision.



The very very important stuff is hidden, and the only two examples anyone ever gives are core search ranking algorithms and the self-driving car.

Even the battle-tested hyper-optimized, debugged-over-15-years implementation of Paxos is accessible. Though I’m sure folks could point out other valuable files/directories.



Former employee here. I remember a third example: the anti-DoS code is hidden. I remember this because I needed to do some very complicated custom anti-DoS configuration and as was my standard practice, I looked into how the configuration was being applied. I was denied access.

Fourth example: portions of the code responsible for extracting signals from employees' computers to detect suspicious activity and intrusion. I suspect it's because if an employee wanted to do something nefarious they couldn't just read the code to figure out how to evade detection. I only knew about this example because that hidden code made RPC calls to a service I owned; I changed certain aspects of my service and it broke them. Of course they fixed it on their own; I only got a post-submit breakage notification.



It's extremely easy to detect a disgruntled employee making a copy of source code. There's no accidental button to leak. There's no source code on laptops as policy doesn't allow it, with limited exceptions only.

But there was a giant leak a long time ago. It was called Operation Aurora done by China. Legend has it that to this date the Chinese search engine Baidu still uses stolen code from Google.



Within the monorepo there is the notion of "silos" where access to directories can be restricted to groups of people/bots. Though I believe that's exceedingly rare, I've never come across one.



Google3 monorepo source isn't, by policy, supposed to leave the corp network workstations, and can't even be on your corporate provided laptop (except for some edge cases in mobile apps dev). Even during full COVID lockdown WFH we had to remote into our machines. (I worked on embedded stuff and had to compile on my office workstation, scp the binaries home, and flash my device, and repeat. Super slow cycle.)

So, anyways, source code being basically on-premise only and on machines that they can fully audit and control... Would you be stupid enough to "cp -r srccheckout /media/MYUSBSTICK" on such a box?

Also believe it or not they used to have a very open internal culture at Google because the bulk of employees genuinely liked the company and its stated mission and there was a bit of a social contract that seemed to be upheld. Stuff didn't generally leak out of the wide open all hands, even. Past tense.



I recently left Google and knew it was going to be a step down from Google's build ecosystem, but I wasn't prepared for how far a step down it would be. It's the only thing I miss about the place; it's so awesome.



Google's code, tooling and accompanying practices are developing a reputation for being largely useless outside Google ... and many are starting to suspect its alleged value even inside Google is mostly cult dogma.



Google used to have a near monopoly on the most expensive, educated, devoted, and conscientiously willful people and imposed very few demands on their time. The lengths to which they were willing to go, to make everything they did with the tools pleasant and elegant, was orders of magnitude beyond anything I'd seen.

Some of us thought that the magic of these people would be imbued in the dev tools that they created, so if enterprises adopted the tools, then they'd reap the benefits of that same magic too. But this simply wasn't true. The tools didn't actually matter; it was the way they used them.

For example, when other companies started adopting tools like Bazel (open source Blaze) they wanted features like being able to launch ./configure scripts inside Bazel, which totally violates the whole point of Bazel, and never would have been allowed or even considered inside Google. The Bazel team was more than happy to oblige, and the users ended up with the worst of all worlds.



Google's systems were designed to index mountains of low value data at hitherto unseen scale, and they're good at that. But, to-the-second system-wide precision with full audit trails ... not so much.

You keep seeing startups with ex-Googlers that think they can "disrupt" Fintech with Google's "secret sauce" ... this tends to go badly.

I've had to clean up one of these messes where, in all seriousness, even a pre-2000 LAMP stack (never mind Java) implemented by people who understood the finance domain would have worked better.



> Google's code, tooling and accompanying practices are developing a reputation for being largely useless outside Google ...

Not that I don’t believe you, but where do you see this?



I can vouch for it. It's the main reason I quit: none of the "hard" skills necessary to code at Google were transferrable anywhere outside of Google. It would have been easy enough to skate and use "soft" skills to move up the management ladder and cash big checks, but I wasn't interested in that.

The reason it's not transferrable is that Google has its own version of EVERYTHING: version control, an IDE, build tools, JavaScript libraries, templating libraries, etc, etc. The only thing I can think of that we used that wasn't invented at Google was SCSS, and that was a very recent addition. Google didn't even use its own open-source libraries like Angular. None of the technologies were remotely usable outside Google.

It might sound cool to use only in-house stuff, and I understand the arguments about licensing. But it meant that everything was poorly-documented, had bugs and missing features that lingered for years, and it was impossible to find a SME because whoever initially built a technology had moved on to other things and left a mess behind them.

Some people may be able to deal with the excruciating slowness and scattered-ness, and may be OK with working on a teeny slice of the pie in the expectation that years later they'll get to own a bigger slice. But that ain't me so I noped out as soon as my shares vested.



12 year current Googler here. You are absolutely correct about "Google has its own version of EVERYTHING". Midway through my current career, I started to get existential dread about the fact that I wasn't "up to date" on any current development practices or frameworks.

Partly, this was assuaged through participating in open source projects in my free time. That's how I learned Docker, Github workflow, React, Vue, Bootstrap, Tailwind, etc.

But at the same time, I think it is a mistake to consider working with tools/languages/frameworks to be the only "hard" skills. Galaxy brain is realizing that anyone can learn a language/framework/workflow in a month or so. The real work is applying sound principles to the design and production of meaningful artifacts within those systems.



Though credit where it’s due, some of their tools really have been years ahead of anything outside of google, e.g. the closure compiler that made javascript development scalable.



I have seen this discussed in hiring decisions. I don't know that it played a large factor in a decision, but lack of experience in the standard tools/practices/terms of software development because of a career at Google was definitely a discussion point.



I had a bunch of very tenured teammates that didn’t really know how to use git, so there were only a few of us comfortable enough integrating and interacting with an open source dependency repo.



I haven't worked at google, but this is something I have heard from a few people. Reputation is largely word of mouth, so it checks out for me. I suspect the skills/tools at most large companies are increasingly less transferrable as they continue to grow in scale and scope.



> Google's code, tooling and accompanying practices are developing a reputation for being largely useless outside Google .

It is almost a tautology

Why would they be useful for domains they are not designed for?



Parts have been. Sourcegraph is basically the code search product, built by ex-Googlers originally. Bazel is the open source build tool. Sadly, most of these things require major work to set up yourself and manage, but there's an alternate present where Google built a true competitor to GitHub and integrated their tooling directly into it.



I’m just one incompetent dev, but I’ll throw this in the convo just to have my perspective represented: every individual part of the google code experience was awesome because everyone cared a ton about quality and efficiency, but the overall ecosystem created as a result of all these little pet projects was to a large extent unmanaged, making it difficult to operate effectively (or, in my case, basically at all). When you join, one of the go-to jokes in their little intro class these days is “TFW you’re told that the old tool is deprecated, but the new tool is still in beta”; everyone laughs along, but hopefully a few are thinking “uhhh wtf”.

To end on as nice of a note as possible for the poor Googs: of all the things you bring up, the one I’d highlight the biggest difference on is Code Search. It’s just incredible having that level of deep semantic access to the huge repo, and people were way more comfortable saying “oh let’s take a look at that code” ad-hoc there than I think is typical. That was pretty awesome.



Imho the reason for the deprecated and beta thing is because there is a constant forward momentum.

Best practices, recommendations and tooling is constantly evolving and requires investment in uptake.

I sometimes feel like everything is legacy the moment it's submitted and in a constant state of migration.

This requires time and resources that can slow the pace of development for new features.

The flip side is this actually makes the overall codebase less fractured. This consistency or common set of assumptions is what allows people to build tools and features that work horizontally across many teams/projects.

This constant forward momentum to fight inconsistency is what allows google3 to scale and keep macro level development velocity to scale relative to complexity.



That’s all well said, thanks for sharing your perspective! Gives me some things to reflect on. I of course agree re:forward momentum, but I hope they’re able to regain some grace in that momentum with better organization going forward. I guess I was gesturing to people “passing the buck” on hard questions of team alignment and mutually exclusive decisions. Obviously I can’t cite specifics bc of secrecy and bad memory in equal amounts, so it’s very possible that I had a distorted view.

I will say, one of the things that hit me the hardest when the layoffs finally hit was all the people who have given their professional lives to making some seriously incredible dev tools, only to be made to feel disposable and overpaid so the suits could look good to the shareholders for a quarter or two. Perhaps they have a master vision, but I’m afraid one of our best hopes for an ethical-ish megacorp—or at least vaguely pro social—is being run for short term gain :(

However that turns out for society, hopefully it ends up releasing all those tools for us to enjoy! Mark my words, colab.google.com will be shockingly popular 5y from now, if they survive till then



I guarantee you that the master vision is exactly what you wrote.

Google is not a software company, it is an advertising system providing cash-flow to a hedge fund that comprises a large part of every pension and retirement fund in America. It's far too important as simply a financial entity to risk anything on ....product development.



My experience with google3 was a bit different. I was shocked at how big things had gotten without collapsing, which is down to thousands of Googlers working to build world-class internal tooling. But you could see where the priorities were. Code Search was excellent - I'd rate it 10/10 if they asked.

The build system always felt more like a necessary evil than anything else. In some parts of google3 you needed three separate declarations of all module dependencies. You could have Angular's runtime dependency injection graph, the Javascript ESM graph, and the Blaze graph which all need to be in sync. Now, the beautiful part was that this still worked. And the final Blaze level means you can have a Typescript codebase that depends on a Java module written in a completely unrelated part of google3, which itself depends on vendored C++ code somewhere else. Updating the vendored C++ code would cause all downstream code to rebuild and retest. But this is a multi billion dollar solution to problems that 99.99% of companies do not have. They are throwing thousands of smart people at a problem that almost everyone else has "solved" by default simply by being a smaller company.

The one piece of tooling I think every company could make use of but doesn't seem to have was all of the little hacks in the build system (maybe not technically part of Blaze?). You could require a developer who updates the file at /path/to/department/a/src/foo.java to simultaneously include a patch to /path/to/department/b/src/bar.java. Many files would have an implicit dependency on each other outside of the build graph, and a human is needed to review whether extra changes are needed. And that's just one of a hundred little tricks project maintainers can employ.

The quality of the code was uniformly at least "workable" (co-workers updating parts of the Android system would probably not agree with that - many critical system components were written poorly by one person who soon after quit).



> But this is a multi billion dollar solution to problems that 99.99% of companies do not have.

I know it's trendy for people to advocate for simple architectures, but the honest-to-god truth is that it's insane that builds work ANY OTHER WAY. One of the highest priorities companies should have is to reduce siloing, and I can barely think of a better way to guarantee silos than by having 300 slightly different build systems.

There is a reason why Google can take a new grad SWE who barely knows how to code and turn them into a revenue machine. I've worked at several other places but none of them have had internal infrastructure as nice as the monorepo; it was the least amount of stress I've ever felt deploying huge changes.

Another amazing thing that I don't see mentioned enough was how robust the automatic deployments with Boq/Annealing/Stubby were. The internal observability library would automatically capture RPC traces from both the client and server, and the canary controller would do a simple p-test on whether or not the new service had a higher error rate than the old one. If it did? The rollback CL would be automatically submitted and you'd get a ping.

This might sound meh until I point out that EVEN CONFIG CHANGES were versioned and canaried.
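
For a sense of the statistical side of that (this is not Google's canary controller, just a sketch of the idea): compare the canary's error rate against the baseline's with a two-proportion z-test and roll back if the canary is significantly worse.

    import math

    def canary_is_worse(base_errors, base_total, canary_errors, canary_total,
                        z_threshold=2.33):      # roughly one-sided p < 0.01
        """Return True if the canary's error rate is significantly higher."""
        p_base = base_errors / base_total
        p_canary = canary_errors / canary_total
        pooled = (base_errors + canary_errors) / (base_total + canary_total)
        se = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / canary_total))
        if se == 0:
            return False
        z = (p_canary - p_base) / se
        return z > z_threshold

    # e.g. baseline: 50 errors in 100k RPCs, canary: 90 errors in 100k RPCs
    if canary_is_worse(50, 100_000, 90, 100_000):
        print("roll back the canary")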



I've worked at the majority of FAANG.

Facebook's build system works the same as Googles, because most of FB's infra was made by ex-Googlers around 10-15 years ago. The worst thing I can say about Blaze is basically already pointed out above, sometimes you need to write little notes to the presubmit system to ensure cross-boundary updates. Whatever, it's all text files in the end.

The wildest was at Apple. It's just as you said, 300 build systems. Not only that, but 300 source code repositories! Two teams in the same hall that hang out all the time could be using git and svn, for no good reason besides what someone wrote "init" in 20 years ago. There was no cross team communication, by design, because Steve Jobs was paranoid. Their sync mechanism was to build the entire stack once a night, and force everyone to full-reinstall their OS and toolchain to "rebase". Insane.



I definitely agree most companies should use a monorepo. Most companies don't need Blaze, though.

And the whole rollout system was excellent. I wish that tech was standard but I have a vague idea of how much work that would be to implement and few companies will be able to afford to get that right.

Edit: I forgot to mention - I absolutely hated Cider and all of the included plugins. Sure the code at Google was fine but the code editing experience destroyed all of the fun of coding. Is that function signature correct? You'll find out in 45 seconds when the Intellisense completes! And when I was there Copilot was a thing outside of Google but we were not allowed to use any AI (even Google's own AI) to write code. The whole situation was so bad I wrote a few paragraphs about it in my offboarding survey.



> You could require a developer who updates the file at /path/to/department/a/src/foo.java to simultaneously include a patch to /path/to/department/b/src/bar.java.

Could you elaborate on how this worked exactly?



There's a configuration directive you put in a plain text file in the monorepo which lets you configure:

* File A's path

* File B's path

* The message shown on the commit's review page if B isn't updated when A is updated

* How a developer can override the alert (Which would be a custom named directive added to a commit message, like "SKIP_FILE_SYNC_ALERT=true")

You then need to either commit a diff to file B when file A is changed or override it in order to get the commit added to HEAD. This is just one of many different "plugins" for the CI system that can be configured with code.
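
The directive format above is Google-internal, but the check itself is simple enough to reproduce anywhere. A hypothetical stand-alone equivalent you could run as a presubmit/CI step (all paths and the override token here are made up): if file A changed, require that file B changed too, unless the commit message carries the override.

    import subprocess
    import sys

    FILE_A = "department/a/src/foo.java"
    FILE_B = "department/b/src/bar.java"
    OVERRIDE = "SKIP_FILE_SYNC_ALERT=true"
    MESSAGE = f"{FILE_A} changed; please also update {FILE_B} (or add {OVERRIDE})."

    def changed_files(base="origin/main"):
        out = subprocess.run(["git", "diff", "--name-only", base, "HEAD"],
                             capture_output=True, text=True, check=True).stdout
        return set(out.splitlines())

    def commit_message():
        return subprocess.run(["git", "log", "-1", "--format=%B"],
                              capture_output=True, text=True, check=True).stdout

    def main():
        changed = changed_files()
        if FILE_A in changed and FILE_B not in changed \
                and OVERRIDE not in commit_message():
            print(MESSAGE)
            sys.exit(1)

    if __name__ == "__main__":
        main()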



> Entire C++ servers with hundreds of lines of code can be built from scratch in a minute or two tops.

Hundreds, huh? Is this a typo? It makes me wonder if the whole comment is facetious. Or do C++ programmers just have very low expectations for build time?



The public version of Google's build tool is Bazel (it's Blaze internally). It has some really impressive caching while maintaining correctness. The first build is slow, but subsequent builds are very fast. When you have a team working on similar code, everyone gets the benefit.

As with all things Google, it's a pain to get up to speed on, but then very fast.
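
For a sense of what the build config looks like: Bazel's BUILD files are written in Starlark, a Python dialect, and a target is just a few declarations. The target and file names below are made up; cc_library/cc_binary and their name/srcs/hdrs/deps attributes are the real public Bazel API.

    cc_library(
        name = "server_lib",
        srcs = ["server.cc"],
        hdrs = ["server.h"],
        deps = ["//net:rpc"],       # hypothetical dependency label
    )

    cc_binary(
        name = "server",
        srcs = ["main.cc"],
        deps = [":server_lib"],
    )

Building the binary target then pulls in and caches only what it actually depends on, which is where the fast incremental builds come from.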



Why do so many people like monorepos?

I tend to much prefer splitting out reusable packages into their own repos with their own packaging and unit tests and tagging to whatever version of that package. It makes it MUCH easier for someone to work on something with minimal overhead and be able to understand every line in the repo they are actually editing.

It also allows reusable components to have their own maintainers, and allows for better delegation of a large team of engineers.



Have you ever worked at FB / Google / whatever other company has huge mono repo with great tooling?

I went from many years at FB, to a place like you describe - hundreds of small repos, all versioned. It’s a nightmare to change anything. Endless git cloning and pulling and rebase. Endless issues since every repo ends up being configured slightly differently, and very hard to keep the repo metadata (think stuff like commit rules, merge rules, etc) up to date. It’s seriously much harder to be productive than with a well-oiled monorepo.

With a monorepo, you wanna update some library code to slightly change its API? Great, put up a code change for it, and you’ll quickly see whether it’s compatible or not with the rest of the whole codebase, and you can then fix whatever build issues arise, and then be confident it works everywhere wherever it’s imported. It might sound fragile, but it really isn’t if the tooling is there.



I have worked at a company that has huge monorepos and bad tooling.

Tooling isn't the problem though, the problems are:

- multiple monorepos copying code from each other, despite that code should be a library or installable python package or even deb package of its own

- you will never understand the entire monorepo, so you will never understand what things you might break. with polyrepos different parts can be locked down to different versions of other parts. imagine if every machine learning model had a copy of the pytorch source in it instead of just specifying torch==2.1.0 in requirements.txt?

- "dockerize the pile of mess and ship" which doesn't work well if your user wants to use it inside another container

- any time you want to commit code, 50000 people have committed code in-between and you're already behind on 10 refactors. by the time you refactor so that your change works, 4000 more commits have happened

- the monorepo takes 1 hour to compile, with nothing to compile and unit test only a part of it

- ownership of different parts of the codebase is difficult to track; code reviews are a mess



I think all of your problems are root caused by the phrase "multiple monorepos". This sounds more like "polyrepos" or something, multiple siloed repos that themselves might contained disparate projects, languages and tooling.

Google3 is a true monorepo. 99% of code in the company is one repo, with minor exceptions for certain open source projects and locked down code.

Edit: for example, you can change YouTube code, Google Assistant code, a Maps API, some config for how URLs are routed at top level load balancers, etc all in one CL if you really wanted to/needed to.



All of these are solved with tooling.

- dont have multiple monorepos

- use blaze or similar and run all downstream tests, the binary for your ml model includes pytorch whether you build from source or requirements.txt.

- other people committing doesn't matter if you are isolating presubmits and similar.

- using a blaze-like you never compile the whole monorepo

- code owners etc. makes this straightforward.

Like, as someone whose career has been mostly at Google, these are not problems I encounter at all, or only in cases where you'd have similar scope of problems no matter the repo structure.



If you only need to run the tests you affect, and only need to sync and update files touched in your CL, external changes are generally not impactful, sync & submit is a quick process even if people are submitting things elsewhere.

It's only a problem if someone submits a file you're touching, in which case you just have standard merge conflict issues.



If you have a reusable component in a separate repository and need a change, you have to submit that, merge, release, then bump the version in the downstream project to use it. Then if someone else updates the dependency later and hits an issue you introduced, they have to go fix it, perhaps a month later, with no context on what changed. Or they just don’t upgrade the version and reimplement what they need. With a monorepo it would be one change, and your change breaking someone else’s would get flagged and fixed with the code change. I’ve seen the amount of shared code get smaller and smaller, and more stale, with polyrepos.



Up to what scale?

This works well for a couple dozen repos per team in my experience. It’s also my preferred way to work.

It doesn’t scale so well to hundreds of repos per team without significant tooling. At some point anything cross-cutting (build tool updates, library updates, etc) becomes hard to track. Repos are left behind as folks change teams and teams are reorg’d.

I’ve never worked in a monorepo, but I can see the appeal for large, atomic changes especially.



I can change a dependency and my code at the same time and not need to wait for the change to get picked up and deployed separately. (If they are in the same binary. Still need cross binary changes to be made in order and be rollback safe and all that.)



The run time library for Turbo Pascal/Delphi for Windows was completely documented, sane, and very easy to work with. The working examples really helped.

The Free Pascal RTL seems opaque in comparison. Their reliance on an archaic help-file build system keeps contributors away. Thus it's poorly documented at best.



One of them stands out, due to being super-productive, over years, and then decades.

A large system that was originally written by only two super-productive engineers (I mean real engineers, both with PhDs in an area of engineering), plus a comparably capable and essential IT person.

The reasons for the super-productivity include one of the developers choosing great technology and using it really well, to build a foundation with "force multiplier" effects, and the other developer able to build out bulk with that, while understanding the application domain.

Another reason was understanding and being pretty fully in control of the code base, so that, as needs grew and changed, over years, someone could figure out how to do whatever was needed.

One of the costs was that most things had to be built from scratch. Over time that also proved to be an advantage, because whenever they needed (put loosely) a "framework" to something it couldn't do, they effectively owned the framework, and could make dramatic changes.

When I said "costs", I mean things like, many times they needed to make a component from scratch that would be an off-the-shelf component in some other ecosystem. So if someone looked closely at how time was sometimes spent, without really understanding it or knowing how that panned out, it would look like a cost that they could optimize away. But if they looked at the bigger picture, they'd see a few people consistently, again and again, accomplishing what you'd think would take a lot more people to do.

It helped that the first programmer also became the director for that area of business, and made sure that smart engineering kept happening.

Someone might look for a reason this couldn't work and think of bus factor. What I think helped there was that the work involved one of those niche languages that attract way more super programmers than there are jobs. ("Gosh, if only we had access to a secret hiring pool of super programmers who were capable of figuring out how to take up where the other person left off, and we had a way to get them to talk with us...")

It was easy to imagine a competitor with 100 developers, not able to keep up, and at many points getting stuck with a problem that none of them were able to solve.



It started as a bespoke structured data and Web architecture, which did somewhat complex things that would've been a huge headache to build and maintain in the dominant languages of the time, but were viable for one person to figure out and implement in Scheme and PostgreSQL.

That bespoke architecture and implementation language lent itself to a lot of rapid domain-specific functionality, as well as doable architecture changes over time.



There's actually a bespoke object metamodel mapping atop the RDBMS, and it permits customers who want to use SQL direct to the database to do so.

And, IMHO, PostgreSQL is the default choice for most storage backend purposes, unless one has figured out a good reason something else is better. :)

There were also multiple kinds of large blob storage needs, and changing architecture needs over time (e.g., multi-tenant, changing authn implications of things like browser plugins, scalability, move to cloud), so systems programming skills and controlling your framework come in handy, for not getting stuck or blocked on vendors, but just solving the problem.



Scheme, Common Lisp, Haskell, [edit] Smalltalk, probably more obscure ones...

Also Erlang, Rust, and Clojure, though those have been rumored to be employable, so they no longer get as much of the filter you got when it was just people caring strongly enough to want to use a particular language despite the unemployability.

Whether that caring happened because of skill (to identify something good), or skill happened because of caring (to invest the time in exploration and thinking and practice), these communities seem to get more than their share of great programmers.

And so you know where to look for them, and you have a reason that they might want to talk to you.



>probably more obscure ones.

APL/BQN are worth adding given the recent CUDA/vector language explosion, and they have their fair share of extraordinary programmers!

Forth is an interesting one to look at too, although never properly used.

miniKanren/Prolog are another vein to branch down (unification/logic programming).



SecDB at Goldman. There were runtime issues aplenty, but the SDLC was top notch and I’ve never heard of or seen better - and it’s 30 years old. Instant deployments and rollbacks globally; you show up to work and commit to prod in the first half day. Within a week you can go on rota. It’s why Goldman was so nimble in the '90s and '00s. It’s still a remarkable system, but underinvestment has taken its toll after margins evaporated in 2010.



I've seen a few people say 'google3'.

Q: is it actually the code that you loved, or simply the tooling that exists?

(and if it's tooling, why can't that type of tooling be replicated for other codebases outside of google?)



That type of tooling can be replicated. That's why every Xoogler tries so hard to get Bazel adopted on everything they touch. Sometimes that's appropriate, sometimes it's not, but that's why.

Bazel isn't the whole of everything though, the other piece being exported is kubernetes, which isn't Google's borg, but that's its roots. There's Apache Airflow if you need a workflow engine like Sisyphus, no shortage of databases to choose from, though now we're drifting into the operations side of things.

But basically, Google has invested untold millions of dollars in the form of SWE and other engineering hours into making Google3 operate. If anyone else invested that kind of dough, with enough smarts, they'd also be able to make it a good experience. The problem is few people have that kind of budget, and even fewer invest in that thing, preferring to use free tools instead.

What tool do you use to edit code, and how much did your employer spend on that for you?



It does! Working at Google is like that, but with decades spent building that dream, using resources that only a megacorp could bring to bear on the problem, and in a monorepo. The fact that it's a monorepo is not to be dismissed. git doesn't work for monorepos.

Codespaces is like CitC; I haven't been at Google since the rise of AI, so I can't comment on how the internal equivalent to Copilot is; Actions is very primitive compared to what Google has, but yeah, you can see where it's going. It doesn't compare right now, but it could, eventually.



The Gemini-powered copilot is fine, but has so far, for me, only done automated things like "here's a simple method based on the comment you wrote" or "I patched in a change to the other callsites when you renamed this thing". I have also not used it in 2 months, so it's probably better already.

I still mainly do my LLM assisted coding in chat-style interfaces that are more like pair programming by mail.

I think the Gemini code assist is coming out as a consumer product, I forgot the name, because Google is awful at branding.



> That's why every Xoogler tries so hard to get Bazel adopted on everything they touch.

After burning an entire weekend like five years back trying to get a released version of tensorflow to build from source [0], I'm catastrophically disinclined to use Bazel for anything.

I found its diagnostics to be utterly unhelpful, its documentation to say nearly nothing I needed to know, and the various Internet resources for the build system to be somewhere between "as confused and lost as I was" and "total fanboy who is so expert in the system that they are incapable of speaking like anything other than an architecture astronaut".

I'm sure it's legitimately fuckin amazing when you learn it at a company that's big enough to have one or more entire teams dedicated to internal developer tooling (and training for and documentation of the same), but (at least in my experience) for those of us on the outside, it's just bad, bad, bad.

[0] The fucking thing wouldn't even build in the officially-supplied "build tensorflow" Docker image. I was utterly unable to find out why. I get that this indicates that the tensorflow folks fucked up somehow, but the fact that I was utterly unable to figure out how to understand WHAT they fucked up is pretty damning.



It's both. The tooling has a very direct impact on the quality of the code.

I think the reason it's not easily replicable is:

1. It takes a ton of initial investment and ongoing maintenance, but it's worth it when your code base is gigantic.

2. There is a consistent set of top-down enforced rules. With that consistency it becomes much, much easier to build tight integrations between tools.

(almost?) everything is buildable by a single build system (blaze). When anyone can consistently build/test/run anything in your codebase it becomes a lot easier to build a whole host of potential tools like code search.

Probably someone can dive deeper than I can. But one thing I learned is that the most important property for a code base to be maintainable/scalable is consistency.



One more thought: it's also striking how much other systems can utilize the same tooling/workflows by just storing things in source control. Things that would traditionally be stored as application state in a database are often stored in google3 as config instead.

Things like team rosters, on-call rosters, code review permissions, service permissions, feature flags.

All of it stored in google3 and can all utilize the same consistent set of tooling like presubmit test, integration tests, deployment automation, permissions, code search.

It's sort of like Infrastructure as Code, but more.



> When anyone can consistently build/test/run anything in your codebase it becomes a lot easier to build a whole host of potential tools like code search.

As someone who hasn't worked for Google, how does Google's implementation of this differ from e.g. Guix/nixpkgs? Being able to easily build/test any available package is a big reason I like using tools like those.



> (and if it's tooling, why can't that type of tooling be replicated for other codebases outside of google?)

The elegance of the tooling from what I hear is that there's tons of different tools maintained by different teams that work seamlessly (and fast) together to produce google3 and all of its supporting pieces.

But to answer your question, sure it can. But good luck building your own. Google has been doing this since the 2000s.

And if you're a big company already, you've already bought into your existing patterns & design choices; things like that are VERY hard to change.



Pretty much any internal tool/TUI/CLI/library I've created. If I had to guess I'd say at most 25% of the company projects I've worked on have launched AND have consistent usage. Working hard on something just for it to wither crushes my soul but internal projects are different. They're all skunk works projects. No tickets. No project/board. No PM pushing back on how many points (read: hours) something should be. I'm solving real problems that directly impact the quality of life for myself and my coworkers. The best part is getting real, genuine, feedback. If something sucks they'll tell you and they won't sugarcoat it.



I love this take. What language(s) do you typically use to write CLI programs? I'm also interested in learning about what types of internal TUI tools you have created.



I’m firmly convinced that any codebase written by any 4 people in the same room is better than any written by any higher number or distributed.



You'll be surprised: I think the best codebase I ever contributed to is the Rust compiler, which is being built by a huge distributed team.

Despite it being complex and me being in there for the first time, I could quickly track down a bug in the parser and fix it. As a cherry on top the communication around the pull request was top notch too.



It took my company a team of 8 fully-remote devs to fix the clusterfuck of a codebase that the "4 guys in a room" (the CTO and early devs) built up with spit and tree bark.



One that had a sort of improvised facade/adapter pattern (it didn't really follow either) in a clearly cut multilayered and pipelined structure, with actor model bits where it made sense.

The code wasn't simple, at all. It took active training of new arrivals for them to understand it. But it was very well thought out, with very few warts given the complexity, and extremely easy to extend (that was the main requirement, given constant changes in APIs and clients).

We had an API, with multiple concurrent versions, that transformed requests into an intermediate model on which our business logic operated. That then targeted external APIs (dozens of them, some REST, some SOAP, some under NDAs, some also with multiple versions), whose responses were turned back into the intermediate model, with more business logic on our end, and a final response through our API. Each transaction got its context serialized, so we could effectively have what was an, again improvised, "async/await"-like syntax in what was (trigger warning) C++03 code.
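
The shape of it, roughly (an illustrative TypeScript sketch with invented names, not the original C++03 code):

    // One intermediate model in the middle; every inbound API version and
    // every external backend only has to map to and from that model.
    interface IntermediateRequest { product: string; quantity: number }
    interface IntermediateResponse { ok: boolean; detail: string }

    // One adapter per inbound API version.
    interface InboundAdapter<Raw, Out> {
      toIntermediate(raw: Raw): IntermediateRequest;
      fromIntermediate(res: IntermediateResponse): Out;
    }

    // One adapter per external API (REST, SOAP, ...).
    interface BackendAdapter {
      call(req: IntermediateRequest): Promise<IntermediateResponse>;
    }

    // Business logic only ever sees the intermediate model, so N inbound
    // versions and M backends need N + M adapters rather than N * M.
    async function handle<Raw, Out>(
      raw: Raw,
      inbound: InboundAdapter<Raw, Out>,
      backend: BackendAdapter,
    ): Promise<Out> {
      const req = inbound.toIntermediate(raw);
      const res = await backend.call(req); // the serialized-context "await" point
      return inbound.fromIntermediate(res);
    }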

The person who engineered it didn't have formal CS background.



My favourite projects are small, with very focused goals and features.

I have a Laravel project that I have maintained for a customer for seven years. The app is straightforward and allows users to create portals that list files and metadata, such as expiration dates and tags.

Every other year, they ask me to add a new batch of features or update the UI to reflect the business's branding. As the app is so small, I have the opportunity to review every part of the app and refactor or completely rewrite parts I am not happy with.

It is a joy to work on and I always welcome new requests.



Facebook, but for different reasons than most. We called this swapping out the airplane parts mid-flight.

A lot of effort goes into language design and tooling to enable continuous migration of code. Rather than re-writing entire repos, existing code is continuously upgraded through semi- & fully-automated code-mods. Every day thousands of patches are landed to upgrade APIs with new type safety, security restrictions, deprecations and other code maintenance.
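
For a concrete flavor of what a code-mod is: Facebook open-sourced jscodeshift for this kind of work, and a toy transform (an illustration, not one of Facebook's actual mods) that renames an API call repo-wide looks roughly like this:

    // Toy jscodeshift codemod: rename every oldApi(...) call to newApi(...),
    // preserving the original arguments. Run repo-wide, review the diff, land it.
    import type { API, FileInfo } from 'jscodeshift';

    export default function transformer(file: FileInfo, api: API): string {
      const j = api.jscodeshift;
      return j(file.source)
        .find(j.CallExpression, { callee: { type: 'Identifier', name: 'oldApi' } })
        .replaceWith((path) => j.callExpression(j.identifier('newApi'), path.node.arguments))
        .toSource();
    }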

Most other company repos I worked on had major re-writes outside of the mainline until one day there was a sudden and often troublesome switch-over to the new code.

Code is constantly changing, and unless you have a continuous process for landing and testing those changes, you are going to suffer when you try to apply the changes needed to address the accumulated tech-debt.



> We called this swapping out the airplane parts mid-flight...

I work fairly heavily with Facebook's Graph API, and I think they may have employed too many Boeing engineers for the part swaps lately.



I really wish more people believed in continuous improvement and the malleability of software. Giant heroic refactors or redos are so annoying, but so are ten-year-old compilers or kernels.



The best codebases I worked on were from startups that failed due to lack of product market fit. Oh man, their code was so beautiful and optimized…



Is this true or a bit of humor? Of the three startups I worked for, the one that had the best exit also had the best code. IME shitty code hamstrings startups into slow execution during the growth phase, and kills them despite PMF, because the competition winds up moving much faster.



Although I'm only on job three and have not had that much involvement with open source, I think my current employer (Attio) has one of the best codebases I've seen.

Qualitatively, I experience this in a few ways:

* Codebase quality improves over time, even as codebase and team size rapidly increase

* Everything is easy to find. Sub-packages are well-organised. Files are easy to search for

* Scaling is now essentially solved and engineers can put 90% of their time into feature-focused work instead of load concerns

I think there are a few reasons for this:

* We have standard patterns for our common use cases

* Our hiring bar is high and everyone is expected to improve code quality over time

* Critical engineering decisions have been consistently well-made. For example, we are very happy to have chosen our current DB architecture, avoided GraphQL and used Rust for some performance-critical areas

* A TypeScript monorepo means code quality spreads across web/mobile/backend

* Doing good migrations has become a core competency. Old systems get migrated out and replaced by better, newer ones

* GCP makes infra easy

* All the standard best practices: code review, appropriate unit testing, feature flagging, ...

Of course, there are still some holes. We have one or two dark forest features that will eventually need refactoring/rebuilding; testing needs a little more work. But overall, I'm confident these things will get fixed and the trajectory is very good.



My own, lol. Which just means I need to work on more codebases.

I’m making a tool to convert data schemas to SQL via a UI for lay-users. Just like https://react-querybuilder.js.org/ which is basically a UI based SQL generator. For work.

Except my version extends the idea much further, blending Excel like functionality with functions that can act on Fields and Rules.

What makes it good?

For one, it’s a from-scratch project with almost no third-party libraries.

For two, I fortunately chose a recursive data structure to represent the data schema, and that has really worked out well. Early on I tried 4 other approaches to represent the data, but went back to the recursive one, feeling it was the best choice.

Furthermore, I’m using React, but it’s heavily leveraging reducers. Specifically, Groups have a reducer, Rules have a reducer, and Fields have a reducer. The reducers are linked in a chain top to bottom, where changes on a Field change a Rule, and changes on a Rule change a Group. It’s been extremely clean to work with.

Because the base data schema is recursive (Groups contain groups contain groups), most of the functions that manipulate the schema are recursive. There is a real elegance to the code, each recursive function has a very obvious base case, and a very obvious recursive path.

And for the final outcome, walking the query data structure and spitting out the equivalent SQL is also recursive, and feels elegant, coming in at under 40 lines.
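
To make that concrete, here is a minimal sketch of the idea (illustrative TypeScript with made-up names, not the actual work code):

    // A hypothetical recursive schema: Groups contain Rules and other Groups.
    type Rule = { field: string; op: '=' | '<' | '>' | 'LIKE'; value: string | number };
    type Group = { combinator: 'AND' | 'OR'; children: (Rule | Group)[] };

    const isGroup = (node: Rule | Group): node is Group => 'children' in node;

    // The SQL walker mirrors the shape of the data: an obvious base case
    // (emit one predicate) and an obvious recursive path (join the children).
    // Real code would use bind parameters instead of string interpolation.
    function toSql(node: Rule | Group): string {
      if (!isGroup(node)) {
        const value = typeof node.value === 'number' ? String(node.value) : `'${node.value}'`;
        return `${node.field} ${node.op} ${value}`;
      }
      return `(${node.children.map(toSql).join(` ${node.combinator} `)})`;
    }

    // Prints: (status = 'active' AND (age > 30 OR name LIKE 'A%'))
    console.log(toSql({
      combinator: 'AND',
      children: [
        { field: 'status', op: '=', value: 'active' },
        {
          combinator: 'OR',
          children: [
            { field: 'age', op: '>', value: 30 },
            { field: 'name', op: 'LIKE', value: 'A%' },
          ],
        },
      ],
    }));

Arbitrary nesting falls out of the recursion for free, which is why the real generator can stay so small.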

Literally as I’ve been writing this codebase, everything somehow perfectly fell into place. I was marvelling near the end that it felt like I chose all the best possible logic paths to build this somehow.

I’m hoping to get the okay from work to open source it (fully open source). The only cruft in the project is the types I’m using; the interface of the code could be improved with generics.



My past three employers' code bases: monorepos, Bazel, lots of C++ and Python, thousands of libraries and tools, code generation and modeling tools that are fully integrated into the build, easy cross compilation, large integration tests just one bazel test invocation away, hermetic and uniform dependencies...



Python is only used for build-time tooling (modeling, code generators) and developer tooling. All the on-target code is C and C++, and only that is cross compiled (Linux, QNX, various RTOS and x86+aarch64)



I work at Google on database systems and would still pick the VMware codebase (specifically the virtual machine monitor component).

It was by far the most impressive piece of software engineering I’ve ever had the privilege of perusing.



Not wanting to start a flame war, but the appearance of jQuery was an eye-opener for me. The official API of the Web was instantly outdated thanks to the incredible cleverness of a bunch of developers. Wow … that was, for me, exactly what open source was all about.



The latest Go micro-service I have built.

About once a year roughly, for the last couple years, the opportunity has arisen to greenfield a Go micro-service with pretty loose deadlines.

Each time I have come into it with more knowledge about what went well and what I wasn't particularly happy with the last time. Each one has been better than the last.

I've been building software professionally for twenty years, and these micro-services have been one of the few projects in that time that have had clear unified vision and time to build with constant adjustments in the name of code quality.



Sounds fun. I have a similar policy but only every other year. It means I know what the hell I am talking about when in meetings with people who have slow or inefficient services. It also means my job of mostly telling other people how to solve problems is still a lot of fun for me.



This is going to sound insane, but I worked on a firmware codebase at an industrial automation company that was remarkable.

Everything was working against it. No RTOS. Subversion. GCC 5-point-something (I think?).

It was an incredible mass of preprocessor crimes. I'm talking about #including the same file 10 times and redefining some macros each time to make the file do something different.

It used a stackless coroutine library called Protothread, which itself is a preprocessor felony.

And yet? It was brilliant. Compilation was lightning quick. F5, lean back in your chair, and boom, you're running your code. I understand that this kind of thing is normal for web/backend/etc folks, but I yearn for the days of sub-15 second firmware compile times.

It was easy to flip a couple flags and compile it to a Win32-based simulator. Preprocessor felonies are felonies, but when you stick to a small handful of agreed-upon felonies, you can actually reap the benefits of some very sophisticated metaprogramming, while staying close enough to the hardware that it's easy to look at a datasheet and understand exactly what your code is doing.



> Preprocessor felonies are felonies, but when you stick to a small handful of agreed-upon felonies, you can actually reap the benefits

There's a saying, "Only commit one crime at a time". Knowing when to cheat is a fairly high-risk but pro move.



Realistically, any code base where the engineers had at least a basic understanding of programming. You do not know suffering until you've seen someone hard-code basic variables - we're talking strings all over the place - and then just copy the whole function again to change the strings.

I've legitimately left jobs over bad code. We're talking about code that did nothing in reality. The best code bases have been ones where I've been able to lead the direction. I get to know exactly how things work. I'm privileged to have a job right now where I essentially created the initial framework.

Plus I'm fully remote, life is pretty good.



> any code base where the engineers had at least a basic understanding of programming

I felt this in my soul as soon as I read it. The number of people writing code who should not even be speaking to others because of how clearly they don't understand anything technical is unfathomably high.

Our industry hires seat fillers and tells them to write software. And then they do, and it's every bit as bad as it sounds.



google3, all the devex tooling was taken care of by other teams. Tons of useful library functions available to import, accumulated over decades.



It's refreshing to read the replies here as I am currently working in the worst codebase I've ever seen in my life. It's just unimaginably bad.



Unreal Engine, by far.

It's not just extremely well written in general (your only real chance of learning the engine is to go through it, read the comments, and the like), but it also defies the wisdom 'everybody knows.' That wisdom being that premature optimization is the root of all evil - that we're supposed to just benchmark things, see where the bottleneck is, and then work on optimizing those spots.

Unreal doesn't do this. There are countless thousands of micro-optimizations everywhere. For one striking example, there's even a fairly substantial system in place that caches the conversion between a quaternion and an Euler rotation. This would never, in a zillion years, be even close to a bottleneck. But with thousands of these little micro-optimizations everywhere, you get a final system that just runs dramatically better than any comparable engine.
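
The caching idea, very roughly (a hand-wavy TypeScript sketch of the concept, not Unreal's actual C++):

    // Cache the last quaternion -> Euler conversion so repeated reads
    // of an unchanged rotation skip the trig entirely.
    type Euler = { roll: number; pitch: number; yaw: number };

    class CachedRotation {
      private cache?: { x: number; y: number; z: number; w: number; euler: Euler };

      constructor(public x: number, public y: number, public z: number, public w: number) {}

      toEuler(): Euler {
        const c = this.cache;
        if (c && c.x === this.x && c.y === this.y && c.z === this.z && c.w === this.w) {
          return c.euler; // cache hit: no trig at all
        }
        const { x, y, z, w } = this;
        const euler: Euler = { // standard ZYX quaternion-to-Euler conversion
          roll: Math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y)),
          pitch: Math.asin(Math.max(-1, Math.min(1, 2 * (w * y - z * x)))),
          yaw: Math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z)),
        };
        this.cache = { x, y, z, w, euler };
        return euler;
      }
    }

Trivial on its own; the point is that the engine applies thousands of tiny wins like this.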

In more general terms, they've also taken advantage of the 'idiomatic flexibility' that C++ offers to create a sort of Unreal C++ language that is also just lovely to use and feels much closer to something like C# in terms of luxuries like the lack of manual memory management, garbage collection, reflection (!!), and so on. The downsides are that compile times are horrible (even though a cached compile might only take 10 or 15 seconds, it feels like forever when trying to work out one specific issue) and C++ intellisense, especially in a preprocessor heavy environment, is pretty mehhhh.



The first few years at a previous startup I worked at. Java. CI/CD in place when merging PRs. But mostly, the thing that made the code base great was that the code wrapped external libraries in small interfaces that were _owned by the company_, and dependency inversion was used so nothing depended on the implementations behind those interfaces. In short, adhering to the basics of the SOLID principles.



In my experience, any third-party dependency will inevitably eventually be in-sourced. Third-party dependencies are technical debt - they let you make fast progress initially, but have recurring maintenance costs.

Wrapping the API lets you see, all in one place, the surface area of an external API that is in use, and minimizes the changes required when reimplementing it.
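
As a made-up example of what that wrapping looks like in practice (a TypeScript sketch; the interface and vendor names are hypothetical):

    // The codebase owns this small interface...
    interface Metrics {
      increment(name: string, by?: number): void;
      timing(name: string, ms: number): void;
    }

    // ...and exactly one adapter knows about the vendor's client.
    // (VendorClient is a stand-in for whatever third-party SDK is in use.)
    interface VendorClient { send(line: string): void }

    class VendorMetrics implements Metrics {
      constructor(private client: VendorClient) {}
      increment(name: string, by: number = 1): void { this.client.send(`${name}:${by}|c`); }
      timing(name: string, ms: number): void { this.client.send(`${name}:${ms}|ms`); }
    }

    // Tests get a cheap in-memory fake; swapping vendors touches one file.
    class InMemoryMetrics implements Metrics {
      events: string[] = [];
      increment(name: string, by: number = 1): void { this.events.push(`${name}+${by}`); }
      timing(name: string, ms: number): void { this.events.push(`${name}=${ms}ms`); }
    }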

---

Also, not that Clean Code is any great authority (but it is popular!):

> [Wrapping] third-party APIs is a best practice. When you wrap a third-party API, you minimize your dependencies upon it: You can choose to move to a different library in the future without much penalty. Wrapping also makes it easier to mock out third-party calls when you are testing your own code.

> One final advantage of wrapping is that you aren’t tied to a particular vendor’s API design choices. You can define an API that you feel comfortable with.
