
Original link: https://news.ycombinator.com/item?id=39356920

To elaborate on how Antithesis uses deterministic simulation to catch complex bugs, suppose we have a distributed system of three replicas, each with its own copy of the database and its own network connection. Here is how it typically works during normal operation:

Step 1: A client sends a request to server A, which executes it locally and propagates it over the network to its neighbor, server B.

Step 2: Step 1 is repeated for every node in the network, so that all replicas get updated.

During this communication phase, all sorts of events can produce inconsistent state: network latency can cause client commands to execute out of order, clock skew can give replicas diverging timelines, and other anomalous behavior can emerge in a complex environment. To identify and isolate these bugs, Antithesis creates thousands of virtual copies of the scenario, varying the initial state or the order in which the underlying network messages are delivered. By reproducing a potentially unbounded number of randomized executions of the scenario, Antithesis tries to expose and reproduce any latent source of contention, surfacing any and all bugs that could arise. Because these bugs and/or incomplete-implementation behaviors are reproduced inside a secure sandbox provided by the hypervisor, development and operations teams can see in detail how their logic behaves in extreme corner cases. Without deterministically simulating millions of call paths, bugs of this nature could stay hidden for the entire lifetime of a system. With this capability, distributed-database vendors like FoundationDB can use these techniques to assure customers that their architecture will not exhibit the anomalous and undesirable behavior these kinds of bugs cause. As Will Dewaugh notes, a key challenge for Antithesis is the amount of hardware and compute required to operate billions of independent virtual machines simultaneously, executing billions of scenarios and validating trillions of execution results. Despite this challenge, the ability to deterministically simulate millions of iterations of a complex distributed system allows deep reflection on its fundamental characteristics, yielding unprecedented insight into these systems along with opportunities to improve safety, resilience, and operational efficiency. These capabilities give database administrators and designers novel tools.
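The core idea above, which is that every source of nondeterminism is driven by a single seed, so any failing interleaving can be replayed exactly, can be sketched as a toy in Python. This is an illustration only, not Antithesis's actual hypervisor-based engine; all names and the three-message scenario are made up:

```python
import random

def run(seed: int) -> list[int]:
    """Deliver three replication messages in a seed-determined order;
    return the order in which replica B applied its updates."""
    rng = random.Random(seed)  # every "random" choice draws from this seed
    inflight = [("B", 1), ("C", 1), ("B", 2)]  # pending (replica, update) messages
    state: dict[str, list[int]] = {"B": [], "C": []}
    while inflight:
        # the simulated network picks an arbitrary message to deliver next
        replica, update = inflight.pop(rng.randrange(len(inflight)))
        state[replica].append(update)
    return state["B"]

# Sweep many schedules; a bad interleaving (update 2 applied before
# update 1 on replica B) is replayable exactly by reusing its seed.
bad_seeds = [s for s in range(1000) if run(s) == [2, 1]]
```

Because each run is a pure function of its seed, a sweep that finds a reordering bug also hands you a seed that reproduces it deterministically, which is precisely the property that makes the "explore thousands of interleavings, then replay the failure" workflow possible.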


Original thread
Hacker News
Is something bugging you? (antithesis.com)
1013 points by wwilson 22 hours ago | 367 comments










> The biggest effect was that it gave our tiny engineering team the productivity of a team 50x its size.

I feel like the idea of the legendary "10x" developer has been bastardized to just mean workers who work 15 hours a day 6.5 days a week to get something out the door until they burn out.

But here's your real 10x (or 50x) productivity. People who implement something very few people even considered or understood to be possible, which then gives amazing leverage to deliver working software in a fraction of the time.



It seems like the industry would get a lot more 10x behavior if it was recognized and rewarded more often than it currently is. Too often, management will focus more on the guy who works 12-hour days to accomplish 8 hours of real work than on the guy who gets the same thing accomplished in an 8-hour day. Also, deviations from 'normal' are frowned upon. Taking time to improve the process isn't built into the schedule, so taking time to build a wheelbarrow is discouraged when they think you could be hauling buckets faster instead.


> It seems like the industry would get a lot more 10x behavior if it was recognized and rewarded more often than it currently is

I'd be happier if the industry cared more about team productivity - I have witnessed how rewarding "10x" individuals may lead to perverse results on a wider scale, a la the Cobra Effect. In one insidious case, our management-enabled, long-tenured "10x" rockstar fixed all the big customer-facing bugs quickly, but would create multiple smaller bugs and regressions for the 1x developers to fix while he moved to the next big problem worthy of his attention. Everyone else ended up being 0.7x - which made the curse of an engineer look even more productive comparatively!

Because he was allowed to break the rules, there was a growing portion of the codebase that only he could work on - while it wasn't Rust, imagine an org has a "No Unsafe Rust" rule that is optional to 1 guy. Organizations ought to be very careful how they measure productivity, and should certainly look beyond first-order metrics.



> In one insidious case, our management-enabled, long-tenured "10x" rockstar fixed all the big customer-facing bugs quickly, but would create multiple smaller bugs and regressions for the 1x developers to fix while he moved to the next big problem worthy of his attention. Everyone else ended up being 0.7x - which made the curse of an engineer look even more productive comparatively! Because he was allowed to break the rules,

bingo, well said. Worked on a team like this, with a “principal” engineer who’d churn out bug-ridden work very fast simply because he had the automatic blessing from on high to do whatever he wanted. My unfortunate task was to run along behind him and clean up, which, to my credit, I think I did a pretty good job at, but of course these types can only very rarely acknowledge or appreciate that.

Eventually he got super insecure/threatened and attempted to push me out along with whoever else he felt was a threat to his fiefdom.



What happened to him in the end?


I try to look at these things through the lens of “software literacy” - software is a form of literacy, and this story might be better viewed as “a bunch of illiterate managers are impressed with one good writer at the encyclopedia publisher's; now it turns out this guy makes mistakes, but hey, what do you expect when the management cannot read or write!”


This reminds me of the "Parable of the Two Programmers." [1] A story about what happens to a brilliant developer given an identical task to a mediocre developer.

[1] I preserved a copy of it on my (no-advertising or monetization) blog here: https://realmensch.org/2017/08/25/the-parable-of-the-two-pro...



I can't seem to find it in a google search, maybe I'm just recalling entirely the wrong terms.

In the early computing era there was a competition. Something like: take some input and produce an output. One programmer made a large program in (IIRC) Fortran, with complex specifications, documentation, etc. The other used shell pipes, sort, and a small handful or two of other programs in a pipeline to accomplish the same task in like 10 developer-minutes.



The Knuth link in the sibling comment is the original, but you're probably thinking of "The Tao of Programming"

http://catb.org/~esr/writings/unix-koans/ten-thousand.html

> “And who better understands the Unix-nature?” Master Foo asked. “Is it he who writes the ten thousand lines, or he who, perceiving the emptiness of the task, gains merit by not coding?”



Sounds like "Knuth vs McIlroy", which has been discussed on hn and elsewhere before, and the general take is that it was somewhat unfair to Knuth.

[1] https://homepages.cwi.nl/~storm/teaching/reader/BentleyEtAl8... [2] https://www.google.com/search?q=knuth+vs+mcilroy



This is the competition I was thinking of. I must have read it in a dead-image PDF version some other time on HN. This paper isn't the one I recall but the solution is exactly the sort I vaguely recalled.

I'm trying to copy-in the program as it might have existed, with some obvious updates to work in today's shells ...

  #!/bin/sh
  # McIlroy's pipeline: $1 = how many words to print, $2 = input file.
  # Note that tr reads only stdin, so the input is redirected to it
  # rather than passed as an argument.
  tr -cs A-Za-z '\n' < "${2:-/dev/stdin}" |
  tr A-Z a-z |
  sort |
  uniq -c |
  sort -rn |
  sed "${1:-100}q"
Alternately, as a one-liner: $ tr -cs A-Za-z '\n' < "${INPUTFILEHERE:-/dev/stdin}" | tr A-Z a-z | sort | uniq -c | sort -rn | sed "${MAXWORDSHERE:-100}q"

Edited: Removed some errors likely induced by OCR / me not catching that in the initial transcription from the browser view of the file.



Just to be clear, it was not a competition. For more, please follow the links from some of the previous HN discussions, e.g. https://news.ycombinator.com/item?id=31301777.

[For those who may not follow all the links: Bentley asked Knuth to write a program in Pascal (WEB) to illustrate literate programming—i.e. explaining a long complicated program—and so Knuth wrote a beautiful program with a custom data structure (hash-packed tries). Bentley then asked McIlroy to review the program. In the second half of the review, McIlroy (the inventor of Unix pipes) questioned the problem itself (the idea of writing a program from scratch), and used the opportunity to evangelize Unix and Unix pipes (at the time not widely known or available).]
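The word-frequency task McIlroy's pipeline solves also fits in a few lines of modern Python; this is a rough equivalent for illustration, not Knuth's WEB program or McIlroy's original script:

```python
import re
from collections import Counter

def top_words(text: str, k: int = 100) -> list[tuple[str, int]]:
    """Return the k most frequent words with counts, like the pipeline's output."""
    # tr -cs A-Za-z '\n' | tr A-Z a-z : lowercase alphabetic words
    words = re.findall(r"[a-z]+", text.lower())
    # sort | uniq -c | sort -rn | sed kq : keep the k most frequent
    return Counter(words).most_common(k)
```

For example, top_words("the cat and the hat", 1) returns [("the", 2)].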



There was also the "Hadoop vs. unix pipeline running on a laptop"-story a few years back, a more modern take: https://adamdrake.com/command-line-tools-can-be-235x-faster-...




I was both of those developers at different times, at least metaphorically.

I drank from the OO koolaid at one point. I was really into building things up using OOD and creating extensible, flexible code to accomplish everything.

And when I showed some code I'd written to my brother, he (rightly) scoffed and said that should have been 2-3 lines of shell script.

And I was enlightened. ;)

Like, I seriously rebuilt my programming philosophy practically from the ground up after that one comment. It's cool having a really smart brother, even if he's younger than me. :)



I had an idea once but when I tried to explain it people didn't understand.

I revisited an earlier thought: communication is a two-man job; one's role is to make no effort to understand while the other explains things poorly. It always manages to never work out.

Periodically I thought about the puzzle and was eventually able to explain it such that people thought it was brilliant, though much too complex to execute.

I thought about it some more, years went by and I eventually managed to make it easy to understand. The response: "If it was that simple someone else would have thought of it." I still find it hilarious decades later.

It pops to mind often when I rewrite some code and it goes from almost unreadable to something simple and elegant. Ah, this must be how someone else would have done it!



> Ah, this must be how someone else would have done it!

This is a good exclamation :D

And it's a poignant story. Thanks for sharing.



That’s pretty good. It needs an Athena poster :-)


“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”

― Abraham Lincoln

I have started to follow this 'lately' (for a decade) and it has worked miracles. As for the anxious managers/clients, I keep them updated on the design/documentation/thought process, mentioning the risks of the path-not-taken, and that maintains their peace of mind. But this depends heavily on the client and the managers.



Without more backup I can only describe that as being fiction. Righteous fiction, where the good guy gets downtrodden and the bad guy wins to fuel the reader's resentment.


It's practically my life experience.

Sometimes I'm appreciated, and managers actually realize what they have when I create something for them. Frequently I accomplish borderline miracles and a manager will look at me and say, "OK, what about this other thing?"

My first job out of college, I was working for a company run by a guy who said to me, "Programmers are a dime a dozen."

He also said to me, after I quit, after his client refused to give him any more work unless he guaranteed that I was the lead developer on it, "I can't believe you quit." I simply shrugged and thought, "Maybe you shouldn't have treated me like crap, including not even matching the other offer I got."

I've also made quite a lot of money "Rescuing Small Companies From Code Disasters. (TM)" ;) Yes, that's my catch phrase. So I've seen the messes that teams often create.

The "incompetent" team code description in the story is practically prescient. I've seen the results of exactly that kind of management and team a dozen times. Things that, given the same project description, I could have created in 1/100 the code and with much more overall flexibility. I've literally thrown out entire projects like that and replaced them with the much smaller, tighter, and faster code that does more than the original project.

So all I can say is: Find better teams to work with if you think this is fiction. This resonates with me because it contains industry Truth.



To me it is a story about managers clueless about the work. You can make all the effort in the world to imagine doing something but the taste of the soup is in the eating. I do very simple physical grunt work for a living, there it is much more obvious that it is impossible. It's truly hilarious.

They probably deserve more praise when they do guess correctly but would anyone really know when it happens?



That’s because most executives can’t understand technology deeply enough to know the difference.


Even when they are smart enough to know, they seem to have very short memories. While I don't consider myself to be a 10x engineer, I have certainly done a number of 10x things over my career.

I worked for a company where I almost single handedly built a product that resulted in tens of millions of dollars in sales. I got a nice 'atta boy' for it, but my future ideas were often overridden by someone in management who 'knew better'. After the management changed, I found myself in a downsizing event once I started criticizing them for a lack of innovation.



This is the sad part of it: many people without core competence end up in "leadership" positions and remove any "perceived" threats to their authority. I believe part of it is due to the absence of leadership training in the engineering curriculum. Colleges should encourage engineers to take a few leadership courses and get trained in things like influence and power.


Reminds me of the inventor of the blue LED (see the recent Veritasium video)


It's almost impossible to get executives to think in return on equity (“RoE”) for the future instead of “costs” measured in dollars and cents last quarter.

Which is weird, since so many executives are working in a VC-funded environment, and internal work should be “venture funded” as well.



> It seems like the industry would get a lot more 10x behavior if it was recognized and rewarded more often than it currently is.

I don't agree with that, there are a _lot_ of completely crap developers and they get put into positions where even the ones capable of doing so aren't allowed to because it's not on a ticket.

I've seen some things.



Honestly? You work at a place a manager hasn't heard "impact" yet? I thought managers at this point just walk around the office saying "impact".


When I was in college, I met a few people who coded _a lot_ faster than me. Typically, they had started at 12 instead of 21 (like me). That's how 10x engineers exist: by the time they are 30, they have roughly 20 years of programming experience under their belt instead of 10.

Also, their professional experience is much greater. Sure, their initial jobs at 15 are the occasional weird gig for an uncle/aunt or cousin/nephew, but they get picked up by professional firms at 18 and work a job alongside their CS studies.

At least, that's how it used to be. Not sure if this is still happening due to the new job environment, but this was the reality from around 2004 to 2018.

For 10x engineers to exist, all it takes is a few examples. To me, everyone is in agreement that they seem to be rare. I point to a public 10x engineer. He'd never say it himself, but my guess is that this person is a 10x engineer [1].

If you disagree, I'm curious how you'd disagree. I'm just a blind man touching a part of the elephant [2]. I do not claim to see the whole picture.

[1] https://bellard.org/ (the person who created JSLinux)

[2] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant - if you don't know the parable, it's a fun one!



The real problem is the measurement: speed of coding (or of doing any other job), or volume of work done. Those two are actually really bad productivity measures.


Yup, that's been my experience as someone who asked for a C++ compiler for my 12th birthday, worked on a bunch of random websites and webapps for friends of the family, and spent some time at age 16-17 running a Beowulf cluster and attempting to help postdocs port their code to run on MPI (with mixed success). All thru my CS education I was writing tons of toy programs, contributing (as much as I could) toward OSS, reading lots of stuff on best practices, and leaning on my much older (12 years) brother who was working in the industry. He pointed me to Java and IntelliJ, told me to read Design Patterns (Gang of Four) and Refactoring (Fowler). I read Joel on Software religiously, even though he was a Microsoft guy and I was a hardcore Linux-head.

By the time I joined my first real company at age 21, I was ready to start putting a lot of this stuff into place. I joined a small med device software company which had a great product but really no strong software engineering culture: zero unit tests, using CVS with no branches, release builds were done manually on the COO's workstation, etc.

As literally the most junior person in the company I worked through all these things and convinced my much more senior colleagues that we should start using release branches instead of "hey everybody, please don't check in any new code until we get this release out the door". I wrote automated build scripts mostly for my own benefit, until the COO realized that he didn't have to worry about keeping a dev environment on his machine, now that he didn't code any more. I wrote a junit-inspired unit testing framework for the language we were using (https://en.wikipedia.org/wiki/IDL_(programming_language) - like Matlab but weirder).

Without my work as a "10x junior engineer", the company would have been unable to scale to more than 3 or 4 developers. I got involved in hiring and made sure we were hiring people who were on board with writing tests. We finally turned into a "real" software company 2 or 3 years after I joined.



This sounds similar to the best programmer I personally know, and he was an intern working on LLVM at the time. It's funny how companies treat that part of his life as "no experience". Then suddenly he goes into the HFT space and within a couple of years he has a rank similar to that of people twice his age.

10x engineers exist. To be fair, it does depend which software engineer you see as "the standard software engineer", but if I take myself as a standard (as an employed software engineer with 5 years of experience), then 10x software engineers exist.



I'm not even sure that coding _much_ faster than necessary is even required to give a 3-5x multiple on "average", let alone "worst case" developers. Some of the biggest productivity wins can be had by being able to look at requirements, knowing what's right or wrong about them, and getting everyone on the same page so the thing only needs to be made once. Being good at test and debug so problems are identified and fixed _early_ are also big wins. Lots of that is just having the experience to recognize what sort of problem you're dealing with very quickly.

Being a programming prodigy is nice, but I don't think you even really need that.



All of the things you list are the product of the experience that OP is talking about. Anyone can get there with 20 years of (sufficiently rigorous) experience by ~40, but people who start as a child have a head start and it does show.

It's probably especially obvious in the child-prodigy types because we as an industry have a tendency to force people out of IC roles by 40, so the child prodigies are the only ones who have enough time to develop 20 years of experience working directly with code.



It's not just that people get pushed out of IC work, but everyone tends to have less energy as they get older + other life demands accumulate.

The combination of 10-15 years of experience and the energy/time of 20s is very powerful.



the other factor I noticed from the days I was programming for fun vs these days where I'm programming for pay.

in those early years, the tasks you take on are probably above your skill set and you fight through them; you're not accountable to anyone.

in a job you're usually hired for what you already know, +/- some margin for more gradual learning, so there's not really much room for moonshots. The work needs to be divided into bite-sized parts that you can justify to the higher-ups when needed. You have less room for exploring really non-linear paths toward the solution, which can be harder to explain but where you learn more.

so in the end this sometimes ends up amounting to 10 years of experience outside work being more impactful than 10 years at work.



Underrated comment


Last year, we had two new hires: one fresh out of college (and not one of the top ones), the other with 15 years of experience in our industry on their resume.

I am not sure there is a 10x difference, but there is at least a 5x difference in performance, in favor of the fresh college grad, and they are now working on the more complex tasks too.

The sad part is our hiring is still heavily in the "senior engineer with lots of experience" phase, and the internship program has been canceled.



I also had a lot of luck with interns.

At my previous company I had 2 first-job juniors and 4 or 5 interns that were outstanding.

(However, the last one was totally terrible and totally killed my hiring credibility, hah; but it was still an 80% success rate.)

I find that there are too many pretenders, though. I get way too many people with 15 years of experience and ChatGPT resumes that just can't code at all :/



I have hired five interns and five experienced developers. Some of the interns exceeded expectations, while some of the experienced developers also performed well.

The top-performing experienced developers outshone all the graduates. However, the less effective experienced developers were on par with the graduates, showing no significant difference in performance.

The takeaway for me is that simple anecdotes are not very informative. Over time and with a larger sample size, experienced individuals tend to perform better. Nonetheless, some graduates will also become exceptionally skilled.

Graduates are more cost-effective. Experienced professionals require less oversight; if they need substantial guidance, they don't truly qualify as senior, whatever their resume says (at 25, they have been a CTO for 10 years).



I’m not complaining about the quality of seniors I hire or comparing them to interns.


I am not convinced that just starting early is all there is to it. I started Math, Sports, and Piano at like 6 years old but there are still plenty of "10x " people that figuratively and literally run circles around me. Talent is a real thing.


The intensity with which you did it matters, though. You probably didn't spend that many years on a specific sport, for instance.

And when we're talking about sports, genetics matter as well (depending on each one)

When we're talking brains, while genetics also matter, assuming a normal (whatever that is) brain, plasticity changes a lot about how it operates.

So, the 10 years thing is definitely a big part, if not the biggest, in my opinion. Would love to see studies on this, if any exist out there.



I did spend years on a specific sport starting as a kid. I was average. There were people that first played the sport as teenagers and within a year were competitive nationally.

I was in the same math classes as some of my peers for a decade+. Some people were great, some were bad and most were somewhere in between. The kids who were exceptional at 9 were exceptional at 17.

Obviously time matters but genetics play a huge role as well. I have a family friend with 2 adopted kids and 2 biological kids. The adopted ones are average but the biological ones are very smart. Just like their parents.



It's possible there are plenty of individuals 10x better than you while you are 10x better than most, due to early exposure. I wouldn't say this of sports and math necessarily, but I definitely would say it of your example of piano, language acquisition, and I would not be surprised if programming patterned with them, at least partially.


That may be true of individual activities, but you trained in multiple. A fairer comparison would require the same people who best you in athletics to at least be comparable at math etc.


Some people organize their time and focus their efforts more efficiently than others. They also use tools that others might not even know about or care about.

You probably surf the internet 10x faster than your parents. Yes you've probably had more exposure than them, but you could probably teach them how to do it just as fast. But would they want to learn and would they actually adapt what you taught them?



With motivation and repetition (and those depend on how plastic your brain is, and thus on age), yes!


Nick with Antithesis here with a funny story on this.

I became friends with Dave our CTO when I was 5 or 6, we were neighbors. He'd already started coding little games in Basic (this was 1985). Later in our friendship, like when I was maybe 10, I asked him if he could help me learn to code, which he did. After a week or two I had made some progress but compared what I could do to what he was doing and figured "I guess I just started too late, what's the point?".

I found out later that most people didn't start coding till late HS or college! It worked out though - I'm programmer adjacent and have taken care of the business side of our projects through the years :)



Yes: Programmers who start at twelve are often the 10x programmers who can really program faster than the average developer by a lot.

No: It's not because they have 10 more years of experience. Read "The Mythical Man Month." That's the book that popularized the concept that some developers were 5-25x faster than others. One of the takeaways was that the speed of a developer was not correlated with experience. At all.

That said, the kind of person who can learn programming at 12 might just be the kind of person who is really good at programming.

I started learning programming concepts at 11-12. I'm not the best programmer I know, but when I started out in the industry at 22 I was working with developers with 10+ years of (real) experience on me...and I was able to come in and improve on their code to an extreme degree. I was completing my projects faster than other senior developers. With less than two years of experience in the industry I was promoted to "senior" developer and put on a project as lead (and sole) developer and my project was the only one to be completed on time, and with no defects. (This is video game industry, so it wasn't exactly a super-simple project; at the time this meant games written 100% in assembly language with all kinds of memory and performance constraints, and a single bug meant Nintendo would reject the image and make you fix the problem. We got our cartridge approved the first time through.)

Some programmers are just faster and more intuitive with programming than others. This shouldn't be a surprise. Some writers are better and faster than others. Some artists are better and faster than others. Some architects are better and faster than others. Some product designers are better and faster than others. It's not all about the number of hours of practice in any of these cases; yes, the best in a field often practices an insane amount. But the very top in each field, despite having similar numbers of hours of practice and experience, can vary in skill by an insane amount. Even some of the best in each field are vastly different in speed: You can have an artist who takes years to paint a single painting, and another who does several per week, but of similar ultimate quality. Humans have different aptitudes. This shouldn't even be controversial.

I do wonder if the "learned programming at 12" has anything to do with it: Most people will only ever be able to speak a language as fluently as a native speaker if they learn it before they're about 13-14 years old. After that the brain (again, for most people; this isn't universal) apparently becomes less flexible. In MRI studies they can actually detect differences between the parts of the brain used to learn a foreign language as an adult vs. as a tween or early teen. So there's a chance that early exposure to the right concepts actually reshapes the brain. But that's just conjecture mixed with my intuition of the situation: When I observe "normal" developers program, it really feels like I'm a native speaker and they're trying to convert between an alien way of thinking about a problem into a foreign language they're not that familiar with.

AND...there may not be a need to explicitly PROGRAM before you're 15 to be good at it as an adult. There are video games that exercise similar brain regions that could substitute for actual programming experience. AND I may be 100% wrong. Would be good for someone to fund some studies.



That childhood native-fluency analogy is insightful! Your experience matches mine.

I started programming at age 7 and it's true that the way code forms in my head feels similar to the way words form when I'm writing or speaking in English. In the same way that I don't stop and consciously figure out whether to use the past or present tense while I'm talking, I usually don't consciously think about, say, what kind of looping construct I'm about to use; it's just the natural-feeling way to express the idea I'm trying to convey. The idea itself is kind of already in the form of mental code in the same way that my thoughts are kind of already in English if I'm speaking.

But... maybe that's how it is for everyone, even people who learned later? I only know how it is in my own head.



The association with video games in your last paragraph makes a lot of sense to me. This is how I feel solving problems.

I always thought that people who start at 12 and keep at it are good because they really love it. I see people who struggle a lot with learning, and it's because they hate it but are doing it for other reasons.



Was it 50x productivity due to 10x engineers, or 50x productivity due to optimized company structure? (edit: obviously, these do not need to be mutually exclusive - it's a sum of all the different parts)

It's easy to bog down even the best Nx engineers if you keep them occupied with endless bullshit tasks, meetings, (ever) changing timelines, and all that.

Kind of like having a professional driver drive a sportscar through a racetrack, versus the streets of Boston.



I'm tired of hearing about 10x engineers. I just want to be a good 1x engineer. Or good at anything in life really.


The truest 10x engineer I ever encountered was a memory firmware guy with ASIC experience who absolutely made sure to log off at 5 every day after really putting in the work. Go to guy for all parts of the codebase, even that which he didn't expressly touch.


> I'm tired of hearing about 10x engineers.

"The truest 10x engineer I ever encountered was..."



Once you have a few years of experience, you don't need to be 10x to have success. You can be a reliable 1.3x, a little bit better than your teammates.

In the end it doesn't matter, whole team could be laid off at once.



Spend less time on HN and you might get more done.


Do you want to read hacker news or be hacker news?


Or stay.

It's not about the hours.



I think getting something worthwhile done is a better focus (actually quite hard!), and naturally increases your productivity as a side-effect.

Productivity has no inherent value - like efficiency and perfection, it is necessarily of something else. Its value is entirely derived.



The “10x engineer” comes from the observation that there is a 10x difference in productivity between the best and the worst engineers. By saying that you want to be a 1x engineer, you’re saying you want to be the least productive engineer possible. 1x is not the average, 1x is the worst.


I'm not sure your math works.

What we do know is that the worst engineers provide negative productivity. If 1x is the worst engineer, then let's for the sake of discussion denote x as -1 in order for the product to be negative. Except that means the 10x engineer provides -10 productivity, actually making them the worst engineer. Therein lies a conflict.

What we also know is that best engineer has positive productivity, so that means the multiplicand must always be positive. Which means that it is the multiplier that must go negative, meaning that a -1x and maybe even a -10x engineer exists.



Thank you. This sounds so trivial at first, but your reductio ad absurdum at the beginning of your comment really nails it.

Throw into the mix the fact that productivity is hard to measure as soon as more than one person works on something, and that doesn't even begin to consider the economic aspects of software.

And even when ignoring this point, there's that pesky short-term vs long-term thing.

Also, how do you define the term "productivity"? I was assuming that you mean something along the lines of (indirect, if employed) monetary output.



You are arguing against the idea that there is a factor of ten difference in productivity between the best and the worst engineers. That’s fine if you want to do that, but that’s explicitly where the term “10x engineer” comes from and what defines its meaning. So if you disagree with the underlying concept, there is no way for you to use terms like “[n]x engineer” coherently since you disagree with its most fundamental premise. You certainly shouldn’t reinvent different meanings for these terms.


You're not wrong, but I think you may be treating something as literal math, when it is in fact idiomatic labels used to express trends.


The problem here is the introduction of productivity.

The 10x developer originated from a study that measured performance. The 10x developer being able to do a task in a 10th of the time is quite conceivable and reflects what the study found. I'm sure we've all seen a developer take 10 hours to do a job that would take another developer just 1 hour. Nobody is doing it in negative hours, so the math works.

But performance is not the same as productivity.



Measuring productivity like that in technology makes no sense because our work is not fungible; what and how we do it matters as much as how fast we do it. Time-based productivity measurement is for factory workers stamping out widgets. So in our revenue-based world, negative productivity makes sense.


> productivity

Performance. That is what the study that found a 10x performance difference observed. There is no mention of productivity in the study. If anyone has tried to study productivity, they most certainly have not come up with a 10x moniker. It seems productivity was mentioned in this thread only because it also happens to start with the letter 'p' and someone got confused.



Engineers with negative productivity are vanishingly rare, soon to be terminated, and reasonable to exclude for the purpose of the comparison.


Okay, but it still doesn't work. The world's worst engineer who somehow managed to successfully contribute one line of code to something like GPT is way more productive than a great engineer who designed from top to bottom the best laid software ever conceived but was thrown away before seeing the light of day because the business changed direction.

Of course, that doesn't actually matter as the original study found a 10x difference in measuring performance, not productivity. There is nothing out there to suggest that some developers are 10x more productive outside of those who mixed up their p words. We're not actually talking about productivity; that was just a mistake. If one were to study productivity, I expect they would find that some engineers are many orders of magnitude more productive than the least productive engineers.



> The world's worst engineer who somehow managed to successfully contribute one line of code to something like GPT is way more productive than a great engineer who designed from top to bottom the best laid software ever conceived but was thrown away before seeing the light of day because the business changed direction.

I reject that assertion.

> performance, not productivity

You keep saying this like it's a slam dunk refutation, but performance and productivity are highly related.



> I reject that assertion.

Because you don't believe the worst developer contributed anything to GPT? Sure, in reality that's no doubt true, but it was only ever meant to be illustrative.

> but performance and productivity are highly related.

Not in any meaningful way. The study found that the fastest developer can perform a set of defined tasks in a 10th of the time of the slowest developer. That is what the 10x developer refers to. But being fast doesn't mean being productive.

Come to my backyard and we will each dig a hole of equal size. Let's assume that you can dig the hole in a 10th of the time I can – that you are the 10x hole digger. But, no matter how fast you are, neither of us will be productive.



the worst engineer certainly has negative productivity, so I'm not sure that your explanation can possibly be the correct one.


I’m explaining what the terms “10x” and “1x” mean, not asserting that the original observation is correct under all circumstances.


Except you haven't explained it at all. Sackman, Erickson, and Grant found that some developers were able to complete what was effectively a programming contest in a 10th of the time of the slowest participants. This is the origin of the 10x developer idea.

You, on the other hand, are claiming that 10x engineers are 10 times more productive than the worst engineers. Completing a programming challenge in a 10th of the time is not the same as being 10 times more productive, and obviously your usage can't be an explanation, even as one you made up on the spot, as the math doesn't add up.



That was designed as a repeatable experiment, which seems entirely reasonable when you want to conduct a study. Why are you characterising that as “a programming contest”? That seems like an uncharitably distorted way of describing a study.

That study also does not exist in isolation:

https://www.construx.com/blog/the-origins-of-10x-how-valid-i...



> Why are you characterising that as “a programming contest”?

Because it was? Do you have a better way to repeatedly test performance? And yes, the study's intent was to look at performance, not productivity. It's even right in the title. Not sure where you dreamed up the latter.



I believe the original measured an entire organization's performance, and was also done in 1977. Since they are averages, it makes "sense" to conclude that the best of a good team is 10x better than the average of the worst team. Not really what the experiment concludes, but what can you do.


The first was 1968, but there have been more studies since.

https://www.construx.com/blog/the-origins-of-10x-how-valid-i...



Hmm, I never thought of it that way. I just heard 10x employees and fit it to what I knew. Which is that 90% of the work is accomplished by about 10% of workers. The other 90% really only get 10% done. So most developers are somewhere on a scale of 0.1 - 1. With 1 being a totally competent and good developer. The 10x people are just different though, it's like a pro-athlete to a regular player. It's not unique to software development, though it may stand out and be sought after more. I've noticed it in pretty much every industry. Some people are just able to achieve flow state in their work and be vastly more productive than others, be it writing code or laying sod. I don't find that there's a lot of in between 1 and 10 though.


Even if this was the origin of the term, it still doesn't make sense, because the best engineers can solve problems the worst would never be able to solve. The difference between the best and worst is much more than 10x the worst. Maybe the worst who meets certain minimums at a company, but then the best would also be limited to those willing to work for what the company pays, and I hypothesize that the minimums of the lower bound and the maximums of the upper bound are correlated.


It sounds like you disagree with the concept of a 10x engineer then. In which case you should avoid using the term, rather than making up a new definition.


Concepts and words change meaning and sometimes we all need to accept that the popular meaning is not the definition we use.

This is especially common when dealing with historical or academic definitions versus common modern usage. "Evolution" particularly annoys me.

You should avoid using the term, rather than using a definition at odds with common usage. Your usage is confusing - and that is why you are getting push-back.

The definition you have given is nonsensical - it can't be consistent over time or between companies because it depends on finding a minimum in a group. And a value that is strongly dependent on the worst developer is useless because it mostly measures how bad the worst developer is - it doesn't say anything about how good the best developer is.



It depends on the day if I feel like a 2x or a 0.1x engineer. Keep at it. You are not alone!


Do 10x engineers get 10x the wages? Somehow I feel being exceptionally better than other engineers is just unfair, both to you and to the ones worse than you. I wouldn't want to be a 10x either; I'd rather just be a normal engineer.


Meta compensates 10x types very well. 3x bonus multipliers, additional equity that can range from 100k-1m+, and level increases are a huge bump to comp (https://www.levels.fyi/)


Meta compensates all SWEs very well. To suppose arguendo that 10x types exist, I don't think they're really compensated linearly 10x more than everyone else. But yeah, certainly, if you are great at your job and want to make a bunch of money, Meta is a great employer for that.

3x bonus multipliers (Redefines Expectations) are extremely uncommon. Level increases certainly help but like, L7 only makes ~3x what L5 does -- not 10x. And there are few L7s and very few L8+.



I have many meta colleagues I've worked with in the past. All of them are well compensated but none of them were outstanding, or 10x.


You took the words right out of my mouth


Your definition is also vague. Someone still needs to do the legwork. One man armies who can do everything themselves don't really fit in standardized teams where everything is compartmentalized and work divided and spread out.

They work best on their own projects with nobody else in their way, no colleagues, no managers, but that's not most jobs. Once you're part of a team, you can't do too much work yourself no matter how good you are. Inevitably the other slower/weaker team members will slow you down as you fight the issues they introduce into the project, or the issues from management. So every team moves at the speed of the lowest common denominator, no matter its rockstars.



That rings true and is probably why the 10x engineers I have seen usually work on devops or modify the framework the other devs are using in some way. For example, an engineer who speeds up a build or test suite by an order of magnitude is easily a 10x engineer in most organizations, in terms of man hours saved.


> For example, an engineer who speeds up a build or test suite by an order of magnitude is easily a 10x engineer in most organizations, in terms of man hours saved.

Yeah, but this isn't something scalable that can happen regularly as part of your job description. Most jobs/companies don't have so many low-hanging fruits to pick that someone can speed up builds by orders of magnitude on a weekly basis. It's usually a one-time thing. And one-time things don't usually make you a 10x dev. Maybe you just got lucky once and saw something others missed.

And often times at big places most people know where the low hanging fruits are and can fix them, but management, release schedules and tech debt are perpetually in the way.

IMHO what makes you a 10x dev is that you always know how to unblock people, no matter the issue, so that the project is constantly smooth sailing, not chasing orders-of-magnitude improvement unicorns.



Does anyone else feel like people follow these sorts of industry pop-culture terms a bit too intensely? What I mean is that the existence of the term tends to bring out people trying to figure out who that might be, as if it has to be 100% true.

I personally think that some people can provide "10x" (arbitrary) the value on occasion, like the low-hanging fruit you mentioned. I also believe some people are slightly more skilled than others, and get more results out of their work. That said, there are so many ways for somebody to have an impact that doesn't have to be immediate that I find the term itself too prevalent.



"Does anyone else feel like people follow these sort of industry pop-culture terms a bit too intensely? "

Agreed, there is too much effort going into the "superstars" theme, but there are definitely people who get 10x done in the same time as others.



Yep. No matter what you're doing, some people are more productive than others. Often it's a matter of experience and practice, sometimes ability to focus, sometimes motivation, rarely it's a lack or surplus of inherent ability. Using people effectively in the context of a team all depends on the skill of the manager though.


I think a lot of people who complain about 10x chatter on HN should take a carpentry course, or some other kind of handiwork like that, with a real master.

Some of those people not only get things done much quicker, but they also get it done with better quality than an amateur, with fewer mistakes, throwing away less material, sometimes with more safety.

This is definitely more than 10x better. And there are some real hacks doing those kinds of jobs. I find programming to be not different than that.



Well sure, if you compare a master to a novice, there is almost always a great difference. But between masters of carpentry, there is usually not so much difference. With the 10x trope it is supposed to be different, and I would say it is, but it is not as common as many would like to think.


Perhaps there aren't that many non-master carpenters (I don't think that's true; there are plenty of professional incompetents), but I am 100% sure that not all professional developers are "masters".


It really does depend on where you work. The order of magnitude improvements I'm describing involved interdisciplinary expertise involving both bespoke distributed build systems and assembly language. They're not unicorns, they do exist, but they are very rare and most engineers just aren't going to be able to find them, even with infinite time. Hence why a 10x engineer is so valuable and not everyone can be one. I myself am certainly not one, in most contexts.


> Like most jobs/companies don't have so many low hanging fruits to pick that someone can speed of build by orders of magnitude on a weekly basis

You and I have worked at very different organizations. Everywhere I've been has had insane levels of inefficiency in literally every process.



>insane levels of inefficiency in literally every process.

In processes yes, not in code, and solo 10x devs alone can't fix broken processes, as those are the effect of broken management and engineering culture.

People know where the inefficiencies are, but management doesn't care.



same here - it is especially bad in huge companies, the inefficiencies and waste are legendary.


Nothing wrong with "one man armies" in the team context. There is a long list of tasks that needs to be done. Over the same time period, one person will do 5 complex tasks (with tests and documentation), while another will do just 1 task, and then spend even more time redoing it properly.

Over time this produces funny effects, like a super-big 20-point task done in a few days because the "wrong" person started working on it.



On my team, one of the main multipliers is understanding the need behind the requested implementation, and proposing alternative solutions - minimizing or avoiding code changes altogether. It helps that we work on internal tooling and are very close to the process and stakeholders.

"Hmmm, there's another way to accomplish this" being the 10x. Doing things faster is not it.



Exactly this. It’s why it’s so frustrating when product managers who think they’re above giving background run the show (the ones who think they’re your manager and are therefore too important to share that with you)


I've always thought a x10 is one who sits back and sees a simpler way - like some math problems have an easy solution, if you can see it. Also: change the question; change the context (Alan Kay)

(And absolutely not brute-force grinding themselves away)



Agreed. You can brute force, but not for long.




Literally the opposite of what makes a car go fast :-)


Is it?

How fast would you drive a car if I gave you the keys and told you „everything works perfectly fine, however the brakes have been removed“?



This is a perceptive observation. In my experience, so-called "10x" engineers are as productive as they are because they have a process for developing software that anticipates future problems. As a result, when they check something in, they spend very little time "debugging" or "fixing bugs"; the code already does what they need it to do.

It is always very useful as an engineer to log your time: what are you working on "right now", and is it "new work", "maintenance work", or "fixing work"? Then for each log entry that isn't "new work", think about what you could have done that would have caught that problem before it was committed to the code base.

I find it is much better to evaluate engineers based on how often they are solving the same problem that they had before vs creating new stuff. That ratio, for me, is the essence of the Nx engineer (for 0.1 < N < 10).

The point that Wilson makes that having infrastructure/tools that push that ratio further from "repair" work to "new work" is hugely empowering to an organization.



In my experience it often comes down to business processes. We have a guy in my extended team who knows everything about his side of the company. When I work with him I accomplish business-altering deliveries in a very short amount of time, which after a week or two rarely needs to be touched again unless something in the business changes. He's not a PO and we don't do anything too formally because it's just him, me and another developer + whatever business manager will benefit from the development (and a few testers from their team). In many ways the way we work these projects is very akin to Team Topologies.

At other times I’ll be assigned projects with regular POs, Architects and business employees who barely know what it is they are doing themselves, with poorly defined tasks and all sorts of bureaucratic nonsense “agile” process methods and well spend forever delivering nothing.

So sometimes I’m a 50x developer delivering business altering changes. At other times I’m a useless cog in a sea of pseudo workers. I don’t particularly care, I get paid, but if management actually knew what was going on, and how to change it… well…



How many organisations - of any kind, startups or enterprise or unicorns or whatever - will invest so much effort in something that doesn't even touch the product? Before the product exists!

I think the reluctance to invest effort in something that will give devs super-powers 6 months in the future is why we don't get all those 10x devs.



> People who implement something very few people even considered or understood to be possible, which then gives amazing leverage to deliver working software in a fraction of the time.

I agree with the first part of your statement, but what really happens to such people?

In my experience (sample size greater than one), they receive some kudos, but remain underpaid, never promoted, and are given more work under tight deadlines. At least until some of them are laid off along with lower performers.

But for those who say that hard things are impossible, they seem to get along just fine. They merely declare such things as out-of-scope or lie about their roadmap.



> In my experience (sample size greater than one), they receive some kudos, but remain underpaid, never promoted, and are given more work under tight deadlines. At least until some of them are laid off along with lower performers.

100% agree, I've seen plenty of the best of the best get treated like trash and laid off at first sight of trouble on the horizon



Anyone can be a 10x engineer when they write something similar/identical to what they've written before. Other jobs are not like this. A plumber may only be 20% faster on the best days of their career.


No one reading this during the hours of 9-5 is a 10x.


Or is. If a 1x puts in an 8 hour day, a 10x only has to put in a 48 minute day. That leaves plenty of time to read this.


His point is that smart and productive people are generally hard working, focused and diligent, which is how they get to be so experienced and productive.

Hence not wasting time on social networks.

> a 10x only has to put in a 48 minute day

Nobody would call this person "10x".



> His point is that smart and productive people are generally hard working, focused and diligent

I don't think that tracks. Smart, productive, hard working people don't work 9-5. They work every hour they can, breaking only when they have pushed themselves to the limit. The limit can be hit at any hour. There is no magical property of the universe that gives people unlimited stamina during the hours of 9-5.

> Nobody would call this person "10x".

I'm not sure they would call anyone that, to be fair. A "10x developer" who also puts in 8 hours alongside the 1x developers isn't a 10x developer; he would be called a sucker.



Hackernews is hardly a waste of time though. 10x is probably curious of topics mentioned on Hackernews.


That’s a bad take because you’re assuming that developer is capable of replicating that * 10


That's the fundamental flaw of the Nx developer ethos to a tee. No individual will benchmark reliably against any other person of their trade/craft over time. The mythical "BS-times" developer is so oversimplified as to be a meaningless concept. Hiring a "unicorn" and getting amazing results just isn't guaranteed. They probably have a better-than-average chance of making a higher impact, which is good enough for companies willing to pay Nx the average salary to acquire them.


I know it's meant to be funny, but tech people who spend zero time learning about "what's out there" are usually not the most effective developers. You won't find better solutions to existing or even new problems without an interest in the industry. Maybe this particular article isn't industry-valuable, fair enough, but having zero interest in refining and enhancing your craft beyond the work in front of you is almost guaranteed to end with worse outcomes.


Hard agree.

Another flaw in his thinking: brain cycles and sub-conscious processing.

I'm in the middle of a hard problem right now. I ran out of ideas, and opened HN about half an hour ago. In that time, without "trying", I've had two new ideas - one sent me back to my notes, which revealed that my original thinking was flawed; the second sent me to documentation, which suggested a new route to pursue. I'm digesting the implications of that while I write this.

Beating my head against the problem directly for thirty minutes would have been less productive. (Though if I wasn't WFH I would have, and also been miserable, and learned less about the industry than I have from this thread. So there's that.)

I'm far from a 10x anything, but I don't have the only brain which works this way.



Yep. Often I find our most accelerative work is stuff that makes testing changes easy (a very simple to bootstrap staging environment) or creates a lot of guarantees (typescript).


10x developer is just a buzzword people throw around when they're trying to sell you something.


6.5 x 15 is only 97.5 hours per week, not even close to the 400 hrs (10 x 40) per week of programming a 10X Rust programmer can provide. I jest, but all this 10X stuff is getting ridiculous. They stayed in "stealth" mode because they didn't have anything worth showing for 5 years. Doesn't sound all that productive to me. More likely, what they were trying to do was hard and complicated and took a while to figure out.


They're not boasting about their current productivity, they're boasting about the one they achieved at FoundationDB when they implemented the testing, which gave them the idea to build antithesis


This might be the best introduction post I've read.

Lays the foundation (get it?) for who the people are and what they've built.

Then explains how the current thing they are building is a result of the previous thing. It feels that they actually want this problem solved for everyone because they have experienced how good the solution feels.

Then tells us about the teams (pretty big names with complex systems) that have already used it.

All of these wrapped in good writing that appeals to developers/founders. Landing page is great too!



It seems like marketing copy. Not a technical blog post.

It would be nice to see some actual use cases and examples.

Instead, the writer just name-dropped a few big companies and claimed to have a revolutionary product that works magically. Then include the typical buzzwords like '10x programmer' and 'stealth mode'. The latter doesn't make sense because they also name-drop clients.



It absolutely doesn’t read like typical marketing copy, and yes it’s not a dense technical blog post either. I’m sure the use cases and examples will come, but putting them in this post would have been overkill.

Also, stealth mode just means your company isn’t public, you can still have clients.



I'm assuming you aren't aware of FoundationDB: https://www.foundationdb.org/files/fdb-paper.pdf

Having that context puts the post in a much better perspective. It's definitely an introduction post (the company has been developing this in stealth mode for the past few years), but it is most certainly _not_ a marketing post. These people developed extremely novel testing techniques for FoundationDB and are now generalizing them to work with any containerized application.

It's a big deal.



The entire testing system they describe feels like something I can strive towards too. They make you want their solution because it offers a way of life and thinking and doing like you've never experienced before


Except it doesn't actually explain in what it does: Is it fuzzing? Do you supply your own test cases? Is it testing hardware non-determinism?


Post author here. Sorry it was vague, but there's only so much detail you can go into in a blog post aimed at general audiences. Our documentation (https://antithesis.com/docs/) has a lot more info.

Here's my attempt at a more complete answer: think of the story of the blind men and the elephant. There's a thing, called fuzzing, invented by security researchers. There's a thing, called property-based testing, invented by functional programmers. There's a thing, called network simulation, invented by distributed systems people. There's a thing, called rare-event simulation, invented by physicists (!). But if you squint, all of these things are really the same kind of thing, which we call "autonomous testing". It's where you express high-level properties of your system, and have the computer do the grunt work to see if they're true. Antithesis is our attempt to take the best ideas from each of these fields, and turn them into something really usable for the vast majority of software.
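The property-based testing idea in particular fits in a few lines. Below is a hand-rolled toy version in Python (not Antithesis, and far simpler than real libraries like Hypothesis; all function names are illustrative): state a high-level invariant, then let randomized inputs do the grunt work of hunting for a counterexample.

```python
import random

def run_length_encode(s):
    """Toy function under test: run-length encode a string."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs):
    """Inverse of the encoder."""
    return "".join(ch * n for ch, n in pairs)

def check_roundtrip(trials=1000, seed=42):
    """The high-level property: decode(encode(s)) == s for every input.
    The computer generates the inputs; we only state the invariant."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("ab ") for _ in range(rng.randrange(20)))
        assert run_length_decode(run_length_encode(s)) == s
    return True
```

Real property-based testing frameworks add input shrinking and smarter generators on top of this basic loop.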

We believe the two fundamental problems preventing widespread adoption of autonomous testing are: (1) most software is non-deterministic, but non-determinism breaks the core feedback loop that guides things like coverage-guided fuzzing. (2) the state space you're searching is inconceivably vast, and the search problem in full generality is insolubly hard. Antithesis tries to address both of these problems.
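To make the first problem concrete, here is a toy coverage-guided fuzzing loop (a sketch, not Antithesis's actual fuzzer; every name is made up). A mutated input is kept only if it reaches a branch no earlier input reached. That feedback loop assumes replaying an input hits the same branches every time; if the target is non-deterministic, "new coverage" becomes noise and the search stops converging.

```python
import random

def target(data, seen):
    """Toy target: record which 'branches' an input exercises."""
    if len(data) > 2:
        seen.add("len>2")
        if data[0] == ord("b"):
            seen.add("b")
            if data[1] == ord("u"):
                seen.add("bu")
                if data[2] == ord("g"):
                    seen.add("bug")

def fuzz(iterations=2000, seed=0):
    """Coverage-guided loop: mutate a corpus input, run it, and keep it
    only if it increased branch coverage."""
    rng = random.Random(seed)
    corpus = [b"aaa"]
    coverage = set()
    for _ in range(iterations):
        parent = bytearray(rng.choice(corpus))
        parent[rng.randrange(len(parent))] = rng.randrange(256)  # mutate one byte
        candidate = bytes(parent)
        before = len(coverage)
        target(candidate, coverage)
        if len(coverage) > before:  # new branch reached: keep this input
            corpus.append(candidate)
    return coverage
```

Note that reaching the deepest branch requires stacking lucky mutations on top of previously kept inputs, which is exactly why the determinism of the replay step matters.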

So... is it fuzzing? Sort of, except you can apply it to whole interacting networked systems, not just standalone parsers and libraries. Is it property-based testing? Sort of, except you can express properties that require a "global" view of the entire state space traversed by the system, which could never be locally asserted in code. Is it fault injection or chaos testing? Sort of, except that it can use the techniques of coverage guided fuzzing to get deep into the nooks and crannies of your software, and determinism to ensure that every bug is replayable, no matter how weird it is.

It's hard to explain, because it's hard to wrap your arms around the whole thing. But our other big goal is to make all of this easy to understand and easy to use. In some ways, that's proved to be even harder than the very hard technological problems we've faced. But we're excited and up for it, and we think the payoff could be big for our whole industry.

Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!



I remember watching the Strange Loop video on your testing strategy, and now I need to go back and relearn how it differed from model checking (ie Promela or TLA+). Model checking is probably the big QA story that tech companies ignore because it requires dramatically more education, especially from QA departments typically seen as "inferior" to SWE.




This is interesting - it is kind of picking a fight with SaaS/cloud providers though, as that is the one kind of software you won't be able to import into your environment: not because it can't do the job, but because you don't have the code. So this would create an incentive to go back to PaaS.

It's definitely true though that a big problem with backend is that you can't easily treat it as a whole system for test purposes.



> it is kind of picking a fight with SaaS/cloud providers

or starting a bidding war



how so?


> turn them into something really usable for the vast majority of software

Would it work for debugging, say, Notepad on Windows?



Has any thought been given to repurposing this deterministic computer for more than just autonomous testing/fuzzing? For example, given an ability to record/snapshot the state, resumable software (i.e. durable execution)?


Somebody once suggested to me that this could be very handy for the reproducible-builds folks. I'm sure that now that we're out in the open, lots of people will suggest great applications for it.

Disclosure: Antithesis co-founder.



My favourite application for "deterministic computer" is creating a cluster in order to have a virtual machine which is resilient to hardware failure. Potentially even "this VM will keep running even if an entire AWS region goes down" (although that would add significant latency).


> most software is non-deterministic

Doesn't Antithesis rely on the fact that software is always deterministic? Reproducibility appears to be its top selling feature – something that wouldn't be possible if software were non-deterministic.



We can force any* software to be deterministic.

* Offer only good for x86-64 software that runs on Linux whose dependencies you can install locally or mock. The first two restrictions we will probably relax someday.



That point about dependencies -- how well does this play or easy to integrate with a build system like Bazel or Buck?


Aren't you just 'forcing' determinism in the inputs, relying on the software to be always deterministic for the same inputs?


Nope. We’re emulating a deterministic computer, so your software can’t act nondeterministically if it tries.


Right, by emulating a deterministic computer you can ensure that the inputs to the software are always deterministic – something traditional computing environments are unable to offer for various reasons.

However, if we pretend that software was somehow able to be non-deterministic, it would be able to evade your deterministic computer. But since software is always deterministic, you just have to guarantee determinism in the inputs.



[I work at Antithesis]

>But since software is always deterministic, you just have to guarantee determinism in the inputs.

This is technically correct, but that's a very load-bearing "just". A lot of things would have to count as inputs. Think about execution time, for example. CPUs don't execute at the same speed all the time because of automatic throttling. Network packets have different flight times. Threads and processes get scheduled a little differently. In distributed/concurrent systems, all this matters. If you run the same workload twice, observable events will happen at different times and in different orders because of tiny deviations in initial conditions.

So yes, if you consider the time it takes to run every single machine instruction as an "input", then software is deterministic given the same inputs. But in the real world that's not actionable. Even if you had all those inputs, how are you going to pass them in? For all intents and purposes most software execution is non-deterministic.

The Antithesis simulation is deterministic in this way though. It is in charge of how long everything takes in "simulated time", right down to the running times of individual CPU instructions. Everything observable from within the simulation happens the exact same way, every time. You can compare a memory dump at the same (simulated) instant across two different runs and they will be bit-for-bit identical.
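To make that point concrete, here is a toy illustration (my own sketch, not how Antithesis is implemented): if every scheduling decision is drawn from a single seeded RNG, the interleaving becomes a replayable function of the seed instead of an accident of wall-clock timing.

```python
import random

def simulate(seed):
    """Toy deterministic scheduler: the interleaving of two tasks is
    decided entirely by a seeded RNG, so the trace is a pure function
    of the seed."""
    rng = random.Random(seed)
    tasks = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
    trace = []
    while any(tasks.values()):
        runnable = [name for name, steps in tasks.items() if steps]
        chosen = rng.choice(runnable)       # the only source of "nondeterminism"
        trace.append(tasks[chosen].pop(0))  # "execute" that task's next step
    return trace

# Same seed, same interleaving, every single run: replayable by construction.
assert simulate(42) == simulate(42)
```

Different seeds explore different interleavings, so a seed that exposes a race can be replayed forever, which is the property a real deterministic hypervisor provides for whole systems.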



> Think about execution time, for example.

Sure. A good example. Execution time – more accurately, execution speed – isn't a property of software. For example, as you point out yourself, you can alter the execution speed without altering the software. It is, indeed, an input.

> Even if you had all those inputs, how are you going to pass them in?

Well, we know how to pass them in non-deterministically. That's how software is able to do anything.

Perhaps one could create a simulated environment that is able to control all the inputs? In fact, I'm told there is a company known as Antithesis working on exactly that.



Oh, that sounds like a challenge…

Is the challenge here the same as with digital simulations of electronic circuits? That is, at the end of the day analog physics becomes confounding? Or are you doing deterministic simulation of random RF noise as well?



Do you emit deterministic sequences from things like RDRAND? I guess you'd have to.


Yes, they said they do


> Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!

It's hard to understand these complex concepts via language alone.

Diagrams would be a huge help to understand how this system of testing works compared to existing testing concepts



This vaguely reminds me of Jefferson's "Virtual Time" paper from 1985[1]. The underlying idea at the time didn't really take off because it required, like Zookeeper, a greenfield project: except that it kinda doesn't and today you could imagine instrumenting an entire Linux syscall table and letting any Linux container become a virtual time system -- but Linux didn't exist in 1985 and wouldn't be standard until much later.

So Jefferson just says, let's take your I/O-ful process, split it into a message-passing actor model, and monitor all the messages going in and coming out. The messages coming out, they won't necessarily do what they're supposed to do yet, they'll just be recorded with a plus sign and a virtual timestamp, and by assumption eventually you'll block on some response. So we have a bunch of recorded message timestamps coming in, we have your recorded messages going out.

Well, there's a problem here, which is that if we have multiple actors we may discover that their timestamps have traveled out-of-order. You sent some message at t=532 but someone actually sent you a message at t=231 that you might have selected instead of whatever you actually selected to send the t=532 message. (For instance in the OS case, they might have literally sent a SIGKILL to your process and you might not have sent anything after that.) That's what the plus sign is for, indirectly: we can restart your process from either a known synchronization state or else from the very beginning, we know all of its inputs during its first run so we have "determinized" it up past t=231 to see what it does now. Now, it sends a new message at say t=373. So we use the opposite of +, the minus sign, to send to all the other processes the "undo" message for their t=532 message, this removes it from their message buffer: that will never be sent to them. And if they haven't hit that timestamp in their personal processing yet, no further action is needed, otherwise we need to roll them back too. Doing so you determinize the whole networked cluster.

The only other really modern implementation of these older ideas that I remember seeing was Haxl[2], a Haskell library which does something similar but rather than using a virtual time coordinate, it just uses a process-local cache: when you request any I/O, it first fetches from the cache if possible and then if that's not possible it goes out, fetches the data, and then caches it. As a result you can just offer someone a pre-populated cache which, with these recorded inputs, will regenerate the offending stack trace deterministically.

1: https://dl.acm.org/doi/10.1145/3916.3988

2: https://github.com/facebook/Haxl
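The Haxl-style record/replay cache described above can be sketched in a few lines (my own illustration with hypothetical names, not Haxl's actual API): wrap every I/O request in a key-value cache, record results on the first run, and replay from the recorded cache thereafter.

```python
class ReplayIO:
    """Record/replay I/O layer in the spirit of Haxl's request cache
    (a hypothetical sketch). The first run records every fetch; replay
    runs answer from the recorded cache and never touch the outside world."""
    def __init__(self, fetch, cache=None):
        self.fetch = fetch
        self.cache = {} if cache is None else dict(cache)

    def get(self, key):
        if key not in self.cache:            # cache miss: do the real I/O once
            self.cache[key] = self.fetch(key)
        return self.cache[key]

# Recording run against a (fake) live source.
calls = []
def live_fetch(key):
    calls.append(key)
    return f"value-for-{key}"

recorder = ReplayIO(live_fetch)
assert recorder.get("user/1") == "value-for-user/1"

# Replay run: hand someone the recorded cache and the computation
# regenerates deterministically, with no real I/O at all.
def no_io(key):
    raise RuntimeError("replay must not perform real I/O")

replayer = ReplayIO(no_io, cache=recorder.cache)
assert replayer.get("user/1") == "value-for-user/1"
assert calls == ["user/1"]                   # the real fetch ran exactly once
```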



Is there more info on how Antithesis solves problem number 2 (large state spaces)? I understand the fuzzing / workload generation part well, but there's so many different state space reduction techniques that I don't know what Antithesis is doing under the hood to combat that.


thanks, I'll dig in. I'm a very visual person and charts/diagrams/flows always help my grasp of something more than a wall of text. Maybe include some of those in there when you get the time?


Sure, it doesn't go into details. And that is exactly why I termed it an excellent introduction and a sales pitch.

I haven't heard of deterministic testing before. Nor have I heard of FoundationDB or the related things. And I went from knowing zero things about them to getting impressed and interested. This led me to go into their docs, blogs, landing page, etc. to know more.



Yeah. I could figure out the global idea, but then the mechanics of how it would actually work were very sparse.


Did you read a different article than me?

The linked article is 3/4 about some history and rationale before it actually tells you what they build.

It's like those pesky recipe blogs that tell you about the authors childhood, when you just want to make vegan pancakes.



This is a great pitch, and I don't want to come across as negative, but I feel like a statement like "we found all bugs" can only be true with a very narrow definition of bug.

The most pernicious, hard-to-find bugs that I've come across have all been around the business logic of an application, rather than the program hitting an error state. I'm thinking of the category where you have something like "a database is currently reporting a completed transaction against a customer, but no completed purchase item, how should it be displayed on the customer recent transactions page?". Implementing something where "a thing will appear and not crash" in those cases is one thing, but making sure that it actually makes sense as a choice given all the context of everyone else's choices everywhere else in the stack is a lot harder.

Or to take a database, something along the lines of "our query planner produces a really suboptimal plan in this edge-case".

Neither of those types of problems could ever be automatically detected, because they aren't issues of the program reaching an error state- the issue is figuring out in the first place what "correct" actually is for your application.

Maybe I'm setting the bar too high for what a "bug" is, but I guess my point is, it's one thing to fantasize about having zero bugs, it's another to build software in the real world. I probably still settle for 0 run time errors though to be fair. . .



I do think that it was a mistake to use the word "all" and imply that there are absolutely no bugs in FoundationDB. However, FoundationDB is truly known as having advanced the state of the art for testing practices: https://apple.github.io/foundationdb/testing.html.

So in normal cases this would reek of someone being arrogant / overconfident, but here they really have gotten very close to zero bugs.



The other issue I would point out is that building a database, while impressive with their quality, is still fundamentally different than an application or set of applications like a larger SaaS offering would involve (api, web, mobile, etc). Like the difference between API and UI test strategies, where API has much more clearly defined and standardized inputs and outputs.

To be clear, I am not saying that you can't define all inputs and outputs of a "complete SaaS product offering stack", because you likely could, though if it's already been built by someone that doesn't have these things in mind, then it's a different problem space to find bugs.

As someone who has spent the last 15 years championing quality strategy for companies and training folks of varying roles on how to properly assess risk, it does indeed feel like this has a more narrow scope of "bug" as a definition, in the sort of way that a developer could try to claim that robust unit tests would catch "any" bugs, or even most of them. The types of risk to a software's quality have larger surface areas than at that level.



There's a lot of assertions that I throw into business applications that would be very useful to test in this way. So I don't think this only applies to testing databases.

Also, when properties are difficult to think of, that often means that a model of the behavior might be more appropriate to test against, e.g. https://concerningquality.com/model-based-testing/. It would take a bit of design work to get this to play nicely with the Antithesis approach, but it's definitely doable.
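To make that concrete, here is a minimal model-based test in the quickcheck spirit (a sketch of the general technique, not the linked article's code): drive a hand-rolled implementation and a trusted model with the same seeded sequence of operations, and assert they never disagree.

```python
import random
from collections import deque

class RingBuffer:
    """Toy implementation under test: a fixed-capacity FIFO."""
    def __init__(self, cap):
        self.cap, self.buf, self.head, self.n = cap, [None] * cap, 0, 0

    def push(self, x):
        if self.n == self.cap:
            raise OverflowError
        self.buf[(self.head + self.n) % self.cap] = x
        self.n += 1

    def pop(self):
        if self.n == 0:
            raise IndexError
        x = self.buf[self.head]
        self.head = (self.head + 1) % self.cap
        self.n -= 1
        return x

def check_against_model(seed, steps=500):
    """Drive the implementation and the model (deque) with the same
    seeded op sequence; any divergence is a bug with a replayable seed."""
    rng = random.Random(seed)
    impl, model = RingBuffer(8), deque()
    for _ in range(steps):
        if model and rng.random() < 0.5:
            assert impl.pop() == model.popleft()   # both must agree
        elif len(model) < 8:
            x = rng.randrange(1000)
            impl.push(x)
            model.append(x)
    return True

assert all(check_against_model(s) for s in range(20))
```

The model carries the "correctness" knowledge so you don't have to enumerate properties one by one, which composes naturally with a deterministic environment that can replay any failing seed.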



Just to clarify, I am definitely not saying this is only useful or only applies to databases.

The point was more that, I don't see how this testing approach (at the level that it functions) would catch all of the bugs that I have seen in my career, and so to say "all of the bugs" or even "most of the bugs" is definitely a stretch.

This is certainly useful, just like unit tests, assertions, etc are all very useful. It's just not the whole picture of "bugs".



Yes, there are plenty of non-functional logic bugs, e.g. performance issues. I think this starts to drastically hone in on the set of "all" bugs though, especially by doing things like network fault injection by default. This will trigger complex interactions between dependencies that are likely almost never tested.

They should clarify that this is focused on functional logic bugs though, I agree with that.



I’d go so far as to say it is actually impossible to do UI testing in some kind of web based product unless it came from the browser makers themselves.

I’d settle for decent heap debugging.



I think the reference to "all the bugs" here is basically that our insanely brutal deterministic testing system was not finding any more bugs after 100's of thousands of runs. Can't prove a negative obviously, but the fact that we'd gotten to that "all green" status gave us a ton of confidence to push forward in feature development, believing we were building on something solid - which, time has shown we were.


Thanks -- that's very clarifying! But isn't this circular? The lack of bugs is used as evidence of the effectiveness of the testing approach, but the testing approach is validated by...not finding any more bugs in the software?


Yeah but if your software is running in an environment that controls for a lot of non-determinism and can simulate various kinds of failures and degradations at varying rates, and do it all in accelerated time and your software is still working correctly; I think it’d be somewhat reasonable to assert that maybe the testing setup has done a pretty good job.


Agreed, the approach sounds very interesting and I can see how it could be very effective! I'd love to try it on my own stuff. That's why it's so surprising (to me) to claim that the approach found nearly every bug in something as complicated as a production distributed database. My career experience tells me (quite strongly) that can't possibly be true.


The best definition I've heard for "bug" is "software not working as documented". Of course, a lot of software is lacking documentation -- and those are doc bugs. But I like this definition because even when the docs are incomplete, the definition guides you to ask: would I really document that the software behaves like this or would I change the behavior [and document that]? It's harder (at least for me) to sweep goofy behavior under the rug.


I consider a "bug" to be "it was supposed to do something and failed".

Issues around business logic are not failures of the system, the system worked to spec, the spec was not comprehensive enough and now we iterate.



What do you call it when the spec is wrong? Like clearly actually wrong, such as when someone copied a paragraph from one CRUD-describing page to the next and forgot to change the word "thing1" to "thing2" in the delete description.

Because I'd call that a bug. A spec bug, but a bug. It's no feature request to make the code based on the newer page delete thing2 rather than thing1, it's fixing a defect



Ya, I would like a word for this as well. I naturally refer to this category of error as bug, but this occasionally leads to significant conflict with others at work. I now default to calling _almost everything_ a feature request, which is obviously dumb but less likely to get me into trouble. If there is a better word for "it does exactly what we planned, but what we planned was wrong" I would love to adopt it.


I reported such a bug to some software my company uses (Tempo). Vendor proceeds to call it a feature request because the software successfully fails to show public information (visible in the UI, but HTTP 403 in the API unless you're an admin).

Instead of changing one word in the code that defines the access level required for this GET call, it gets triaged as not being a bug, put on a backlog, and we never heard from it again obviously

We pay for this shit



Successful failure is my favorite kind, I like to think that all my failures are successful


There’s the distinction between correctness and fitness for purpose which I think is helpful for clarifying the issues here.

Correctness bug: it didn’t do what the spec says it should do.

Fitness for purpose bug: it does what the spec says to do, but, with better knowledge, the spec isn’t what you actually want.

Edit: looks like this maps, respectively, to failing verification and failing validation. https://news.ycombinator.com/item?id=39359673

Edit2: My earlier comment on the different things that get called "bugs", before I was aware of this terminology: https://news.ycombinator.com/item?id=22259973



Systems Engineering has terminology for this distinction.

Verification is "does this thing do what I asked it to do".

Validation is "did I ask it to do the right thing".



Tangentially related, but I've recently started distinguishing verification and validation in my data cleaning work:

verification refers to "is this dataset clean?" or the more precise "does this dataset confirm my assumptions about what a correct dataset should be, given its focus"

validation refers to "can it answer my questions?" or the more rigorous "can I test my hypotheses against this dataset?"

So I find this interesting (but in hindsight unsurprising) that similar definitions are used in other fields. Would you have a source for your definitions?



They're fairly standard terms from "old style" project management - they show up in the usual V Model of Waterfall vein.

E.g. see Wikipedia: https://en.m.wikipedia.org/wiki/Verification_and_validation



A spec bug is just as bad as a code bug! Declaring a system free of defects because it matches the spec is sneaky sleight-of-hand that ignores the costs of having a spec.

The actual testing value is the difference between the cost of writing and maintaining the code, and the cost of writing and maintaining the spec.

If the spec is similar in complexity to the code itself, then bugs in the spec are just as likely as bugs in the code, thus verification to spec has gained you nothing (and probably cost you a lot).



I agree they are separate, but in my long experience, spec bugs are at least as common as your first definition.


...And now we could probably start debating your narrow definition of "system". ;-)


Most of the software I've built doesn't have "a spec.", but let me zoom in on specs. around streaming media. MPEG DASH, CMAF or even the base media file format (ISO/IEC 14496-12) at times can be pretty vague. In practice, this frequently turns up in actual interoperability issues where it's pretty difficult to point out which of two products is according to spec and which one has a bug.

So yes, I totally agree with GP and would actually go further: a phrase like "we found all the bugs in the database" is nonsense and makes the article less credible.



To be fair, the line right after that is "I know, I know, that's an insane thing to say."


I feel like business logic bugs live on a separate layer, the application layer, and it's not fair to count those against the database itself.

I agree that suboptimal query planning would be a database-layer bug, a defect which could easily be missed by the bug-testing framework.



Good summary of the hard part of being a software developer that deals with clients.


What software developer does not deal with clients (and makes a living)?


lots of software developers never deal with clients (clients as in the people who will actually use the software) - most of them in fact, in any of the big companies I have worked for anyway...and that is probably not a good thing.

I myself prefer to work with the people who will actually use what I build - get a better product that way.



I've been super interested in this field since finding out about it from the `sled` simulation guide [0] (which outlines how FoundationDB does what they do).

Currently bringing a similar kind of testing in to our workplace by writing our services to run on top of `madsim` [1]. This lets us continue writing async/await-style services in tokio but then (in tests) replace them with a deterministic executor that patches all sources of non-determinism (including dependencies that call out to the OS). It's pretty seamless.

The author of this article isn't joking when they say that the startup cost of this effort is monumental. Dealing with every possible source of non-determinism, re-writing services to be testable/sans-IO [2], etc. takes a lot of engineering effort.

Once the system is in place though, it's hard to describe just how confident you feel in your code. Combined with tools like quickcheck [3], you can test hundreds of thousands of subtle failure cases in I/O, event ordering, timeouts, dropped packets, filesystem failures, etc.

This kind of testing is an incredibly powerful tool to have in your toolbelt, if you have the patience and fortitude to invest in it.

As for Antithesis itself, it looks very very cool. Bringing the deterministic testing down the stack to below the OS is awesome. Should make it possible to test entire systems without wiring up a harness manually every time. Can’t wait to try it out!

[0]: https://sled.rs/simulation.html

[1]: https://github.com/madsim-rs/madsim?tab=readme-ov-file#madsi...

[2]: https://sans-io.readthedocs.io/

[3]: https://github.com/BurntSushi/quickcheck?tab=readme-ov-file#...
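A stripped-down sketch of the fault-injection side of this (my own toy example, not madsim's API): derive every injected fault from one seed, so any failing schedule is a pure function of the seed and can be replayed forever.

```python
import random

def send_with_retries(seed, drop_rate=0.5, max_tries=5):
    """Toy fault injection: a seeded RNG decides which 'packets' get
    dropped, so every failure schedule is reproducible from its seed."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        if rng.random() >= drop_rate:      # packet delivered on this attempt
            return attempt
    return None                            # every attempt was dropped

# Reproducibility: the same seed always produces the same outcome...
assert send_with_retries(7) == send_with_retries(7)

# ...so any seed that exposes the all-drops failure mode becomes a
# permanent, deterministic regression test rather than a flaky repro.
failing = [s for s in range(200) if send_with_retries(s) is None]
```

In a real system the injected faults are richer (timeouts, reordering, disk errors), but the principle is the same: make the fault schedule an input, then search over seeds.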



> Dealing with every possible source of non-determinism, re-writing services to be testable/sans-IO [2], etc. takes a lot of engineering effort.

Are there public examples of what such a re-write looks like?

Also, are you working at a rust shop that's developing this way?

Final Note, TigerBeetle is another product that was written this way.



TigerBeetle is actually another customer of ours. You might ask why, given that they have their own, very sophisticated simulation testing. The answer is that they're so fanatical about correctness, they wanted a "red team" for their own fault simulator, in case a bug in their tests might hide a bug in their database!

I gotta say, that is some next-level commitment to writing a good database.

Disclosure: Antithesis co-founder here.



Sure! I mentioned a few orthogonal concepts that go well together, and each of the following examples has a different combination that they employ:

- the company that developed Madsim (RisingWave) [0] [1] tries the hardest to eliminate non-determinism, with the broadest scope (stubbing out syscalls, etc.)

- sled [2] itself has an interesting combo of deterministic tests combined with quickcheck+failpoints test case auto-discovery

- Dropbox [3] uses a similar approach but they talk about it a bit more abstractly.

Sans-IO is more documented in Python [4], but str0m [5] and quinn-proto [6] are the best examples in Rust I’m aware of. Note that sans-IO is orthogonal to deterministic test frameworks, but it composes well with them.

With the disclaimer that anything I comment on this site is my opinion alone, and does not reflect the company I work at —— I do work at a rust shop that has utilized these techniques on some projects.

TigerBeetle is an amazing example and I’ve looked at it before! They are really the best example of this approach outside of FoundationDB I think.

[0]: https://risingwave.com/blog/deterministic-simulation-a-new-e...

[1]: https://risingwave.com/blog/applying-deterministic-simulatio...

[2]: https://github.com/spacejam/sled

[3]: https://dropbox.tech/infrastructure/-testing-our-new-sync-en...

[4]: https://fractalideas.com/blog/sans-io-when-rubber-meets-road...

[5]: https://github.com/algesten/str0m

[6]: https://docs.rs/quinn-proto/0.10.6/quinn_proto/struct.Connec...



> you can test hundreds of thousands of subtle failure cases in I/O, event ordering, timeouts, dropped packets, filesystem failures, etc.

As cool as all this is, I can't stop but wonder how often the culture of micro-services and distributed computing is ill advised. So much complexity I've seen in such systems boils down to the fact that calling a "function" is: async, depends on the OS, is executed at some point or never, always returns a bunch of strings that need to be parsed to re-enter the static type system, which comes with its own set of failure modes. This makes the seemingly simple task of abstracting logic into a named component, aka a function, extremely complex. You don't need to test for any of the subtle failures you mentioned if you leave the logic inside the same process and just call a function. I know monoliths aren't always a good idea or fit, at the same time I'm highly skeptical whether the current prevalence of service based software architectures is justified and pays off.



> I can't stop but wonder how often the culture of micro-services and distributed computing is ill advised.

You can't get away from distributed computing, unless you get away from computing. A modern computer isn't a single unit, it's a system of computers talking to each other. Even if you go back a long time, you'll find many computers or proto-computers talking to each other, but with a lot stricter timings, as the computers are less flexible.

If you save a file to a disk, you're really asking the OS (somehow) to send a message to the computer on the storage device, asking it to store your data, and it will respond with success or failure and it might also write the data. (sometimes it will tell your os success and then proceed to throw the data away, which is always fun)

That said, keeping things together where it makes sense, is definitely a good thing.



I see your point. Even multithreading can be seen as a form of distributed programming. At the same time, in my experience these parts can often be isolated. You trust your DB to handle such issues, and I'm very happy we are getting a new era of DBs like TigerBeetle, FoundationDB and sled that are designed to survive Jepsen. But how many teams are building DBs? That point is a bit ironic, given I'm currently building an in-memory DB at work. But it's a completely different level of complexity. And your example with writing a file, that too is a somewhat solved problem, use ZFS. I'd argue there are many situations where the fault-tolerant distributed requirements can be served by existing abstractions.


The writing is really enjoyable.

> Programming in this state is like living life surrounded by a force field that protects you from all harm. [...] We deleted all of our dependencies (including Zookeeper) because they had bugs, and wrote our own Paxos implementation in very little time and it _had no bugs_.

Being able to make that statement and back it by evidence must be indeed a cool thing.



The earliest that I've seen the attitude that one should eliminate dependencies because they have more bugs than internally written code was this book from 1995: https://store.doverpublications.com/products/9780486152936

pp. 65-66:

> The longer I have computed, the less I seem to use Numerical Software Packages. In an ideal world this would be crazy; maybe it is even a little bit crazy today. But I've been bitten too often by bugs in those Packages. For me, it is simply too frustrating to be sidetracked while solving my own problem by the need to debug somebody else's software. So, except for linear algebra packages, I usually roll my own. It's inefficient, I suppose, but my nerves are calmer.

> The most troubling aspect of using Numerical Software Packages, however, is not their occasional goofs, but rather the way the packages inevitably hide deficiencies in a problem's formulation. We can dump a set of equations into a solver and it will usually give back a solution without complaint - even if the equations are quite poorly conditioned or have an unsuspected singularity that is distorting the answers from physical reality. Or it may give us an alternative solution that we failed to anticipate. The package helps us ignore these possibilities - or even to detect their occurrence if the execution is buried inside a larger program. Given our capacity for error-blindness, software that actually hides our errors from us is a questionable form of progress.

> And if we do detect suspicious behavior, we really can't dig into the package to find our troubles. We will simply have to reprogram the problem ourselves. We would have been better off doing so from the beginning - with a good chance that the immersion into the problem's reality would have dispelled the logical confusions before ever getting to the machine.

I suppose whether to do this depends on how rigorous one is, how rigorous certain dependencies are, and how much time one has. I'm not going to be writing my own database (too complicated, multiple well-tested options available) but if I only use a subset of the functionality of a smaller package that isn't tested well, rolling my own could make sense.



In the specific case in question, the biggest problem was that dependencies like Zookeeper weren't compatible with our testing approach, so we couldn't do true end to end tests unless we replaced them. One of the nice things about Antithesis is that because our approach to deterministic simulation is at the whole system level, we can do it against real dependencies if you can install them.

I was a co-founder of both FoundationDB and Antithesis.



That tracks well (both the quotes and your thoughts).

One example that comes to mind where I want to roll my own thing (and am in the process of doing so) is replacing our ci/cd usage of jenkins that is solely for running qa automation tests against PR's on github. Jenkins does way way more than we need. We just need github PR interaction/webhook, secure credentials management, and spawning ecs tasks on aws...

Every time I force myself to update our jenkins instance, I buckle up because there is probably some random plugin, or jenkins agent thing, or ... SOMETHING that will break and require me to spend time tracking down what broke and why. 100% surface area for issues, whilst we use



I have proved my code has no bugs according to the spec.

I do not make the claim my spec has no bugs.



With formal proof systems, you can also claim that for your spec.


A formal proof is only as good as what-you-are-proving maps to what-you-intended-to-prove.


This doesn't track with the real world, though.

If you are writing software, it is almost always trying to accomplish a goal outside of itself. It is trying to solve a problem for someone, and how that problem can or should be solved is rarely perfectly clear.

The spec is supposed to map to a real world problem, and there is never going to be a way to formalize that mapping.



I've written formal proofs with bugs more than once. Reality is much messier than you can encode into any proof and there will ultimately be a boundary where the real systems you're trying to build can still have bugs.

Formal verification is incredibly, amazingly good if you achieve it, but it's not the same as "perfect".



No you can't.

You can claim that your spec doesn't violate some invariants in a finite number of steps, you can't claim that the spec contains all the invariants the real system must have and that it doesn't violate them in number of steps + 1.



"Its not a bug, its a feature"


Three thoughts:

1. It's a brilliant idea that came at the right time. It feels like people are finally losing patience with flaky software, see developer sentiment on: fuzzers, static typing, memory safety, standardized protocols, containers, etc.

2. It's meant to be niche. $2 per hour per CPU (or $7000 per year per CPU if reserved), no free tier for hobby or FOSS, and the only way to try/buy is to contact them. Ouch. It's a valid business model, I'm just sad it's not going for maximum positive impact.

3. Kudos for the high quality writing and documentation, and I absolutely love that the docs include things like (emphasis in original):

> If a bug is found in production, or by your customers, you should demand an explanation from us.

That's exactly how you buy developer goodwill. Reminds me of Mullvad, who I still recommend to people even after they dropped the ball on me.



Thanks for your kind words! As I mention in this comment (https://news.ycombinator.com/item?id=39358526) we are planning to have pricing suitable for small teams, and perhaps even a free tier for FOSS, in the future.

Disclosure: Antithesis co-founder.



There a few FOSS projects I'd love to set this up for if you ever get to the free tier. :)


"It's meant to be niche. $2 per hour per CPU (or $7000 per year per CPU if reserved), no free tier for hobby or FOSS, and the only way to try/buy is to contact them. Ouch. It's a valid business model, I'm just sad it's not going for maximum positive impact."

This is the sort of thing that, if it takes off, will start affecting the entire software world. Hardware will start adding features to support it. In 30 years this may simply be how computing works. But the pioneers need to recover the costs of the arrows they got stuck with before it can really spread out. Don't look at this an event, but as the beginning of a process.



I think their target audience is teams who already have mature software and comprehensive tests. From the docs, the kinds of bugs their platform is designed to find are the wild “unreproducible” kind that only happens rarely in production. Most teams have much bigger problems and obvious bugs to fix.

Heck, most software in production today barely has unit tests.



$2 per hour per CPU could be expensive or inexpensive, depending on how long it takes to fuzz your program. I wonder how that multiplies out in real use cases?


I met Antithesis at Strangeloop this year and got to talk to employees about the state of the art of automated fault injection that I was following when I worked at Amazon, and I cannot overstate how their product is a huge leap forward compared to many of the formal verification systems being used today.

I actually got to follow their bug-tracking process on an issue they identified in Apache Spark streaming - going off of the docs, they managed to identify a subtle and insidious correctness error in a common operation that would've caused headaches in low-visibility edge cases for years. In the end the docs were incorrect, but after that showing I cannot imagine how critical tools like Antithesis will be inside companies building distributed systems.

I hope we get some blog posts that dig into the technical weeds soon, I'd love to hear what brought them to their current approach.



> a platform that takes your software and hunts for bugs in it

Ok but, what actually IS it?

It seems like it is a cloud service that will run integration tests. I have to figure out how to deploy to this special environment and I still have to write those integration tests using special libraries.

But even after all that integration refactoring, how is this supposed to help me find actual bugs that I wouldn't already have found in my own environment with my own integration tests?



I'd suggest taking a dive into the docs - there is quite a lot there that should address some of these questions.

That said, Antithesis doesn't require you to write manual tests, integration or otherwise. It requires your software system to be packaged in containers, which is fairly straightforward, and then requires a workload to be written which will emulate the normal functioning of the software system. So for example an e-commerce store would have product views, cart adds, checkouts, etc.

With this, Antithesis can start testing (running your workload, varying inputs, injecting faults, etc) the software and looking for violations of test properties. There are many (60+) test properties that come "out of the box" such as crashes, out of memory, etc. You can (and should) also define custom properties that are unique to your system, as this will surface more problems.

As your tests run, violations of test properties are reported, with lots of useful debug information included. Test runs that are particularly interesting can have a lot of extra analysis done, due to our ability to "rewind" and change inputs, get artifacts, add logging, etc.
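To make the "workload" idea concrete, here is a toy sketch. None of this is the Antithesis SDK: `Store`, `run_workload`, and the inline property assertion are all hypothetical stand-ins. The point is the shape of the thing: a seeded driver exercises an e-commerce-style system with normal operations while asserting a property that must hold in every reachable state.

```python
import random

class Store:
    """Toy stand-in for the system under test (an e-commerce cart)."""
    def __init__(self):
        self.cart = {}

    def add(self, item, qty):
        self.cart[item] = self.cart.get(item, 0) + qty

    def remove(self, item, qty):
        self.cart[item] = max(0, self.cart.get(item, 0) - qty)

    def checkout(self):
        total = sum(self.cart.values())
        self.cart.clear()
        return total

def run_workload(seed, steps=200):
    # Seeded RNG: the whole run is a pure function of `seed`,
    # so any failing run can be replayed exactly.
    rng = random.Random(seed)
    store = Store()
    for _ in range(steps):
        op = rng.choice(["add", "remove", "checkout"])
        if op == "add":
            store.add(rng.choice("abc"), rng.randint(1, 3))
        elif op == "remove":
            store.remove(rng.choice("abc"), rng.randint(1, 3))
        else:
            store.checkout()
        # Custom test property: quantities are never negative,
        # no matter what sequence of operations was generated.
        assert all(q >= 0 for q in store.cart.values()), "negative quantity"
    return store

run_workload(seed=42)   # same seed, same sequence of operations
```

In the real product the driver would also face injected faults (network partitions, crashes, clock skew), which this toy omits entirely.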



"Workloads" seem to be effectively equivalent to integration tests.

I don't mean to poke holes but I'm having trouble seeing the value add here.

If I have to deploy to some new environment anyways and I have to tailor the "Workloads" anyways why would I pay extra for vendor lock-in?

The type of devious bug this is promising to find would be something like:

"The DB silently drops timezone from Dates because of the column type. This results in unexpected data being returned for users in different timezones from the server"

I just don't see how repeatedly calling the API with an expanding set of random inputs helps find something like that.



The article says they created a deterministic hypervisor that runs all pseudorandom behavior from a starting seed to enable perfect re-playability.

But that's all we know so far. I'm assuming there'll be some sort of fuzz testing, and static analysis or some defining actions that your software can perform.

Honestly, it sounds like it has a lot of crossover with what the Vale language is trying to solve: https://vale.dev/, but focused on getting existing software to that state instead of creating a new language where new software is in that state by default.
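To make the seed idea concrete, here is a minimal toy (not Antithesis's mechanism; a real deterministic hypervisor must also pin thread scheduling, clocks, and I/O timing, which this ignores). If every source of randomness is drawn from one seeded generator, a failing run is fully identified by its seed, and re-running that seed reproduces the failure exactly:

```python
import random

def flaky_system(rng):
    """Toy system with a rare 'bug': it fails only when two draws collide."""
    a, b = rng.randint(0, 10), rng.randint(0, 10)
    if a == b:
        raise RuntimeError(f"collision bug: {a}")

def hunt(max_seeds=10_000):
    """Search seeds until the bug fires; the seed IS the repro case."""
    for seed in range(max_seeds):
        try:
            flaky_system(random.Random(seed))
        except RuntimeError:
            return seed
    return None

seed = hunt()   # replaying this exact seed re-triggers the bug every time
```

In a conventional system the "seed" would be scattered across OS scheduling decisions and wall-clock reads, which is exactly why such bugs are normally unreproducible.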



I came away with the same questions.


Reading their docs it seems you got it pretty much correctly. You write integration tests (aka “workloads”) and then they run it under different scenarios.

This means they use their hypervisor to change random seeds, make http requests fail or take too long, break connections between servers, change the order of server responses, and all sorts of wild things you don’t usually control, but that happen in the real world. Then they compare the expected workload responses and figure out which conditions break your systems.

That’s why they sell yearly contracts - you’re supposed to pay them to keep your workloads running continuously all year to try all sorts of different combinations of failures.
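As a toy illustration of the fault-injection side (hypothetical classes, not how Antithesis actually does it at the hypervisor layer): a seeded injector decides which simulated network calls fail, so each "scenario" is fully determined by its seed and any failing scenario can be replayed on demand.

```python
import random

class FaultyNetwork:
    """Seeded fault injector: the same seed yields the same failure pattern."""
    def __init__(self, seed, fail_rate=0.3):
        self.rng = random.Random(seed)
        self.fail_rate = fail_rate

    def request(self, payload):
        if self.rng.random() < self.fail_rate:
            raise ConnectionError("injected network fault")
        return f"ok:{payload}"

def fetch_with_retry(net, payload, attempts=5):
    """Code under test: expected to survive transient faults by retrying."""
    for _ in range(attempts):
        try:
            return net.request(payload)
        except ConnectionError:
            continue
    raise RuntimeError("gave up after repeated faults")

def survives(seed):
    try:
        fetch_with_retry(FaultyNetwork(seed), "ping")
        return True
    except RuntimeError:
        return False

# Sweep many scenarios; any failing seed deterministically replays its
# exact sequence of injected faults.
failing = [s for s in range(100) if not survives(s)]
```

The paid service amounts to running a vastly richer version of this sweep (faults in the network, disks, clocks, scheduler) continuously, all year.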



I'm trying to avoid diving into the hype cycle about this immediately - but this sounds like the holy grail right? Use your existing application as-is (assuming it's containerized), and simply check properties on it?

The blocker in doing that has always been the foundations of our machines: non-deterministic CPUs and operating systems. Re-building an entire vertical computing stack is practically impossible, so they just _avoid_ it by building a high-fidelity deterministic simulator.

I do wonder how they are checking for equivalence between the simulator and existing OS's, as that sounds like a non-trivial task. But, even still, I'm really bought in to this idea.



You still have to use their SDKs to write lots of integration tests (they call them “workloads”).

Then they run those tests while injecting all sorts of failures like OS failures, network issues, race and timing conditions, random number generator issues, etc.

It’s likely the only practical way today of testing for those things reliably, but you still have to write all of the tests and define your app state.



Does it even need to be containerized? According to the post, it sounds like Antithesis is a solution at the hypervisor layer.


Yes it looks like containerization is required: https://antithesis.com/docs/getting_started/setup.html#conta...


Containers are doing two jobs for us: they give our customers a convenient way to send us software to run, and they give us a convenient place to simulate the network boundary between different machines in a distributed system. The whole guest operating system running the containers is also running inside the deterministic hypervisor and under test (and it's mostly just NixOS Linux, not something weird that we wrote).

I'm a co-founder of Antithesis.



Oh, cool to hear you're using NixOS. The Nix philosophy totally gels with the philosophy described in the post.

But it's also probably fair to describe NixOS as something weird that somebody else wrote :)



I got really excited about this, and I spent a little time looking through the documentation, but I can't figure out how this is different than randomizing unit tests? It seems if I have a unit test suite already, then that's 99% of the work? Am I misunderstanding? I am drawing my conclusions from reading the Getting Started series of the docs, especially the Workloads section: https://antithesis.com/docs/getting_started/workload.html


Antithesis here - curious what part of the Getting Started doc gave you that impression? If you take a look at our How Antithesis Works page, it might help answer your question as to how Antithesis is different from just bundling your unit tests.

https://antithesis.com/docs/introduction/how_antithesis_work...

In short though, unit tests can help to inform a workload, but we don't require them. We autonomously explore software system execution paths by introducing different inputs, faults, etc., which discovers behaviors that may have been unforeseen by anyone writing unit tests.



Thanks for the response. The linked introduction does help. The workload page does give me that impression (and based on upvotes of my post it does to others as well)...so perhaps disambiguating that the void test*() examples on the workloads page are not unit tests might help!

Congrats on the launch and I'll consider using it for some of my projects.



This is that, and the exact same vibe, except: it promises to keep being that simple even after you add threads, and locks, and network calls, and disk accesses and..

With this, if you write a test for a function that makes a network call and writes the result to disk, your test will fail if your code does not handle the network call failing or stalling indefinitely, or the disk running out of space, or the power going out just before you close the file, or..

So it's: yes, but it expands the space where testing is as easy as unit testing to cover much more interesting levels of complexity.
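To ground the "power goes out just before you close the file" case from the parent comment: here is one self-contained pattern that such a test pushes you toward, with the crash simulated by a flag rather than by a hypervisor. Writing to a temp file and atomically renaming it means readers never observe a partial file, even if the process dies mid-write. (A sketch; names like `atomic_write` are mine, not from any library.)

```python
import os
import tempfile

def atomic_write(path, data, crash_before_rename=False):
    """Write `data` so readers never see a partially written file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # force the bytes to stable storage
    if crash_before_rename:       # simulated power cut at the worst moment
        os.unlink(tmp)
        return
    os.replace(tmp, path)         # atomic rename on POSIX

def read_or_default(path, default=""):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return default
```

A deterministic-simulation tester would find the naive `open(path, "w")` version of this by cutting power between the truncate and the write; the rename trick is the standard fix.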



I am really intrigued by organizations that build effective test cultures. I am not interested in people who have testing teams (ala how it was done before, say 2004) or teams that simply do unit tests and integration tests. I am interested in people who realized that building the right testing culture is key to their success. Before reading this article sqlite would probably be my top reference. I don't have the article handy, but the sqlite developers spend like a year building a test framework to make incredibly bulletproof software. I wasn't aware of foundationDB before but the idea of the simulation engine - that's exactly what most distributed systems folks need.

disclaimer - I work at AWS. And we have a combination of TLA+, fuzz, and simulation testing. When I first started it was obvious my team had a huge testing gap. It pains me to say this but for a big part of AWS testing is sort of an after thought. It comes from the "we don't hire test engineers" mentality I suppose - but this likely differs wildly by team. Over the years we've tried to backfill the gap with simulators. But it is really hard to do this culturally. because it is really hard while you are trying to build new stuff, fix bugs, etc. And your entire team (and leaders) have to be bought into the foundational value of this infrastructure. And because we don't have it, we write COEs, we fix the obvious bugs, but do we take the 1 year it would take to avoid all problems in the future - no. So yeah I am super jealous of your "fully-deterministic event-based network simulation".



alternative name for the product: Laplace's Demon for Your Code


I don’t want to sound silly, but there are 24 open and 37 closed bugs on the FoundationDB Github page. Could it perhaps be that bug-free is somewhat exaggerated?

Antithesis looks very promising by the way :-)

Edit: perhaps Apple didn’t continue the rigorous testing while evolving the FoundationDB codebase.



FoundationDB is an impressive achievement, quite possibly the only distributed database out there that lives up to its strict serializability claims (see https://jepsen.io/consistency/models/strict-serializable for a good definition). The way they wrote it is indeed very interesting and a tool that does this for other systems is immediately worth looking at.


Is it that good? I've been tasked with deploying it for some time now, and it has always bitten me in the ass for one reason or another. And I'm not the one who uses it, so I don't know if it's actually good. For now I much prefer Redis.


It's great, but operationally there are lots of gotchas and little guidance.

We got bitten _hard_ in production when we accidentally allowed some of the nodes to get above 90% of the storage used. The whole database collapsed into a state where it could only do a few transactions a second. Then the ops team, thinking they were clever, doubled the size of the cluster in order to give it the resources it needed to get the average utilization down to 45%; this was an unforced error as that pushed the size of the cluster outside the fdb comfort zone (120 nodes) which is itself a problem. The deed was done though and pulling nodes was not possible in this state, so slowly, slooooowly... things got fixed.

We ended up spending an entire weekend slowly, slowly getting things back into a good place. We did not lose data, but basically prod was down for the duration, and we found it necessary to _manually_ evict the full nodes one at a time over the period.

Now, this was a few years ago, and fdb has performed wickedly fast, with utter, total reliability before that and since, and to this day the ops team is butthurt about fdb.

From an engineering perspective, if you aren't using Java, fdb is pretty not great, since the very limited number of abstraction layers that exist are all Java-centric. There are many, many issues with the maximum transaction time, the maximum key size, value size, and total transaction size, the lack of pushdown predicates (e.g., filtered scans can't be done in place, which means that in AWS they cost a lot in inter-AZ network charges and are also gated by the network performance of your instances), and so on.

What ALL of these issues have in common is that they bite you late in the game. The storage issue bites you when you're hitting the DB hard in production and have a big data set; the lack of abstractions means that even something as simple as finding leaked junk keys turns out to be impossible unless you were diligent enough to manually frame all your values so you could identify things as more than just bytes; the transaction time thing is very weird to deal with, as you tend to have creeping-crud aspects, and the lack of libraries that instrument transactions to give you early warning is an issue; likewise, for certain kinds of key-value pairs there's a creeping size problem - hey, this value is an index of other values; if you're not very careful up front, you _will_ eventually hit either the txn size limit or the key limit. The usual workaround for those is to do separate transactions - a staging transaction, then essentially a swap operation, and then a garbage-collection transaction - but that has lots of issues over time when coupled with application failure.

There are answers to ALL of these, manual ones. For the popular languages other than Java - Go, Python, maybe Ruby - there _should_ be answers, but there aren't. These are very sharp edges. Those Java layers are _also_ _not_ _bug_ _free_. So yeah, one has a reliable storage layer (a topic that has come up over and over again in the last few years), but it's the layer on top of that where all the bugs are, now with constraints and factors that are harder to reason about than the usual storage layer.

One might say, hey, SQL has all of these problems too, except no. You can bump into transaction limits, but the limits are vastly higher than fdb's, and transaction-time sluggishness will flag it long before you run into the "your transaction is rejected; spin retrying something that will _never_ recover" sort of issue that your average developer will eventually encounter in fdb.

That said, I love fdb as a software achievement. I just wish they had finished it. For my current project, I have designed it out. I might be able to avoid all of the sharp edges above at this point, but since we are not a java shop, I also can't rely on all the engineers to even know they exist.
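For readers who haven't hit the size limits the parent describes: the standard workaround is manual chunking. Here is a hedged toy sketch using a plain dict to stand in for the ordered key space; real code would do this inside an fdb transaction, against fdb's documented caps of roughly 100 KB per value and 10 MB per transaction, and the helper names are mine, not an fdb API.

```python
# Bytes per chunk, mirroring fdb's ~100 KB value-size limit.
CHUNK = 100_000

def write_blob(store, name, blob: bytes):
    """Split a large value across zero-padded sub-keys under `name`."""
    # Delete stale chunks first, or a shorter rewrite leaks old tail chunks
    # (one flavor of the "leaked junk keys" problem mentioned above).
    for key in [k for k in store if k.startswith(f"{name}/")]:
        del store[key]
    for i in range(0, max(len(blob), 1), CHUNK):
        # Zero-padded offsets keep lexicographic key order == byte order.
        store[f"{name}/{i:012d}"] = blob[i:i + CHUNK]

def read_blob(store, name) -> bytes:
    """Reassemble the value by scanning the chunk keys in order."""
    keys = sorted(k for k in store if k.startswith(f"{name}/"))
    return b"".join(store[k] for k in keys)
```

The parent's point stands: every language layer needs something like this (plus the staging/swap/GC dance for transaction-size limits), and outside Java you mostly end up writing it yourself.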



It depends how you define "good". I care mostly about my distributed database being correct, living up to its consistency claims, and providing strict serializability.

(see also https://aphyr.com/posts/283-jepsen-redis)

I care much less about how easy it is to use or deploy, but "good" is a subjective term, so other people might see things differently.



> quite possibly the only distributed database out there that lives up to its strict serializability claims

Jepsen has never tested FoundationDB, not sure why you claim this and link to Jepsen's site.



FDB co-founder here.

Aphyr / Jepsen never tested FDB because, as he tweeted "their testing appears to be waaaay more rigorous than mine." We actually put a screen cap of that tweet in the blog post linked here.



> not sure why you claim this and link to Jepsen's site.

They link to the website for a definition of the term they are using.



Great read. Great product. I've been an early user of Antithesis. My background is dependability and formal distributed systems.

This thing is magic (or rather, it's indistinguishable from magic ;-)).

If someone had told me I could test any distributed system without a single line of code change, do things like step-by-step debugging, and even roll back time at will, I would not have believed it. But Antithesis works as advertised.

It's a game-changer for distributed systems that truly care about dependability.



I love this idea. In the early days of computing, computers were claimed to always be deterministic: give one the same inputs and you get the same outputs. Little by little that disappeared, with interrupts, with multithreading, with human-derived inputs, with multitasking, with distributed processing, until today computers and applications are often not deterministic at all, and that does indeed make them very difficult to test. Bringing back the determinism may not only be good for testing; it seems likely to improve reliability. While I see how this is great for distributed databases, I wonder if it has application when inputs are inherently non-deterministic (e.g., human input, sensor-derived inputs).


Congratulations to the Antithesis team!

I actually interviewed with them when they were just starting, and outside of being very technically proficient, they are also a great group of folks. They flew my wife and me out to DC on what happened to be the coldest day of the year (we are from California), so we didn't end up following through, but I'd like to think there is an alternative me out there in the multiverse hacking away on this stuff.

I highly recommend Will’s talks (which I believe he links in the blog post):

https://m.youtube.com/watch?v=4fFDFbi3toc

https://m.youtube.com/watch?v=fFSPwJFXVlw



> We thought about this and decided to just go all out and write a hypervisor which emulates a deterministic computer.

Huh. Yes, that would work. It's in the category of obvious in hindsight. That is a very convincing sales pitch.


