Tracking developer build times to decide if the M3 MacBook is worth upgrading

Aurornis · 2023-12-29T21:20:26

This is a great write-up and I love all the different ways they collected and analyzed data.

That said, it would have been much easier and more accurate to simply put each laptop side by side and run some timed compilations on the exact same scenarios: A full build, incremental build of a recent change set, incremental build impacting a module that must be rebuilt, and a couple more scenarios.

Or write a script that steps through the last 100 git commits, applies them incrementally, and does a timed incremental build to get a representation of incremental build times for actual code. It could be done in a day.

Collecting company-wide stats leaves the door open to significant biases. The first that comes to mind is that newer employees will have M3 laptops while the oldest employees will be on M1 laptops. While not a strict ordering, newer employees (with their new M3 laptops) are more likely to be working on smaller changes while the more tenured employees might be deeper in the code or working in more complicated areas, doing things that require longer build times.

This is just one example of how the sampling isn’t truly as random and representative as it may seem.

So cool analysis and fun to see the way they’ve used various tools to analyze the data, but due to inherent biases in the sample set (older employees have older laptops, notably) I think anyone looking to answer these questions should start with the simpler method of benchmarking recent commits on each laptop before they spend a lot of time architecting company-wide data collection

lawrjone · 2023-12-29T21:34:25

I totally agree with your suggestion, and we (I am the author of this post) did spot-check the performance for a few common tasks first.

We ended up collecting all this data partly to compare machine-to-machine, but also because we want historical data on developer build times and a continual measure of how the builds are performing so we can catch regressions. We quite frequently tweak the architecture of our codebase to make builds more performant when we see the build times go up.

Glad you enjoyed the post, though!

FunnyLookinHat · 2023-12-30T15:08:12

I think there's something to be said for the fact that the engineering organization grew through this exercise - experimenting with using telemetry data in new ways that, when presented to other devs in the org, likely helped them to all level up and think differently about solving problems.

Sometimes these wandering paths to the solution have multiple knock-on effects in individual contributor growth that are hard to measure but are (subjectively, in my experience) valuable in moving the overall ability of the org forward.

nox101 · 2023-12-30T04:48:03

I didn't see any analysis of network building as an alternative to M3s. For my project, ~40 million lines, past a certain minimum threshold, it doesn't matter how fast my machine is, it can't compete with the network build our infra-team makes.

So sure, an M3 might make my build 30% faster than my M1 build, but the network build is 15x faster. Is it possible instead of giving the developers M3s they should have invested in some kind of network build?

Kwpolska · 2023-12-30T08:32:03

Network full builds might be faster, but would incremental builds be? Would developers still be able to use their favourite IDE and OS? Would developers be able to work without waiting in a queue? Would developers be able to work offline?

If you have a massive, monolithic, single-executable-producing codebase that can't be built on a developer machine, then you need network builds. But if you aren't Google, building on laptops gives developers better experience, even if it's slower.

idontknowifican · 2023-12-30T13:54:38

i hate to say it but working offline is not really a thing at work anymore. it is no one thing, but a result of k8s by and large. i think a lot of places got compliant when you could just deploy a docker image, fuck how long that takes and how slow it is on mac

Kwpolska · 2023-12-30T19:25:29

That depends entirely on the tech stack, and how much you care about enabling offline development. You can definitely run something like minikube on your laptop.

novok · 2023-12-30T08:05:51

That is a very large company if you have a singular 40 million line codebase, maybe around 1000 engineers or greater? Network builds also takes significant investment in adopting stuff like bazel and a dedicated devex team to pull off most of the time. Setting up build metrics to determine a build decision and the other benefits that come from it is a one month project at most for one engineer.

It's like telling an indie hacker to adopt a complicated kubernetes setup for his app.

Cacti · 2023-12-30T21:39:06

1,000 is a small company.

dzek69 · 2023-12-30T06:33:36

Maybe, but I feel that s not the point here

rag-hav · 2023-12-30T07:34:08

What do you mean by network build?

kwk1 · 2023-12-30T08:39:25

They probably mean tools like distcc or sccache:

https://github.com/distcc/distcc

https://github.com/mozilla/sccache

baq · 2023-12-30T12:43:40

And incredibuild: https://www.incredibuild.com/

DeathArrow · 2023-12-30T08:51:08

Dedicated build machines.

dimask · 2023-12-30T15:45:23

> This is a great write-up and I love all the different ways they collected and analyzed data.

> [..] due to inherent biases in the sample set [..]

But that is an analysis methods issue. This serves as a reminder that one cannot depend on AI-assistants when they are not themselves enough knowledgeable on a topic. At least for the time being.

For once, as you point, they conducted a t-test on data that are not independently sampled, as multiple data points were sampled by different people, and there are very valid reasons to believe that different people would have different tasks that may be more or less compute-demanding, which confound the data. This violates one of the very fundamental assumptions of the t-test, which was not pointed out by the code interpreter. In contrast, they could have modeled their data with what is called "linear mixed effects model" where stuff like person (who the laptop belongs to) as well as possibly other stuff like seniority etc could be put into the model as "random effects".

Nevertheless it is all quite interesting data. What I found most interesting is the RAM-related part: caching data can be very powerful, and higher RAM brings more benefits than people usually realise. Any laptop (or at least macbook) with more RAM than it usually needs has most of the time its extra RAM filled by cache.

j4yav · 2023-12-30T07:22:14

I agree, it seems like they were trying to come up with the most expensive way to answer the question possible for some reason. And why was the finding in the end to upgrade M1 users to more expensive M3s when M2s were deemed sufficient?

heliodor · 2023-12-30T12:23:55

If employees are purposefully isolated from the company's expenses, they'll waste money left and right.

Also, they don't care since any incremental savings aren't shared with the employees. Misaligned incentives. In that mentally, it's best to take while you can.

sgjohnson · 2023-12-30T17:16:45

Because M2s are no longer produced.

matthew-wegner · 2023-12-30T01:10:21

I would think you would want to capture what/how was built, as like:

* Repo started at this commit

* With this diff applied

* Build was run with this command

Capture that for a week. Now you have a cross section of real workloads, but you can repeat the builds on each hardware tier (and even new hardware down the road)

dash2 · 2023-12-29T21:28:48

As a scientist, I'm interested how computer programmers work with data.

* They drew beautiful graphs!

* They used chatgpt to automate their analysis super-fast!

* ChatGPT punched out a reasonably sensible t test!

But:

* They had variation across memory and chip type, but they never thought of using a linear regression.

* They drew histograms, which are hard to compare. They could have supplemented them with simple means and error bars. (Or used cumulative distribution functions, where you can see if they overlap or one is shifted.)

dgacmu · 2023-12-30T00:42:41

I'm glad you noted programmers; as a computer science researcher, my reaction was the same as yours. I don't think I ever used a CDF for data analysis until grad school (even with having had stats as a dual bio/cs undergrad).

novok · 2023-12-30T08:11:21

It's because that's usually the data scientist's job, and most eng infra teams don't have a data scientist and don't really need one most of the time.

Most of the time they deal with data the way their tools generally present data, which correlate closely to most analytics, perf analysis and observability software suites.

Expecting the average software eng to know what a CDF is the same as expecting them to know 3d graphics basics like quaternions and writing shaders.

mpoteat · 2023-12-30T22:55:52

A standard CS program will cover statistics (incl. calculus-based stats e.g. MLEs), and graphics is a very common and popular elective (e.g. covering OpenGL). I learned all of this stuff (sans shaders) in undergrad, and I went to a shitty state college. So from my perspective an entry level programmer should at least have a passing familiarity with these topics.

Does your experience truly say that the average SWE is so ignorant? If so, why do you think that is?

oakejp12 · 2023-12-31T00:38:09

> A standard CS program will cover statistics

> graphics is a very common and popular elective

I find these statements to be extremely doubtful. Why would a CS program cover statistics? Wouldn't that be the math department? If there any required courses, it's most likely Calc 1/2, Linear Algebra, and Discrete Math.

Also, out of the hundreds of programmers I've met, I don't know any that has done graphics programming. I consider that super niche.

DeathArrow · 2023-12-30T08:55:28

>Expecting the average software eng to know what a CDF is the same as expecting them to know 3d graphics basics like quaternions and writing shaders.

I did write shaders and used quaternions back in the day. I also worked on microcontrollers, did some system programming, developed mobile and desktop apps. Now I am working on a rather large microservice based app.

perfmode · 2023-12-30T19:46:57

you’re a unicorn

azalemeth · 2023-12-30T01:19:08

> ChatGPT punched out a reasonably sensible t test!

I think the distribution is decidedly non normal here and the difference in the medians may well have also been of substantial interest -- I'd go for a Wilcox test here to first order... Or even some type of quantile regression. Honestly the famous Jonckheere–Terpstra test for ordered medians would be _perfect_ for this bit of pseudoanalysis -- have the hypothesis that M3 > M2 > M1 and you're good to go, right?!

(Disclaimers apply!)

whimsicalism · 2023-12-30T01:39:04

12,000 builds? Sure maybe the build time distribution is non-normal, but the sample statistic probably is approximately normal with that many builds.

RayVR · 2023-12-30T01:44:22

Many people misinterpret what is required for a t-test.

azalemeth · 2023-12-30T10:55:16

I meant that the median is likely arguably the more relevant statistic, that is all -- I realise that the central limit theorem exists!

Herring · 2023-12-29T22:37:48

>They drew histograms, which are hard to compare.

Note that in some places they used boxplots, which offer clearer comparisons. It would have been more effective to present all the data using boxplots.

tmoertel · 2023-12-29T23:32:36

> They drew histograms, which are hard to compare.

Like you, I'd suggest empirical CDF plots for comparisons like these. Each distribution results in a curve, and the curves can be plotted together on the same graph for easy comparison. As an example, see the final plot on this page:

https://ggplot2.tidyverse.org/reference/stat_ecdf.html

mnming · 2023-12-29T22:16:42

I think it's partly because the audiences are often not familiar with those statistics details either.

Most people hates nuances when reading data report.

fallous · 2023-12-30T10:17:27

I think you might want to add the caveat "young computer programmers." Some of us grew up in a time where we had to learn basic statistics and visualization to understand profiling at the "bare metal" level and carried that on throughout our careers.

NavinF · 2023-12-30T02:32:19

> cumulative distribution functions, where you can see if they overlap or one is shifted

Why would this be preferred over a PDF? I've rarely seen CDF plots after high school so I would have to convert the CDF into a PDF inside my head to check if the two distributions overlap or are shifted. CDFs are not a native representation for most people

unsung · 2023-12-30T04:18:11

I can give a real example. At work we were testing pulse shaping amplifiers for Geiger Muller tubes. They take a pulse in, shape it to get a pulse with a height proportional to the charge collected, and output a histogram of the frequency of pulse heights, with each bin representing how many pulses have a given amount of charge.

Ideally, of all components are the same, there is no jitter, and if you feed in a test signal from a generator with exactly the same area per pulse, you should see a histogram where every count is in a single bin.

In real life, components have tolerances, and readouts have jitter, so the counts spread out and you might see, with the same input, one device with, say, 100 counts in bin 60, while a comparably performing device might have 33 each in bins 58, 59, and 60.

This can be hard to compare visually as a PDF, but if you compare CDF's, you see S-curves with rising edges that only differ slightly in slope and position, making the test more intuitive.

dash2 · 2023-12-30T05:49:27

If one line is to the right of the other everywhere, then the distribution is bigger everywhere. (“First order stochastic dominance” if you want to sound fancy.) I agree that CDFs are hard to interpret, but that is partly due to unfamiliarity.

jxcl · 2023-12-29T22:17:38

Yeah, I was looking at the histograms too, having trouble comparing them and thinking they were a strange choice for showing differences.

LASR · 2023-12-29T21:46:04

Solid analysis.

A word of warning from personal experience:

I am part of a medium-sized software company (2k employees). A few years ago, we wanted to improve dev productivity. Instead of going with new laptops, we decided to explore offloading the dev stack over to AWS boxes.

This turned out to be a multi-year project with a whole team of devs (~4) working on it full-time.

In hindsight, the tradeoff wasn't worth it. It's still way too difficult to remap a fully-local dev experience with one that's running in the cloud.

So yeah, upgrade your laptops instead.

eysi · 2023-12-30T00:41:23

My team has been developing against a fully remote environment (K8s cluster) for some years now and it makes for a really powerful DevEx.

Code sits on our laptops but live syncs to the remote services without requiring a Docker build or K8s deploy. It really does feel like local.

In particular it lets us do away with the commit-push-pray cycle because we can run integ tests and beyond as we code as opposed to waiting for CI.

We use Garden, (https://docs.garden.io) for this. (And yes I am afilliated :)).

But whether you use Garden or not, leveraging the power of the cloud for “inner loop” dev can be pretty amazing with right tooling.

I wrote a bit more about our experience here: https://thenewstack.io/one-year-of-remote-kubernetes-develop...

jayd16 · 2023-12-30T01:41:29

Kind of interesting to think that CI is significantly slower in practice and both systems need to be maintained. Is it just the overhead of pushing through git or are there other reasons as well?

eysi · 2023-12-30T09:06:40

The way we do things is that we build everything in the cloud and store in a central container registry. So if I trigger a build during dev, the CI runner can re-use that, e.g. if it’s needed before running a test or creating a preview env.

Similarly if another dev (or a CI runner) triggers a build of one of our services, I won’t have to next time I start my dev environment. And because it’s built in the cloud there’s no “works on my machine”.

Same applies to tests actually. They run in the cloud in an independent and trusted environment and the results are cached and stored centrally.

Garden knows all the files and config that belong to a given test suite. So the very first CI run may run tests for service A, service B, and service C. I then write code that only changes service B, open a PR and only the relevant tests get run in CI.

And because it’s all in prod-like environments, I can run integ and e2e tests from my laptop as I code, instead of only having that set up for CI.

mewpmewp2 · 2023-12-30T01:56:34

You would need a very perfect and flexible CI system in place that wouldn't need to rebuild anything it doesn't need and only run the tests you want or only recently failed tests etc.

Many CI systems would spin up a new box instead of using persistent so likely have to rebuild if no cache, etc.

So basically I would say most of the overhead is in not having a persistent box with knowledge of last build or ability to choose what to run in there, which pretty much just equals to local capabilities.

jayd16 · 2023-12-31T04:19:12

Having persistent boxes with sticky sessions seems seems pretty achievable.

peanball · 2023-12-30T07:46:42

Often you also have the CI system designed in a way to verify a “from scratch” build that avoids any issues with “works on my machine” scenarios due to things still being cached that shouldn’t be there anymore.

8n4vidtmkvmk · 2023-12-30T01:55:24

I tried Garden briefly but didn't like it for some reason. DevSpace was simpler to set up and works quite reliably. The sync feature where they automatically inject something into the pod works really well.

eysi · 2023-12-30T09:14:53

DevSpace is a great tool but it’s bummer you didn’t like Garden.

Admittedly, documentation and stability weren’t quite what we’d like and we’ve done a massive overhaul of the foundational pieces in the past 12 months.

If you want to share feedback I’m all ears, my email is in my profile.

rsanek · 2023-12-29T22:56:19

This might have to do with scale. At my employer (~7k employees) we started down this path a few years ago as well, and while it has taken longer for remote to be better than local, it now definitively is and has unlocked all kinds of other stuff that wasn't possible with the local-only version. One example is working across multiple branches by switching machines instead of files on local has meant way lower latency when switching between tasks.

kaishiro · 2023-12-29T23:44:25

One thing I've never understood (and admittedly have not thoroughly researched) is how a remote workspace jives with front-end development. My local tooling is all terminal-based, but after ssh'ing into the remote box to conduct some "local" development, how do I see those changes in a browser? Is the local just exposed on an ip:port?

avbor · 2023-12-29T23:54:33

You can expose the browser port via ssh, with a command line flag like `-L 8080:127.0.0.1:8080`. So you can still preview locally

kaishiro · 2023-12-30T00:00:18

Ah yeah, tunneling it back makes perfect sense - not sure why I never considered that. I'll explore that a bit - thanks for the heads up!

MonaroVXR · 2023-12-30T04:09:48

If you're using vs code, vscode is doing that automatically

Ambroos · 2023-12-30T03:17:58

Facebook's web repo which includes all the PHP and JS for facebook.com and a bunch of other sites is one big massive repo. For development you claim out a server that has a recent checkout of the codebase. Right after claiming it it syncs in your personal commits/stacks you're working on, ready to rebase. You access that machine on a subdomain of any of the FB websites. As far as I remember it was something along the lines of 12345.od.facebook.com, but the format changed from time to time as infra changed. Client certificate authentication and VPN needed (that may no longer be the case, my info is 1y+ old).

There was an internal search provider (bunnylol) that had tools like putting @od in front of any FB URL to generate a redirect of that URL to your currently checked out On Demand server. Painless to work with! Nice side benefit of living on the same domain as the main sites is that the cookies are reused, so no need to log in again.

neurostimulant · 2023-12-30T05:05:56

Have you tried vs code remote development plugin? It can do port forwarding (e.g. forwarding port 8080 on your local machine to port 8080 on the remote machine).

fragmede · 2023-12-30T00:05:01

Yes, modulo networking VPN magic so it's not available over the wider Internet for hackers to discover.

notzane · 2023-12-30T03:46:26

My company is fully using cloud desktops for engineering except iOS and Android development (we get faster laptops instead).

kaishiro · 2023-12-30T03:48:40

Are you using a product or have you just rolled your own solution?

vladvasiliu · 2023-12-30T06:34:04

Are you using a public cloud to host the dev boxen? Is compilation actually faster than locally – assuming that your pc's having been replaced to lower-specced versions since they don't do any heavy lifting anymore?

I work for a not-really-tech company (and I'm not a full-time dev either), so I've been issued a crappy "ultra-portable" laptop with an ultra-low-voltage CPU. I've looked into offloading my dev work to an AWS instance, but was quite surprised that it wasn't any faster than doing things locally for things like Rust compiles.

spockz · 2023-12-30T08:04:15

In our case it is mostly faster when provisioning a machine with significantly more cores. In cloud machines you get “vcores” which are not the same as a core on a local cpu.

I’ve been integrating psrecord into our builds to track core utilisation during the built and see that a lot of time is spent in single threaded activities. Effort is required to compile modules in parallel but that is actually quite straightforward. Running all tests in parallel is harder.

We get the most out of the cloud machines by being able to provision a 16+ core machine to run more complicated (resilience) tests and benchmarks.

Also note that typically the cloud machines run on lower clocked CPUs than you would find in a workstation depending on which machine you provision.

ParetoOptimal · 2023-12-30T17:05:56

Can't you locally switch between branches with git worktrees if you make your build cache key on worktree name?

vghaisas · 2023-12-29T23:09:02

Haha, as I read more words of your comment, I got more sure that we worked at the same place. Totally agree, remote devboxes are really great these days!

However, I also feel like our setup was well suited to remote-first dev anyway (eg. syncing of auto-generated files being a pain for local dev).

tracker1 · 2023-12-30T06:42:40

Second on this. Not being able to run a solution entirely local introduces massive friction in terms of being able to reason with said solution.

When you need to have 200+ parts running to do anything, it can be hard to work in a single piece that touches a couple others.

With servers that have upwards of 128+ cores and 256+ threads, my opinion is swinging back in favor of monoliths for most software.

dmos62 · 2023-12-29T23:21:00

My dev box died (that I used for remote work), and instead of buying something new immediately, I moved my setup to a Hetzner cloud vps. Took around 2 days. Stuff like setting up termux on my tablet and the cli environment on the vps was 90 percent of that. The plus side was that I then spent the remaining summer working outside in the terrace and in the park. Was awesome. I was able to do it because practically all of my tools are command line based (vim, etc).

sweetjuly · 2023-12-30T00:14:34

How much does this cost you? I've been dealing with a huge workstation-server thing for years in order to get this flexibility and while the performance/cost is amazing, reliability and maintenance has been a pain. I've been thinking about buying some cloud compute but an equivalent workstation ends up being crazy expensive (>$100/mo).

dinvlad · 2023-12-30T01:44:58

There’s a crazy good deal for a dedicated server with 14-core/20-thread i5-13500 CPU and 64GB RAM, for just around 40 EUR/mo: https://www.hetzner.com/dedicated-rootserver/matrix-ex

This is honestly a bit overkill for a dev workstation (unless you compile Rust!), but since it’s a dedicated server it can also host any number of fully isolated services for homelab or saas. There’s nothing else like it in the wild, afaik.

randomgiy3142 · 2023-12-30T06:40:58

I’d be careful with Hetzner. I was doing nothing malicious and signed up. I had to submit a passport which was valid US. It got my account cancelled. I asked why and they said they couldn’t say for security reasons. They seem like an awesome service, I don’t want knock them I just simply asked if I could resubmit or something the mediate and they said no. I don’t blame them just be careful. I’m guessing my passport and face might have trigged some validation issues? I dunno.

ametrau · 2023-12-30T10:36:59

You have to give a hosting company a copy of your passport?!? (And hope they delete it… eventually?)

albert180 · 2023-12-30T19:27:58

Only if you triggered some risk checking systems. I didn't need to provide anything when I signed up this year.

makeitdouble · 2023-12-30T00:27:26

It of course steongly depends on what your stack is, my current job provides a full remote dev server for our backend and it's the best experience I've seen in a long time. In particular having a common DB is suprinsingly uneventful (nobody's dropping tables here and there) while helping a lot.

We have interns coming in and fully ready within an hour or two of setup. Same way changing local machines is a breeze with very little downtime.

__jem · 2023-12-30T00:36:17

Isn't the point of a dev environment precisely that the intern can drop tables? Idk, I've never had a shared database not turn to mush over a long enough period, and think investing the effort to build data scripts to rebuild dev dbs from scratch has always been the right call.

makeitdouble · 2023-12-30T00:52:00

Dropping tables to see what happens or resetting DBs every hour is fine with a small dataset, but it becomes impractical when you work on a monolith that talks to a set of DB with a hundred+ tables in total and takes 5 hours to restore.

As you point out rebuilding small test datasets instead of just filtering the prod DB is an option, but those also need maintenance, and take a hell of time to make sure all the relevant cases are covered.

Basically, trying to flee from the bulk and complexity tends to bring a different set of hurdles and missing parts that have to be paid in time, maintenance and bugs only discovered in prod.

PS: the test DB is still reset everyday. Eorse thing happening is we need to do something else for a few hours until it's restored.

arthens · 2023-12-30T09:50:23

> We have interns coming in and fully ready within an hour or two of setup. Same way changing local machines is a breeze with very little downtime.

This sounds like the result of a company investing in tooling, rather than something specific to a remote dev env. Our local dev env takes 3 commands and less than 3 hours to go from a new laptop to a fully working dev env.

rc_kas · 2023-12-30T04:22:04

my company did this. fuck i hate it so much. if anyone wants to hire me away from this remote desktop hellscape, please do.

vladvasiliu · 2023-12-30T06:25:57

If I understand correctly, they're not talking about remote desktops. Rather, the editor is local and responds normally, while the heavy lifting of compilation is done remotely. I've dabbled in this myself, and it's nice enough.

vb6sp6 · 2023-12-30T04:56:00

I've been working this way for years, really nice. What is main complaint?

Aeolun · 2023-12-30T06:35:21

Slowness, latency, lack of control. The usual suspects?

There’s moments where you try to do a thing that normal on a local PC and it’s impossible on remote. That cognitive dissonance is the worst.

rc_kas · 2023-12-31T03:26:39

Yep. And shortcut keys and other smaller behaviors like that get weird sometimes.

tuananh · 2023-12-30T00:33:09

Thanks for the insight. It maybe depends on each team too.

While my team (platform & infra) much prefer remote devbox, the development teams are not.

It could be specific to my org because we have way too many restrictions on the local dev machine (eg: no linux on laptop but it's ok on server and my team much prefer linux over crippled Windows laptop).

WaxProlix · 2023-12-29T22:09:03

I suspect things like GitHub's Codespaces offering will be more and more popular as time goes on for this kind of thing. Did you guys try out some of the AWS Cloud9 or other 'canned' dev env offerings?

hmottestad · 2023-12-29T22:51:24

My experience with GitHub Codespaces is mostly limited to when I forgot my laptop and had to work from my iPad. It was a horrible experience, mostly because Codespaces didn’t support touch or Safari very well and I also couldn’t use IntelliJ which I’m more familiar with.

Can’t really say anything for performance, but I don’t think it’ll beat my laptop unless maven can magically take decent advantage of 32 cores (which I unfortunately know it can’t).

Kwpolska · 2023-12-30T08:40:15

AWS Cloud9 is a web IDE that can run on any EC2 box. The web IDE is a custom Amazon thing and is quite mediocre.

jeffbee · 2023-12-30T02:51:12

My company piles so much ill-considered Linux antivirus and other crap in cloud developer boxes that even on a huge instance type, the builds are ten or more times slower than a laptop, and hundreds of times slower than a real dev box with a Threadripper or similar. It's just a pure waste of money and everyone's time.

It turns out that hooking every system call with vendor crapware is bad for a unix-style toolchain that execs a million subprocesses.

xvector · 2023-12-30T00:21:49

This is just a manpower thing.

At large tech companies like Google, Meta, etc the dev environment is entirely in the cloud for the vast majority of SWEs.

This is a much nicer dev experience than anything local.

jiggawatts · 2023-12-29T21:55:55

https://xkcd.com/1205/

behnamoh · 2023-12-30T01:54:35

I disagree though. If a task is boring and repetitive, I just won't ever do it. So the comparison for people like me is:

    "spend X time to automate this task vs not do the task at all".

Whereas the xkcd is like (n = frequency that you do the task):

    "Spend X time to automate this task that takes Y×n time normally and get it down to Z×n time, vs spend Y×n time to do the task"

mdbauman · 2023-12-29T22:05:43

This xkcd seems relevant also: https://xkcd.com/303/

One thing that jumps out at me is the assumption that compile time implies wasted time. The linked Martin Fowler article provides justification for this, saying that longer feedback loops provide an opportunity to get distracted or leave a flow state while ex. checking email or getting coffee. The thing is, you don't have to go work on a completely unrelated task. The code is still in front of you and you can still be thinking about it, realizing there's yet another corner case you need to write a test for. Maybe you're not getting instant gratification, but surely a 2-minute compile time doesn't imply 2 whole minutes of wasted time.

newaccount74 · 2023-12-30T05:06:08

If you can figure out something useful to do during a two minute window, I envy you.

I really struggle with task switching, and two minutes is the danger zone. Just enough time to get distracted, by something else; too little time to start meaningful work on anything else...

Hour long compiles are okay, I plan them, and have something else to do while they are building.

30 second compiles are annoying, but don't affect my productivity much (except when doing minor tweaks to UI or copywriting).

2-10 minute compiles are the worst.

norir · 2023-12-30T00:19:07

I get what you are saying but I still think fast compilation is essential to a pleasant dev experience. Regardless of how fast the compiler is, there will always be time when we are just sitting there thinking, not typing. But when I am implementing, I want to verify that my changes work as quickly as possible and there is really no upside to waiting around for two minutes.

perrygeo · 2023-12-30T00:16:14

Yes! Pauses allow you to reflect on your expectations of what you're actually compiling. As you sit in anticipation, you reflect on how your recent changes will manifest and how you might QA test it. You design new edge cases to add to the test suite. You sketch alternatives in your notebook. You realize oh compilation will surely fail on x because I forgot to add y to module z. You realize your logs, metrics, tests and error handling might need to be tweaked to unearth answers to the questions that you just now formulated. This reflection time is perhaps the most productive time a programmer will spend in their day. Calling it "wasted" reflects a poor understanding of the software development process.

chiefalchemist · 2023-12-29T22:51:24

Spot on. The mind often needs time and space to breathe, especially after it's been focused and bearing down on something. We're humans, not machines. Creativity (i.e., problem solving) needs to be nurtured. It can't be force fed.

More time working doesn't translate to being more effective and more productive. If that were the case then why are a disproportionate percentage of my "Oh shit! I know what to do to solve that..." in the shower, on my morning run, etc.?

majikandy · 2023-12-29T23:07:45

I love those moments. Your brain has worked on it in the background like a ‘bombe’ machine cracking the day’s enigma code. And suddenly “ding… the days code is in!”

chiefalchemist · 2023-12-30T16:04:52

You might like the book "Your Brain on Work" by Dr David Rock. In fact, I'm due for a re- read.

https://davidrock.net/about/

seadan83 · 2023-12-30T17:59:32

I agree to some extent. Though, I don't think it has to be a trade-off though. After a sub-5 second compile time, I go over to get a coffee to ponder the results of the compile rather than imagine what those results might be. Taking time to think is not mutually exclusive to a highly responsive dev process.

tomaskafka · 2023-12-29T21:10:33

My personal research for iOS development, taking the cost into consideration, concluded:

- M2 Pro is nice, but the improvement over 10 core (8 perf cores) M1 Pro is not that large (136 vs 120 s in Xcode benchmark: https://github.com/devMEremenko/XcodeBenchmark)

- M3 Pro is nerfed (only 6 perf cores) to better distinguish and sell M3 Max, basically on par with M2 Pro

So, in the end, I got a slightly used 10 core M1 Pro and am very happy, having spent less than half of what the base M3 Pro would cost, and got 85% of its power (and also, considering that you generally need to have at least 33 to 50 % faster CPU to even notice the difference :)).

mgrandl · 2023-12-29T21:18:56

The M3 Pro being nerfed has been parroted on the Internet since the announcement. Practically it’s a great choice. It’s much more efficient than the M2 Pro at slightly better performance. That’s what I am looking for in a laptop. I don’t really have a usecase for the memory bandwidth…

orangecat · 2023-12-30T01:31:01

The M3 Pro and Max get virtually identical results in battery tests, e.g. https://www.tomsguide.com/news/macbook-pro-m3-and-m3-max-bat.... The Pro may be a perfectly fine machine, but Apple didn't remove cores to increase battery life; they did it to lower costs and upsell the Max.

Ar-Curunir · 2023-12-30T02:05:05

It might be the case that the yield on the chips is low, so they decided to use “defective” chips in the M3 Pro, and the non-defective in the M3 Max.

dagmx · 2023-12-30T02:28:55

In all M generations, the max and pro are effectively different layouts so can’t be put down to binning. Each generation did offer binned versions of the Pro and Max with fewer cores though.

brandall10 · 2023-12-30T10:40:55

These aren't binned chips... the use of efficiency cores does reduce transistor count considerably which could improve yields.

That said, while the transistor count of the M2 Pro -> M3 Pro did decrease, it went up quite a bit from the M2 -> M3.

It seems most likely Apple is just looking to differentiate the tiers.

Joeri · 2023-12-30T10:21:08

AI is the main use case for memory bandwidth that I know of. Local LLM’s are memory bandwidth limited when running inference, so once you fall into that trap you end up wanting the 400 gb/s max memory bandwidth of the m1/m2/m3 max, paired with lots and lots of RAM. Apple pairs memory size and bandwidth upgrades to core counts a lot more in m3 which makes the m3 line-up far more expensive than the m2 line-up to reach comparable LLM performance. Them touting AI as a use case for the m3 line-up in the keynote was decidedly odd, as this generation is a step back when it comes to price vs performance.

tomaskafka · 2023-12-29T21:28:48

Everyone has a different needs - for me, even M1 Pro has more battery life than I use or need, so further efficiency differences bring little value.

dgdosen · 2023-12-29T22:07:03

I picked up an M3Pro/11/14/36GB/1TB to 'test' over the long holiday return period to see if I need an M3 Max. For my workflow (similar to blog post) - I don't! I'm very happy with this machine.

Die shots show the CPU cores take up so little space compared to GPUs on both the Pro and Max... I wonder why.

wlesieutre · 2023-12-29T22:24:09

I don’t really have a usecase for even more battery life, so I’d rather have it run faster

Aurornis · 2023-12-29T21:23:17

My experience was similar: In real world compile times, the M1 Pro still hangs quite closely to the current laptop M2 and M3 models. Nothing as significant as the differences in this article.

I could depend on the language or project, but in head-to-head benchmarks of identical compile commands I didn’t see any differences this big.

lawrjone · 2023-12-29T21:21:09

That's interesting you saw less of an improvement in the M2 than we saw in this article.

I guess not that surprising given the different compilation toolchains though, especially as even with the Go toolchain you can see how specific specs lend themselves to different parts of the build process (such as the additional memory helping linker performance).

You're not the only one to comment that the M3 is weirdly capped for performance. Hopefully not something they'll continue into the M4+ models.

tomaskafka · 2023-12-29T21:24:48

That's what Xcode benchmarks seem to say.

Yep, there appears to be no reason for getting M3 Pro instead of M2 Pro, but my guess is that after this (unfortunate) adjustment, they got the separation they wanted (a clear hierarchy of Max > Pro > base chip for both CPU and GPU power), and can then improve all three chips by a similar amount in the future generations.

Reason077 · 2023-12-29T21:56:41

> ”Yep, there appears to be no reason for getting M3 Pro instead of M2 Pro”

There is if you care about efficiency / battery life.

kjkjadksj · 2023-12-30T18:16:25

Don’t you get better single core performance in m3 pro? Iirc it has stronger performance and efficiency cores as well.

ramijames · 2023-12-29T22:29:14

I also made this calculation recently and ended up getting an M1 Pro with maxed out memory and disk. It was a solid deal and it is an amazing computer.

jim180 · 2023-12-29T22:08:43

I love my M1 MacBook Air for iOS development. One thing, I'd like to have from Pro line is the screen, and just the PPI part. While 120Hz is a nice thing to have, it won't happen on Air laptops.

geniium · 2023-12-29T21:14:24

Basically the Pareto effect in choosing the right cpu vs cost

euos · 2023-12-30T01:57:18

I am ex-core contributor Chromium and Node.js and current core contributor to gRPC Core/C++.

I am never bothered with build times. There is "interactive build" (incremental builds I use to rerun related unit tests as I work on code) and non-interactive build (one I launch and go get coffee/read email). I have never seen hardware refresh toggle non-interactive into interactive.

My personal hardware (that I use now and then to do some quick fix/code review) is 5+ year old Intel i7 with 16Gb of memory (had to add 16Gb when realized linking Node.js in WSL requires more memory).

My work laptop is Intel MacBook Pro with a touch bar. I do not think it has any impact on my productivity. What matters is the screen size and quality (e.g. resolution, contrast and sharpness) and storage speed. Build system (e.g. speed of incremental builds and support for distributed builds) has more impact than any CPU advances. I use Bazel for my personal projects.

gizmo · 2023-12-30T02:05:57

Somehow programmers have come to accept that a minuscule change in a single function that only result in a few bytes changing in a binary takes forever to compile and link. Compilation and linking should be basically instantaneous. So fast that you don't even realize there is a compilation step at all.

Sure, release builds with whole program optimization and other fancy compiler techniques can take longer. That's fine. But the regular compile/debug/test loop can still be instant. For legacy reasons compilation in systems languages is unbelievably slow but it doesn't have to be this way.

PhilipRoman · 2023-12-30T09:45:15

This is the reason why I often use tcc compiler for my edit/compile/hotreload cycle, it is about 8x faster than gcc with -O0 and 20x faster than gcc with -O2.

With tcc the initial compilation of hostapd it takes about 0.7 seconds and incremental builds are roughly 50 milliseconds.

The only problem is that tcc's diagnostics aren't the best and sometimes there are mild compatibility issues (usually it is enough to tweak CFLAGS or add some macro definition)

cole-k · 2023-12-30T05:53:20

I mean yeah I've come to accept it because I don't know any different. If you can share some examples of large-scale projects that you can compile to test locally near-instantly - or how we might change existing projects/languages to allow for this - then you will have my attention instead of skepticism.

euos · 2023-12-30T16:02:13

That’s why I write test first. I don’t want to build everything.

euos · 2023-12-30T16:01:21

I am firmly in test-driven development camp. My test cases build and run interactively. I rarely need to do a full build. CI will make sure I didn’t break anything unexpected.

syntaxing · 2023-12-30T02:07:26

Aren’t M series screen and storage speed significantly superior to your Intel MBP? I transitioned from an Intel MBP to M1 for work and the screen was significantly superior (not sure about storage speed, our builds are all on a remote dev machine that is stacked).

euos · 2023-12-30T16:05:19

I only use laptop screen in emergencies. Storage is fast enough.

syntaxing · 2023-12-30T17:49:49

For my curiosity, what do you use for your main monitor? I’ve been wanting to replace my ultrawide with something better.

euos · 2023-12-30T17:58:30

I use 4k 32' as my monitor. My home monitor is Dell U3219Q, I am very happy with picture quality, though kids say it is bad for gaming.

karolist · 2023-12-30T08:21:19

I too come from Blaze and tried to use Bazel for my personal project which involves backend + frontend dockerized, the build rules got weird and niche real quick and I was spending lots of time working with the BUILD files making me question the value against plain old Makefiles, this was 3 years ago, maybe the public ecosystem is better now.

euos · 2023-12-30T16:07:55

I use Basel for C++. I would write normal dockerfile if I need it. Bazel docker support is an atrocity. For JS builds I also use regular TSC.

dilyevsky · 2023-12-30T19:17:16

rules_docker is now deprecated but rules_oci[0] is the replacement and so far I find it much nicer

[0] - https://github.com/bazel-contrib/rules_oci

Kwpolska · 2023-12-30T08:51:59

Chromium is a massive project. In more normally-sized projects, you can build everything on your laptop in reasonable time.

euos · 2023-12-30T18:01:38

When I worked at Chromium there were two major mitigations:

1. Debug compilation was split in shared libraries so only a couple of them has to be rebuilt in your regular dev workflow. 2. They had some magical distributed build that "just worked" for me. I never had to dive into the details.

I was working on DevTools so in many cases my changes would touch both browser and renderer. Unit testing was helpful.

steeve · 2023-12-30T08:02:26

This is because you’ve been spoiled by Bazel. As was I.

euos · 2023-12-30T16:06:01

One day I will learn cmake. But not today :)

dilyevsky · 2023-12-30T05:10:18

Bazel is significantly faster on m1 compared to i7 even if it doesn’t try to recompile protobuf compiler code which it’s still attempting to do regularly

white_dragon88 · 2023-12-30T02:19:26

5+ year old i7 are potato and would be a massive time waster today. Build times matter.

euos · 2023-12-30T16:03:40

I don’t notice myself sitting and waiting for a build. I don’t want to waste my time setting up a new workstation so why bother?

ComputerGuru · 2023-12-30T02:36:15

I have a seven year old ThreadRipper Pro and would not significantly benefit from upgrading.

wtallis · 2023-12-30T03:42:06

The Threadripper PRO branding was only introduced 3.5 years ago. The first two generations didn't have any split between workstation parts and enthusiast consumer parts. You must have a first-generation Threadripper, which means it's somewhere between 8 and 16 CPU cores.

If you would not significantly benefit from upgrading, it's only because you already have more CPU performance than you need. Today's CPUs are significantly better than first-generation Zen in performance per clock and raw clock speed, and mainstream consumer desktop platforms can now match the top first-generation Threadripper in CPU core count and total DRAM bandwidth (and soon, DRAM capacity). There's no performance or power metric by which a Threadripper 1950X (not quite 6.5 years old) beats a Ryzen 7950X. And the 7950X also comes in a mobile package that only sacrifices a bit of performance (to fit into fairly chunky "laptops").

ComputerGuru · 2023-12-30T03:55:24

I guess I should clarify: I am a rust and C++ developer blocked on compilation time, but even then, I am not able to justify the cost of upgrading from a 1950X/128GB DDR4 (good guess!) to the 7950X or 3D. It would be faster, but not in a way that would translate to $$$ directly. (Not to mention the inflation in TRx costs since AMD stopped playing catch-up.) performance-per-watt isn’t interesting to me (except for thermals but Noctua has me covered) because I pay real-time costs and it’s not a build farm.

If I had 100% CPU consumption around the clock, I would upgrade in a heart beat. But I’m working interactively in spurts between hitting CPU walls and the spurts don’t justify the upgrade.

If I were to upgrade it would be for the sake of non-work CPU video encoding or to get PCIe 5.0 for faster model loading to GPU VRAM.

washadjeffmad · 2023-12-31T02:11:12

sTR4 workstations are hard to put down! I'll replace mine one day, probably with whatever ASRock Rack Epyc succeeds the ROMED8-2T with PCIe 5.0.

In the meantime, I wanted something more portable, so I put a 13700K and RTX 3090 in a Lian Li A4-H2O case with an eDP side panel for a nice mITX build. It only needs one cable for power, and it's as great for VR as it is a headless host.

drgo · 2023-12-30T01:07:00

To people who are thinking about using AI for data analyses like the one described in the article:

- I think it is much easier to just load the data into R, Stata etc and interrogate the data that way. The commands to do that will be shorter and more precise and most importantly more reproducible.

- the most difficult task in data analysis is understanding the data and the mechanisms that have generated it. For that you will need a causal model of the problem domain. Not sure that AI is capable of building useful causal models unless they were somehow first trained using other data from the domain.

- it is impossible to reasonably interpret the data without reference to that model. I wonder if current AI models are capable of doing that, e.g., can they detect confounding or oversized influence of outliers or interesting effect modifiers.

Perhaps someone who knows more than I do on the state of current technology can provide a better assessment of where we are in this effort

balls187 · 2023-12-30T01:54:25

That is effectively what the GPT4 based AI Assistant is doing.

Except when I did it, it was python and pandas. You can ask it to show you the code it used to do it's analysis.

So you can load the data into R/Python and google "how do I do xyzzzy" and write the code yourself, or use ChatGPT.

drgo · 2023-12-30T02:38:08

so ChatGPT can build a causal model for a problem domain? How does it communicate that (using a DAG?)? It would be important for the data users to understand that model.

ribit · 2023-12-30T08:58:07

Interesting idea, but the quality of data analysis is rather poor IMO and I'm not sure that they are actually learning what they think they are learning. Most importantly, I don't understand why they would see such a dramatic increase of sub 20s build times going from M1 Pro to M2 Pro. The real-world performance delta between the two on code compilation workloads is around 20-25%. It also makes little sense to me that M3 machines have fewer sub-20s builds than M2 machines. Or that M3 Pro with half the cores has more sub-20s builds than M3 Max.

I suspect there might be considerable difference in developer behavior which results in these differences. Such as people with different types of laptops typically working on different things.

And a few random observations after a very cursory reading (I might be missing something):

- Go compiler seems to take little advantage from additional cores

- They are pooling data in ways that makes me fundamentally uncomfortable

- They are not consistent in their comparisons, sometimes they use histograms, sometimes they use binned density plots (with different y axis ranges), it's real unclear what is going on here...

- Macs do not throttle CPU performance on battery. If the builds are really slower on battery (which I am not convinced about btw looking at graphs), it will be because of "low power" setting activated

epolanski · 2023-12-30T13:11:37

> also makes little sense to me that M3 machines have fewer sub-20s builds than M2 machines.

M3s have a smaller memory bandwidth, they are effectively a downgrade for some use cases.

ribit · 2023-12-30T17:00:16

You are not going to saturate a 150GB/s memory interface building some code on a six-core CPU... these CPUs are fast, but not that fast.

inkyoto · 2023-12-31T01:09:24

Oh, yes, you are. The optimising steps, linking and the link-time optimisation (LTO) are very heavily memory bound, especially on large codebases.

The Rust compiler routinely hits the 45-50Gb/sec ballpark of the intra-memory transfer speed on compiling a medium sized project, more if the code base large. Haskell (granted, a fringe yet revealing case) case just as routinely hits the 60-70 Gb/sec memory transfer speed at the compile time, and large to very large C++ codebases add a lot of stress on the memory at the optimisation step. If I am not mistaken, Go is also very memory bound.

Then there comes the linking and particularly the LTO that want all the memory bandwidth they can get to get the job done quickly, and the memory speed becomes a major bottleneck. Loading the entire codebase into memory, in fact, the major optimisation technique used in mold[0] that can vastly benefit from a) faster memory, b) a wider memory bus.

[0] https://github.com/rui314/mold

mcny · 2023-12-29T23:04:16

> All developers work with a fully fledged incident.io environment locally on their laptops: it allows for a This to me is the biggest accomplishment. I've never worked at a company (besides brief time helping out with some startups) where I have been able to run a dev/local instance of the whole company on a single machine.

There's always this thing, or that, or the other that is not accessible. There's always a gotcha.

8n4vidtmkvmk · 2023-12-30T02:00:32

I never couldn't run the damn app locally until my latest job. Drives me bonkers. I don't understand how people aren't more upset and this atrocious devex. Damn college kids don't know what they're missing.

physicles · 2023-12-30T13:20:11

I can’t imagine not having this. We use k3s to run everything locally and it works great. But we (un)fortunately added snowflake in the last year — it solves some very real problems for us, but it’s also a pain to iterate on that stuff.

kamikaz1k · 2023-12-30T02:53:28

I used to work in a company like that, and since leaving it I’ve missed that so much.

People who haven’t lived in that world just cannot understand how much better it is, and will come up with all kinds of cope.

daxfohl · 2023-12-30T17:15:30

We used to have that, but it's hard to support as you scale. The level of effort is somewhat quadratic to company size: linear in the number of services you support and in the number of engineers you have to support. Also divergent use cases come up that don't quite fit, and suddenly the infra team is the bottleneck to feature delivery, and people just start doing their own thing. Once that Pandora's Box is opened, it's essentially impossible to claw your way back.

I've heard of largeish companies that still manage to do this well, but I'd love to learn how.

That said, yeah I agree this is the biggest accomplishment. Getting dev cycles down from hours or days to minutes is more important than getting them down from minutes to 25% fewer minutes.

teaearlgraycold · 2023-12-30T00:31:09

I’m currently doing my best to make this possible with an app I’m building. I had to convince the CEO the M2 Max would come in handy for this (we run object detection models and stable diffusion). So far it’s working out!

lawrjone · 2023-12-29T20:18:32

Author here, thanks for posting!

Lots of stuff in this from profiling Go compilations, building a hot-reloader, using AI to analyse the build dataset, etc.

We concluded that it was worth upgrading the M1s to an M3 Pro (the max didn’t make much of a difference in our tests) but the M2s are pretty close to the M3s, so not (for us) worth upgrading.

Happy to answer any questions if people have them.

aranke · 2023-12-29T22:54:38

Hi,

Thanks for the detailed analysis. I’m wondering if you factored in the cost of engineering time invested in this analysis, and how that affects the payback time (if at all).

Thanks!

lawrjone · 2023-12-30T07:00:21

Author here: this probably took a 2.5 days to put together, all in.

The first day was spent hacking together a new hot reloaded but this also fixed a lot of issues we’d had with the previous loader such as restarting into stale code, which was really harming people’s productivity. That was well worth even several days of effort really!

The second day I was just messing around with OpenAI to figure out how I’d do this analysis. We’re right now building an AI assistant for our actual product so you can ask it “how many times did I get paged last year? How many were out-of-hours? Is my incident workload increasing?” Etc and I wanted an excuse to learn the tech so I could better understand that feature. So for me, well worth investing a day to learn.

Then the article itself took about 4hrs to write up. That’s worth it for us given exposure for our brand and the way it benefits us for hiring/etc.

We trust the team to make good use of their team and allowing people to do this type of work if they think it’s valuable is just an example of that. Assuming I have a £1k/day rate (I do not) we’re still only in for £2.5k, so less than a single MacBook to turn this around.

bee_rider · 2023-12-30T02:52:26

They could also add in the advertising benefit of showing off some fun data on this site :)

smabie · 2023-12-30T00:56:49

But then they'd have to factor in the engineering time invested in the analysis of the analysis?

zarzavat · 2023-12-30T02:15:57

Zeno’s nihilism: nothing is worth doing because the mandatory analysis into whether something is worth doing or not takes infinite time.

hedgehog · 2023-12-30T06:42:06

I'm curious how you came to the conclusion the Max SKUs aren't much faster, the distributions in the charts make them look faster but the text below just says they look the same.

firecall · 2023-12-30T01:33:53

So can we assume the M3 Max offered little benefit because the workloads couldn’t use the cores?

Or the tasks maybe finished so fast that it didn’t make a difference in real world usage?

afro88 · 2023-12-30T00:51:33

Great analysis! Thanks for writing it up and sharing.

Logistical question: did management move some deliverables out of the way to give you room to do this? Or was it extra curricular?

BlueToth · 2023-12-29T22:24:37

Hi, Thanks for the interesting comparison. What I would like to see added would be a build on a 8GB memory machine (if you have one available).

Tarball10 · 2023-12-30T01:35:49

This is only tangentially related, but I'm curious how other companies typically balance their endpoint management and security software with developer productivity.

The company I work for is now running 5+ background services on their developer laptops, both Mac and Windows. Endpoint management, priviledge escalation interception, TLS interception and inspection, anti-malware, and VPN clients.

This combination heavily impacts performance. You can see these services chewing up CPU and I/O performance while doing anything on the machines, and developers have complained about random lockups and hitches.

I understand security is necessary, especially with the increase in things like ransomware and IP theft, but have other companies found better ways to provide this security without impacting developer productivity as much?

andrekandre · 2023-12-30T09:22:15

  > have other companies found better ways to provide this security without impacting developer productivity as much?

only way i've seen is if things get bad, report it to it/support and tell them what folder/files to exclude from inspection so your build temp files and stuff don't clog and slow up everything

suchar · 2023-12-30T18:11:26

Same here, but IMO, if company believes that such software is useful (and they wouldn't be using it if company believed otherwise), then why do they often (always?) include node_modules in exclusion rules? After all, node_modules usually contains a lot of untrusted code/executables

rendaw · 2023-12-29T22:51:34

> People with the M1 laptops are frequently waiting almost 2m for their builds to complete.

I don't see this at all... the peak for all 3 is at right under 20s. The long tail (i.e. infrequently) goes up to 2m, but for all 3. M2 looks slightly better than M1, but it's not clear to me there's an improvement from M2 to M3 at all from this data.

vessenes · 2023-12-29T20:34:11

The upshot: M3 Pro is slightly better than M2 and significantly better than M1 Pro is what I've experienced with running local LLMs on my Macs; currently M3 memory bandwidth options are lower than for M2, and that may be hampering the total performance.

Performance per watt and rendering performance are both better in the M3, but I ultimately decided to wait for an M3 Ultra with more memory bandwidth before upgrading my daily driver M1 Max.

lawrjone · 2023-12-29T20:38:58

This is pretty much aligned with our findings (am the author of this post).

I came away feeling that:

- M1 is a solid baseline

- M2 improves performance by about 60% - M3 Pro is marginal on the M2, more like 10%

- M3 Max (for our use case) didn’t seem that much different on the M3 Pro, though we had less data on this than other models

I suspect Apple saw the M3 Pro as “maintain performance and improve efficiency” which is consistent with the reduction in P-cores from the M2.

The bit I’m interested about is that you say the M3 Pro is only a bit better than the M2 at LLM work, as I’d assumed there were improvements in the AI processing hardware between the M2 and M3. Not that we tested that, but I would’ve guessed it.

vessenes · 2023-12-29T20:51:05

Yeah, agreed. I'll say I do use the M3 Max for Baldur's gate :).

On LLMs, the issue is largely that memory bandwidth: M2 Ultra is 800GB/s, M3 Max is 400GB/s. Inference on larger models are simple math on what's in memory, so the performance is roughly double. Probably perf / watt suffers a little, but when you're trying to chew through 128GB of RAM and do math on all of it, you're generally maxing your thermal budget.

Also, note that it's absolutely incredible how cheap it is to run a model on an M2 Ultra vs an H100 -- Apple's integrated system memory makes a lot possible at much lower price points.

lawrjone · 2023-12-29T20:57:55

Ahh right, I'd seen a few comments about the memory bandwidth when it was posted on LinkedIn, specifically that the M2 was much more powerful.

This makes a load of sense, thanks for explaining.

cameroncairns · 2023-12-29T23:06:07

I've been considering buying a Mac specifically for LLMs, and I've come across a lot of info/misinfo on the topic of bandwidth. I see you are talking about M2 bandwidth issues that you read about on linkedin, so I wanted to expand upon that in case there is any confusion on your part or someone else who is following this comment chain.

M2 Ultra at 800 GB/s is for the mac studio only. So it's not quite apples to apples when comparing against the M3 which is currently only offered for macbooks.

M2 Max has bandwidth at 400 GB/s. This is a better comparison to the current M3 macbook line. I believe it tops out at 96GB of memory.

M3 Max has a bandwidth of either 300 GB/s or 400 GB/s depending on the cpu/gpu you choose. There is a lower line cpu/gpu w/ a max memory size of 96GB, this has a bandwidth of 300 GB/s. There is a top of the line cpu/gpu with a max memory size of 128GB, this has the same bandwidth as the previous M2 chip at 400 GB/s.

The different bandwidths depending on the M3 max configuration chosen has led to a lot of confusion on this topic, and some criticism for the complexity of trade offs for the most recent generation of macbook (number of efficiency/performance cores being another source of criticism).

Sorry if this was already clear to you, just thought it might be helpful to you or others reading the thread who have had similar questions :)

karolist · 2023-12-30T08:30:48

Worth noting that when AnandTech did their initial M1 Max review, they never were able to achieve full 400GB/s memory bandwidth saturation, the max they saw when engaging all CPU/GPU cores was 243GB/s - https://www.anandtech.com/show/17024/apple-m1-max-performanc....

I have not seen the equivalent comparisons with M[2-3] Max.

cameroncairns · 2023-12-30T18:47:50

Interesting! There are anecdotal reports here and there on local llama about real world performance, but yeah I'm just reporting what Apple advertises for those devices on their spec sheet

vessenes · 2023-12-30T01:31:34

All this sounds right!

If money is no object, and you don't need a laptop, and you want a suggestion, then I'd say the M2 Ultra / Studio is the way to go. If money is still no object and you need a laptop, M3 with maxed RAM.

I have a 300GB/s M3 and a 400 GB/s M1 with more RAM, and generally the LLM difference is minimal; the extra RAM is helpful though.

If you want to try some stuff out, and don't anticipate running an LLM more than 10 hours a week, lambda labs or together.ai will save you a lot of money. :)

cameroncairns · 2023-12-30T18:50:17

The tech geek in me really wants to get a studio with an M2 ultra just for the cool factor, but yeah I think cost effectiveness wise it makes more sense to rent something in the cloud for now.

Things are moving so quickly with local llms too it's hard to say what the ideal hardware setup will be 6 months from now, so locking into a platform might not be the best idea.

nightski · 2023-12-30T15:40:26

H100 is kind of a poor comparison. There are much cheaper ways to get to decent memory without that. Such as 2 A6000s.

Aurornis · 2023-12-29T21:26:27

> - M2 improves performance by about 60%

This is the most shocking part of the article for me since the difference between M1 and M2 build times has been more marginal in my experience.

Are you sure the people with M1 and M2 machines were really doing similar work (and builds)? Is there a possibility that the non-random assignment of laptops (employees received M1, M2, or M3 based on when they were hired) is showing up in the results as different cohorts aren’t working on identical problems?

lawrjone · 2023-12-29T21:30:59

The build events track the files that were changed that triggered the build, along with a load of other stats such as free memory, whether docker was running, etc.

I took a selection of builds that were triggered by the same code module (one that frequently changes to provide enough data) and compared models on just that, finding the same results.

This feels as close as you could get for an apples-to-apples comparison, so I'm quite confident these figures are (within statistical bounds of the dataset) correct!

sokoloff · 2023-12-29T21:40:29

> apples-to-apples comparison

No pun intended. :)

db48x · 2023-12-30T11:47:25

This is bad science. You compared the thing you had to the thing you wanted, and found a reason to pick the thing you wanted. Honesty should have compelled you to at least compare against a desktop–class machine, or even a workstation with a Threadripper CPU. Since you know that at least part of your workload is concurrent, and 14 CPUs are better than 10, why not check to see if 16, 32, or 64 is better still? And the linker is memory bound, so it is worth considering not just the quantity of memory but the actual memory bandwidth and latency as well.

minimaul · 2023-12-30T12:41:36

Being Mac only can be an advantage - I’ve been on both sides of trying to maintain & use non-trivial dev environments and the more OSes you bring in for people to work on, the harder it gets.

Bringing in Windows or Linux has a set up cost and a maintenance cost that may exclude it from even being considered.

Edit: plus, Macs are ARM, other options are inevitably x86. So it’s also two CPU architectures to maintain support for, on top of OS specific quirks - and even if you use eg Docker, you still have a lot of OS specific quirks in play :/

smoldesu · 2023-12-30T14:34:01

My biggest issue with Mac-only shops is that almost nobody actually deploys to Mac. The majority of Mac-only firms I've worked at deploy to x86 Linux and develop in a VM on their Macbook (even pre-M1). Unless your business is writing Mac-native apps, MacOS is probably going to be a second-class deployment platform for you.

Even in an ideal scenario where your app already works on ARM, you will be dealing with OS-specific quirks unless your production machine runs MacOS.

minimaul · 2023-12-30T15:58:17

These are fair points, and definitely a rough spot.

Eg at work we use M1/M2 macs and dev on those using docker - so that’s a Linux VM essentially with some nice tooling wrapped around it.

We certainly see differences - mostly around permissions (as docker for Mac doesn’t really enforce any access checks on files on the host), but we also mostly deploy to ARM Linux on AWS.

We went Mac only from a mix of Linux, Windows and Mac as we found the least overall friction there for our developers - Windows, even with WSL, had lots of problems, including performance issues. Linux we had issues finding nice laptops, and more support issues (developers are often not *nix experts!). Mac was a nice middle ground in the end.

db48x · 2023-12-30T16:19:59

> Linux we had issues finding nice laptops

This is the same issue as before. Laptops are shiny so people don’t even bother considering a regular desktop machine. And yet desktops can be so much more powerful simply because they don’t have the thermal and power delivery restrictions that desktops have.

minimaul · 2023-12-30T19:30:10

Laptops have advantages though - for a remote team, they're often a lot more convenient. A lot of people don't have space for a full permanent desk setup for a work desktop on top of their personal use - UK houses aren't huge!

Desktops work if you have an office, but our dev team is entirely remote. But you can't take a desktop into a meeting, or take a desktop on the train to our office for in-person meetings/events.

db48x · 2023-12-31T06:32:29

Those are all bad excuses. And they all have fixes other than compromising on the quality of your computer. Laptops for managers who go to meetings, workstations for engineers who actually accomplish things.

danhau · 2023-12-30T11:59:28

I don‘t think this is bad science at all.

From the article:

> All incident.io developers are given a MacBook which they use for their development work.

Non-MacBook machines are apparently not an option, for whatever reason. Comparing against other machines would be interesting, but irrelevant.

db48x · 2023-12-30T16:14:53

So it’s ok for science to be limited by politics?

p_j_w · 2023-12-30T17:10:32

They’re looking at their particular use case. That may limit the applicability of this to other people or companies, but that doesn’t make it politics.

db48x · 2023-12-30T17:20:44

When your CEO decides that your company will only buy Apple laptops, that is definitely a political football. Your CEO likes shiny things, and is willing to purchase the shiniest of them without regard to effectiveness or cost.

aschla · 2023-12-29T21:12:28

Side note, I like the casual technical writing style used here, with the main points summarized along the way. Easily digestible and I can go back and get the details in the main text at any point if I want.

lawrjone · 2023-12-29T21:17:26

Thank you, really appreciate this!

JonChesterfield · 2023-12-30T07:00:19

If dev machine speed is important, why would you develop on a laptop?

I really like my laptop. Spend a lot of time typing into it. It's limited to a 30W or similar power budget on thermal and battery constraints. Some of that is spent on a network chip which grants access to machines with much higher power and thermal budgets.

Current employer has really scary hardware behind a VPN to run code on. Previous one ran a machine room with lots of servers. Both expected engineer laptops to be mostly thin clients. That seems obviously the right answer to me.

Thus marginally faster dev laptops don't seem very exciting.

solatic · 2023-12-30T07:57:58

> Current employer has really scary hardware behind a VPN to run code on. Previous one ran a machine room with lots of servers. Both expected engineer laptops to be mostly thin clients. That seems obviously the right answer to me.

It's quite expensive to set up, regardless of whether we're talking about on-prem or cloud hardware. Your employer is already going to buy you a laptop; why not try to eke out what's possible from the laptop first?

The typical progression, I would think, is (a) laptop only, (b) compilation times get longer -> invest in a couple build cache servers (e.g. Bazel) to support dozens/hundreds of developers, (c) expand the build cache server installation to provide developer environments as well

balls187 · 2023-12-30T01:26:31

> a chat interface to your ... data Generally, the process includes:

> Exporting your data to a CSV

> Create an ‘Assistant’ with a prompt explaining your purpose, and provide it the CSV file with your data.

Once MSFT build's the aforementioned process into Excel, it's going to be a major game changer.

hk1337 · 2023-12-29T22:15:59

I wonder why they didn't include Linux since the project they're building is Go? Most CI tools, I believe, are going to be Linux. Sure, you can explicitly select macOS in Github CI but Linux seems like it would be the better generic option?

*EDIT* I guess if you needed a macOS specific build with Go you would us macOS but I would have thought you'd use Linux too. Can you build a Go project in Linux and have it run on macOS? I suppose architecture would be an issue building on Linux x86 would not run on macOS Apple Silicon but the reverse is true too a build on Apple Silicon would not work on Linux x86 maybe not even Linux Arm.

xp84 · 2023-12-29T22:39:06

I know nothing about Go, but if it's like other platforms, builds intended for production or staging environments are indeed nearly always for x86_64, but those are done somewhere besides laptops, as part of the CI process. The builds done on the laptops are to run each developer's local instance of their server-side application and its front-end components, That instance is always being updated to whatever is in-progress at the time. Then they check that code in, and eventually it gets built for prod on an Intel, Linux system elsewhere.

varispeed · 2023-12-29T23:03:46

Cross compilation is probably easiest in Go. If I recall you can just give it a different arch parameter and it will produce a working build for any supported platform.

actinium226 · 2023-12-30T00:35:31

Honestly, I found the analysis a little difficult to read. Some of the histograms weren't normalized which made the comparison a bit difficult.

One thing that really stood out to me was:

> People with the M1 laptops are frequently waiting almost 2m for their builds to complete.

And yet looking at the graph, 120s is off the scale on the right side, so that suggests almost literally no one is waiting 2m for a build, and most builds are happening within 40s with a long tail out to 1m50s.

j4yav · 2023-12-30T07:38:33

I think the main point was justifying getting new M3s

brailsafe · 2023-12-29T23:13:12

Fun read, I like how overkill it is. When I was still employed, I was building our django/postgres thing locally in Docker, with 32gb of ram, and it was a wild improvement in terms of feedback loop latency over my shitty 13" intel mbp, and I think it's seriously underappreciated how important it is to keep that pretty low, or as low as is cost effective. Now that I'm not employed and don't hope to be for a while, I do think the greatest bottleneck in my overall productivity is my own discipline since I'm not compiling anything huge or using Docker. The few times I do really notice how slow it is, it's in big I/O or ram operations like indexing, or maybe the occasional xcode build, but it's still low in absolute terms and the lack of some stressful deadline doesn't have me worrying so much about it. That makes me happy in some ways, because I used to feel like I could just throw new hardware at my overall productivity and solve any issues, but I think that's true only for things that are extremely computationally expensive. Normally, I'd just spend the cash, even as a contractor, because it's investment in my tools and that's good, but the up-charge for ram and ssd is so ludicrously high that I have no idea when that upgrade will come, and the refurb older models of M1 and M2 just aren't that much lower. My battery life also sucks, but it's not $5k sucky yet. Also worth joting I'm just a measly frontend developer, but there have been scenarios in which I've been doing frontend inside either a big Docker container or a massive Tomcat java app, and for those I'd probably just go for it.

karolist · 2023-12-30T08:37:12

I don't know where in the world you are, but B&H in US still sells new 16" M1 Max machines with 64GB memory, 2TB SSD for 2499-2599 depending on the current deal. This is around the price of base M3 Pro with 18/512 configuration, I figure you'll still get 5+ years of use with such machine and never worry about storage or memory.

brailsafe · 2023-12-31T02:59:49

Good point, although it would feel odd to spend what would amount to about $3500 after tax in CAD on a 3 y.o laptop, albeit a new 3 y.o laptop. For now I'll just stick it out with my Intel thing, since that $3500 is a more expensive $3500 than ever anyway, until my requirements get more demanding.

kingTug · 2023-12-29T20:40:33

Does anyone have any anecdoctal evidence around the snappiness of VsCode with Apple Silicon? I very begrudgingly switched over from SublimeText this year (after using it as my daily driver for ~10yrs). I have a beefy 2018 MBP but VScode just drags. This is the only thing pushing me to upgrade my machine right now but I'd be bummed if there's still not a significant improvement with an m3 pro.

（评论） (comments)

（评论）
(comments)