(comments)

Original link: https://news.ycombinator.com/item?id=39549486

Here are some key takeaways based on the given material:

1. Proceed slowly and methodically rather than refactoring right away. Thoroughly analyze the codebase and develop a deep understanding of how it works before making any changes.
2. Focus on establishing effective end-to-end tests and continuous integration automation. Capture the system's expected behavior and make sure it keeps matching, before and throughout the work.
3. Reduce the number of variants that need to be supported; this significantly shrinks the scope and complexity of the refactoring effort and improves efficiency, maintainability, and adaptability to changing requirements.
4. Minimize unnecessary changes, since every change risks introducing additional complexity and compounding the challenges inherent in a large codebase.
5. Use a comprehensive end-to-end test suite to evaluate code changes and confirm that the system's functionality continues to meet expectations.
6. Study the business logic behind the codebase and build a central repository for the important knowledge, improving transparency and accountability.
7. Start by learning to work with the specific programming language, environment, and conventions associated with the particular codebase. Fortunately, this process is fairly universal regardless of the specific programming language in use.

Throughout this work, effective communication with colleagues is essential: regular updates foster collaboration among team members toward a shared vision and an aligned strategy for ongoing development. Clear expression, including detailed explanations of anything opaque, streamlines the whole process and aids mutual understanding. Consistent feedback mechanisms such as code review enable more frequent and deeper critical evaluation, reducing errors and yielding better overall results. Clear procedures for resolving defects ensure that identified problems are corrected promptly and effectively, speeding up progress and improving satisfaction on every front, from technical matters to broader stakeholder concerns. Finally, keep looking for ways to improve the process itself; finding innovative answers to recurring challenges adds further value to the ongoing collaborative effort.

Related articles

Original text
You've just inherited a legacy C++ codebase, now what? (gaultier.github.io)
348 points by broken_broken_ 1 day ago | 335 comments

Some good advice here, and some more...controversial advice here.

After inheriting quite a few giant C++ projects over the years, there are a few obvious big wins to start with:

* Reproducible builds. The sanity you save will be your own. Pro-tip: wrap your build environment with docker (or your favorite packager) so that your tooling and dependencies become both explicit and reproducible.
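
For illustration, the wrapper can be as simple as this (a rough sketch; the image name, Dockerfile, and CMake-based build are assumptions about your project):

  # Build the toolchain image once, then do every build inside it.
  docker build -t legacy-build-env -f Dockerfile.build .
  docker run --rm -v "$PWD:/src" -w /src legacy-build-env \
    sh -c 'cmake -S . -B build && cmake --build build -j'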

* Get the code to build clean with -Wall. This is for a couple of reasons. a) You'll turn up some amount of bad code/undefined behavior/bugs this way. Fix them and make the warning go away. It's ok to #pragma away some warnings once you've determined you understand what's happening and it's "ok" in your situation. But that should be rare. b) Once the build is clean, you'll get obvious warnings when YOU do something sketchy and you can fix that shit immediately. Again, the sanity you save will be your own.
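
To illustrate the #pragma escape hatch (a sketch for GCC/Clang; the warning and the function names are made up):

  // Hypothetical legacy API that forces an unused parameter on us.
  void handle_fd(int fd);

  // Scope the suppression as tightly as possible and leave a comment
  // explaining why the warning is acceptable here.
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wunused-parameter"
  void legacy_callback(int fd, void* ctx) {  // ctx: required by the plugin ABI
      handle_fd(fd);
  }
  #pragma GCC diagnostic pop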

* Do some early testing with something like valgrind and investigate any read/write errors it turns up. This is an easy win from a bugfix/stability point of view.
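
Something along these lines (memcheck is valgrind's default tool; the binary name is a placeholder):

  # --track-origins makes "use of uninitialised value" reports point
  # at where the bad value was created, at some speed cost.
  valgrind --leak-check=full --track-origins=yes ./your_app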

* At least initially, keep refactorings localized. If you work on a section and learn what it's doing, it's fine to clean it up and make it better, but rearchitecting the world before you have a good grasp on what's going on globally is just asking for pain and agony.



> wrap your build environment with docker (or your favorite packager) so that your tooling and dependencies become both explicit and reproducible

If you want explicitness and reproducibility please don't reach for Docker. Unless you take a lot of care, you will only get the most watered down version of reproducibility with Docker probably luring you into a false sense of security. E.g. pointing to mutable image tags without integrity hashes and invoking apt-get are things you'll find in most Dockerfiles out there and both leave open a huge surface area for things to go wrong and end up in slightly different states.
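
For comparison, a more pinned-down Dockerfile looks roughly like this (the digest and package versions below are placeholders, not real values):

  # Immutable digest instead of a mutable tag:
  FROM ubuntu@sha256:<digest-of-the-image-you-tested>
  # Pin exact package versions rather than whatever apt resolves today:
  RUN apt-get update && apt-get install -y --no-install-recommends \
        g++=<pinned-version> cmake=<pinned-version> \
      && rm -rf /var/lib/apt/lists/*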

And while they are not that easy to pick up, solutions like Bazel and Nix will give you a lot better foundation to stand on.
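
For a taste of the Nix flavor of this, a minimal dev shell might look like the following (a sketch; the nixpkgs revision and hash are placeholders you'd fill in):

  # shell.nix -- pin the entire package set to one revision + hash,
  # so every machine evaluates exactly the same toolchain.
  let
    pkgs = import (fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
      sha256 = "<sha256-of-that-tarball>";
    }) {};
  in pkgs.mkShell {
    packages = [ pkgs.gcc13 pkgs.cmake pkgs.gdb ];
  }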



A nice middle ground is using a tool like Google's Skaffold, which provides "Bazel-like" capabilities for composing Docker images and tagging them based on a number of strategies, including file manifests. In my case, I also use build args to explicitly set versions of external dependencies. I also pull external images and build base images with upgraded versions once, then re-tag them in my private repository, which is an easy-to-implement mechanism for reproducibility.

While I am in a Typescript environment with this setup at the moment, my personal experience is that Skaffold with Docker has a lighter implementation and maintenance overhead than Bazel. (You also get the added benefit of easy deployment and automatic rebuilds.)

I quite liked using Bazel in a small Golang monorepo, but I ran into pain when trying to do things like include third-party pre-compiled binaries in the Docker builds, because of the unusual build rules convention. The advantage of Skaffold is it provides a thin build/tag/deploy/verify layer over Docker and other container types. Might be worth a look!

Kudos to the Google team building it! https://skaffold.dev



> If you want explicitness and reproducibility please don't reach for Docker. Unless you take a lot of care, you will only get the most watered down version of reproducibility with Docker probably luring you into a false sense of security. E.g. pointing to mutable image tags without integrity hashes and invoking apt-get are things you'll find in most Dockerfiles out there and both leave open a huge surface area for things to go wrong and end up in slightly different states.

If this is frequently a problem, you're doing something wrong, or using such a crappy external library/toolchain that it breaks frequently on the same version.

Docker is a way to ensure that the software builds with "the most recent minor version" of some OS/toolchain/libraries.

The reason why you want the most recent version is because of security fixes and bugs.

I agree that you should check integrity hashes where appropriate, if you really want to fix versions.



Not everything is all or nothing. 80-90% reproducible builds are often good enough and learning Nix only to get that last 10-20% is not always worth it. And there are ways of pinning all dependencies with docker if you really want to.

I've had issues with Dockerfiles not building anymore due to changes in the package registry, but it was like 1-2 times out of the 1000 times I've used docker.



I have docker as part of my reproducible builds. We recently ran into a problem trying to rebuild some old code - turns out the ssl certificates in our older images had expired, so now the code isn't reproducible anyway. One more reason to agree with the idea that docker shouldn't be used.

Though we use docker for a different reason: we are building on linux, targeting linux. It is really easy to use the host system headers or libraries instead of the target versions of the same - and 99% of the time you will get away with it as they are the same. Then several years later you upgrade your host linux machine and can't figure out why a build that used to work no longer does (and since 99% of the stuff is the same, this can be months before you test that one exception and then it crashes). Docker ensures we don't install any host headers or libraries except what is needed for the compiler.



> If you want explicitness and reproducibility please don't reach for Docker

Is it common in C++ builds to rely on the current O/S libraries instead of say making most dependencies explicit, close to full cross-compile? Do dependencies need to be pulled in using apt-get and not something like maven?



If you care about security and bug fixes, then yes

From what I've seen the "minor version" is fixed, e.g. FROM ubuntu:22.04 and not FROM ubuntu:latest.



Step 0: reproducible builds (like you said)

Step 1: run all tests, mark all the flaky ones.

Step 2: run all tests under sanitizers, mark all the ones that fail (a minimal sanitizer setup is sketched after this list).

Step 3: fix all the sanitizer failures.

Step 4: (the other stuff you wrote)
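
For steps 2 and 3, a minimal sanitizer pass might look like this (assuming a CMake build with a ctest-runnable suite; adjust to your build system):

  # Build everything with Address + UB sanitizers enabled
  cmake -S . -B build-asan \
    -DCMAKE_CXX_FLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer -g"
  cmake --build build-asan -j
  # Run the tests; sanitizer findings land on stderr
  ctest --test-dir build-asan --output-on-failure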



If we're going to visit the circles of hell, let's do it properly:

Step -1: Get it under source control and backed up.

Step -2: Find out if the source code corresponds to the executable. Which of the 7 variants of the source code (if any).

Step -3: Do dark rituals over a weekend with cdparanoia to scrape the source code from the bunch of scratched CDs found in someone's bottom drawer. Bonus points if said person died last week, and other eldritch horrors lurk in that bottom drawer. Build a VM clone of the one machine still capable of compiling it.

Yes, I have scars, why do you ask?



Question: Why does some of the product source code look like it is the output of a decompiler?

Answer: Our office was in the WTC and was destroyed on 9/11. Luckily everyone got out alive, but then we discovered we had no off-site backups of the source code. In order to continue development, we had to retrieve the released binaries from our customers and decompile them to get back source code.



Oh.. Ouch, ouch, ouch. I feel for you. That must have been hell.


I wasn’t there for 9/11, my involvement with that code base started a few years later. But it was clear it was a traumatic memory for colleagues who had been there.


I see this as a case of: "see, allowing wfh would have saved you there."


From what I was told, WFH and people starting late actually really helped, in that there were only a handful of people physically in the office when it happened. But apparently the server with the source code on it was in it too. I don’t know if they forgot about off-site backups entirely, or if they thought they had them but only discovered they were incomplete or faulty afterwards

I don't know what the practice was at that time, but some years later, people weren't allowed to have the source on their laptops, they had to SSH/RDP/etc in to a hosted development system to work on it, which might explain how losing the office resulted in losing the source code even with people doing WFH



No it wouldn't have - 2001 was 4 years before distributed version control. Work from home might have had current copies of some branches checked out on local machines, but nobody would have had full history or code from release branches on their machine.


Everyone who has been around this game for long enough has scars. At one point in the early 90s I was asked to take over maintenance of a vertical market accounting app that had a few hundred users. It had been written using Lattice C, but at the time was being built with the ultra modern MS C 5.1.

The first time I looked at it, I saw that the make file set the warning level to 0 and redirected the output to NUL. Removing that, I ran nmake and prayed. About 45 minutes later it finished building, evidently successfully. It turned out that having the warnings actually print added 10 minutes to the build.

It was averaging more than one error per line of code at /w3. And it was around 40k lines of code. Not large by the standards of today but huge back then for MSDOS. Peeking, it used K&R C and included no header files. So step 1 for me was to hook up some scaffolding using various tools from MKS to make sure as I edited things that the error count didn’t increase.

The biggest thing I learned from this project was to not combine other coding with the warning removal or other cleanup. Makes it much easier to spot when you introduce bugs.



Was there a warning for implicit function declaration and implicit variable type for each variable and call that used those? Otherwise, how could there be that many warnings?


Yes, plus a bunch of warnings for unsafe type conversions where the compiler did pointer to int, back to pointer conversions. In the process of cleaning it up I found at least a dozen serious bugs.


There was that time when I had to dump the ROMs off a 'test' MRI machine because that was the only version of the code they had, then decompile it, and rewrite it from that.

I think about that a lot now that I'm older and spend a fair bit of time in MRI machines...



Dang man that’s tough, at least you know they work ;)


Oh, we tested those a lot. Fun fact: you can see under the foil in scratch cards with those. Might, um, have something to do with "every card a winner" scratch cards no longer being a thing.


I was assuming it already had unit and system tests with decent coverage. I forgot how bad stuff gets.

Maybe VM clones of various users too, and recordings of their work flows?



I'm always careful to dump the bash history as soon as I get access to a machine involved in a legacy project.


Yep! Learned this one the easy way! I don’t get to say that often, so I’m taking full advantage.


Oooh, smart!


This gave me a good laugh. I too have been here.


Shouldn't it be step -3 to -1?


Of course not. You try to do 0 but that's impossible because you need to do -1 first. So you drop everything, try to do -1 but that's ... It's yak shaving's evil twin!


Some people believe that if you read the C++ standard recreationally, it should be interpreted as a call for help, and intervention is required, putting the subject under 24/7 monitoring and physical restraints.

/s

Step -4: Get the version of windows and the compiler it was last known to compile with.



Step -3.5: Do the service pack and .NET framework update dance, wave a dead chicken, and hopefully, by sheer luck, install them in the correct order. If not, uninstall and goto -4;

Been there. Done that.



Just a note on legacy tests: Step 0.5: understand the tests. They need to be examined to see if they've rotted or not. Tests passing/failing doesn't really mean code under test works or not. The tests might have been abandoned under previous management and don't accurately reflect how the code is _supposed_ to be working.


I'd put that under 2.5 or 3.5, if not later. You only really need to do it before you start modifying code, and it's a massive effort to understand a new codebase. Better pick the lower-hanging fruit (like corruption bugs) so you can at least stay sane when you run the tests and try to understand them.


The same applies to comments.

I have absolutely inherited codebases where one of the early steps was to make a commit excising every single comment in the code, because so many of them were old, lies, or old lies, that it wasn't worth the risk of a junior developer accidentally thinking they could be relied upon.

(and of course they remained available in history to be potentially buffed up and resurrected, but still, argh)



That's brilliant, but also... sounds like hell? Wouldn't that easily add several months to the timeline?


My project has decent code, source control with history from the beginning (ten years in a few months) and unit tests that were abandoned for years. I've spent at least a couple of weeks, over a year or so, just to remove tests that didn't test current functionality and get the others to work/be green. They ain't fast and only semi stable but they regularly find bugs we've introduced.


I find it helpful to try and intentionally break the code under test.

Sometimes the test still passes, and that is a good sign that something is very wrong!



Look at mister fancy here, having tests in his legacy code base.


Step 0 sounds so easy. Until you realize __TIME__ exists. Then you take that away, and you find out that some compiler heuristics might not be deterministic.

Then you discover -frandom-seed - and go ballistic when you read "but it should be a different seed for each source file"

Then you figure out your linker likes emitting a timestamp in object files. Then you discover /Brepro (if you're lucky enough to use lld-link).

Then you'd discover that Win7's app compat db expected a "real" timestamp, and a hash just wouldn't do. (Thank God, that's dead now.) This is usually the part where you start questioning your life choices.

Then somebody comes to your desk and asks if you can also make partial rebuilds deterministic.

On the upside, step 1 is usually quick, there will be no tests.



I think (well, assumed) what they meant by deterministic builds was merely hermetic builds, which are easier. True determinism is overkill for step 0.


Agreed.

Though in the last 6 years I've seen at least one case where truly deterministic builds mattered:

A performance bug only happened when a malloc() during init was not aligned to 32 bytes: glibc on x86_64 only guarantees 16 bytes, but depending on what alloc/dealloc happened before, it may just land on a 32-byte boundary.

The alloc/dealloc sequence before that point was pretty deterministic, however there were a few strings containing __FILE__. And the gitlab runner checked out the code to a path with a random number (or an index? I don't remember) without -ffile-prefix-map or the $PWD trick, so its length varied.



It is really nice to have deterministic builds when doing aesthetic clean-ups, to verify that the code does not change, or when inspecting changes in the assembly code, to limit the scope of change to just the affected code.


Often yes. Sometimes, no. You haven't enjoyed C++ until you get reports of the app intermittently crashing, and your build at the same version just won't.

But yes, if the goal is "slap it all in a container", that's probably good and at least somewhat reproducible. We aren't Python here! ;)



> Often yes. Sometimes, no. You haven't enjoyed C++ until you get reports of the app intermittently crashing, and your build at the same version just won't.

That's okay, it's probably just some bank in a random country that requires some software package to be installed, presumably in the interest of security, which injects a dll into every process on the machine and unsurprisingly has a bug which causes your process to crash at random in only that part of the world.



> some software package to be installed, presumably in the interest of security, which injects a dll into every process on the machine

You don't even have to get that far. Shell extensions (for file open or save dialogs) and printer drivers also introduce arbitrary DLLs to your processes. And some of them are compiled in an old version of IIRC Delphi or Turbo Pascal, which on the DLL startup code unconditionally changes the floating point control word to something which causes unexpected behavior in some framework you're using.

(We ended up wrapping all calls to file open or save or print dialogs with code to save and restore the floating point control word, just in case they had loaded one of these annoying DLLs.)
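
Roughly what that wrapper looks like with the standard <cfenv> facilities (a sketch; the original fix may well have used platform calls like _controlfp instead, and the dialog function here is hypothetical):

  #include <cfenv>

  // Hypothetical dialog call that may load DLLs which clobber the
  // floating point control word.
  bool show_file_open_dialog();

  bool safe_file_open_dialog() {
      std::fenv_t saved;
      std::fegetenv(&saved);               // capture the current FP environment
      bool ok = show_file_open_dialog();   // may load misbehaving DLLs
      std::fesetenv(&saved);               // restore whatever they changed
      return ok;
  }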



That's probably reading uninitialized memory. You can get away with that for a VERY long time, until you can't. See the earlier valgrind recommendation.

But that sort of report isn't a deep mystery, it's just a specific class of bug. Given the description, you've got a pretty good idea of what you're looking for.



... until the cause is really and truly a non-deterministic build. Trust me, been there.

For a long-ago example: I worked on a project that had an optimizer that used time-bounded simulated annealing to optimize. No two builds ever the same. It was "great".



That sounds delightful. :0


yeah I don't think OP is talking about byte-perfect determinism, they just want CI not to explode. That's the triage goal; byte-perfect determinism is not your first priority when stopping the bleeding on a legacy C++ project


-frandom-seed is not quite so bad.

> The string can either be a number (decimal, octal or hex) or an arbitrary string (in which case it’s converted to a number by computing CRC32).
> The string should be different for every file you compile.

So basically just pass the project-relative path of the file into -frandom-seed and you'll be fine. It's a shame the guidance doesn't explain why the string should be different, because that feels like it could be advice that's not rooted in any technical reality.

__TIME__ isn't actually that bad, as it's an anti-pattern anyway and it's much better for the build system to inject the build time explicitly as an input macro (if your software needs it for UX purposes).

__FILE__ is the more annoying one, but it can be solved with -fmacro-prefix-map.
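
Putting those together, a reproducibility-minded compile line looks something like this (GCC/Clang; the paths are illustrative):

  # -ffile-prefix-map strips the absolute checkout path out of
  #   __FILE__ and debug info; -frandom-seed pins per-file randomness;
  # -Werror=date-time fails the build if __DATE__/__TIME__ sneak in.
  g++ -c src/foo.cpp -o foo.o \
      -ffile-prefix-map="$PWD"=. \
      -frandom-seed=src/foo.cpp \
      -Werror=date-time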



> Then somebody comes to your desk and asks if you can also make partial rebuilds deterministic.

This is a good guy. Knows what they need, knows you are smart enough to potentially finally slay the dragon, will fight the bureaucracy on your behalf. Asking a hard ask is rarely beneficial for the asker on the failure side. Don't burn yourself out for it though and don't be afraid to ask hard favors from the asker.



Probably insert another Step 1: implement tests. Be they simple acceptance tests, integration tests, or even unit tests for some things.


> Get the code to build clean with -Wall.

This is fine, but I would strongly recommend against putting something like -Wall -Werror into production builds. Some of the warnings produced by compilers are opinion based, and new compiler versions may add new warnings, and suddenly the previous "clean" code is no longer accepted. If you must use -Werror, use it in debug builds.
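
One way to get both worlds (a CMake sketch; the option name is arbitrary): keep warnings always on, and let developers and CI flip errors on explicitly:

  # Developers and CI pass -DENABLE_WERROR=ON; source distributions and
  # downstream builds with newer compilers are unaffected by default.
  option(ENABLE_WERROR "Treat warnings as errors" OFF)
  add_compile_options(-Wall -Wextra)
  if(ENABLE_WERROR)
    add_compile_options(-Werror)
  endif()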



That's a feature!

New warnings added to new compiler versions can identify problems that weren't previously detected. You _want_ those to -Werror when they happen, so you can fix them if they need it.

Changing a compiler version is a task that has to be resourced appropriately. Part of that is dealing with any fallout like this. Randomly updating your compiler is just asking for trouble.



It is certainly not a feature, because it makes all infrastructure, including just regular old checkout-and-build workflows, break for historical versions of the code. It's so annoying to have to check out an older version and then go disable -Wall -Werror everywhere just to get the damn thing to build.

Keep master clean of any warnings, for sure. But don’t put it straight into the build system defaults.



Just updating a compiler could break workflows for historical versions of the code. It is unavoidable. But it is easier with build flags if you use VCS: these flags could be different for different versions.


If you store the compiler version in source control, then you don't have this problem.


This is fixed by the suggestion right before it:

> * Reproducible builds. The sanity you save will be your own. Pro-tip: wrap your build environment with docker (or your favorite packager) so that your tooling and dependencies become both explicit and reproducible.

Upgrading compiler versions shouldn't be done out-of-band with your normal PR process.



I agree that this helps, although I still think that in general, the default build should never do -Werror, since people may use other toolchains and it shouldn't surprise-break downstream (I'm pretty sure this is a problem Linux distros struggle with all the time..) If it does it only in your fully reproducible CI, then it should be totally fine, of course.


The scripted, packaged docker with toolchain dependencies _is_ the build. If someone decides to use a different toolchain, the problems are on them.


Open source projects that insist their docker container is the only way to go are going to be an instant reject from me. It's a total copout to just push a docker container and insist that anyone not using it is on their own.

Docker is too fraught with issues for that, and as anyone can attest, there are few things more frustrating in computing than having to follow down a chain of chasing issues in things only superficially related to what you actually want to do.

The least that can be done is for the project to do its best to not be dependent on specific versions, and explicitly document, in a visible place, the minimum and maximum versions known to work, along with a last changed date.



Yeah that works if you are not dealing with open source. If you are dealing with open source, though, it really won't save you that much trouble, if anything it will just lead to unnecessarily hostile interactions. You're not really obligated to fix any specific issues that people report, but shrugging and saying "Your problem." is just non-productive and harms valuable downstreams like Linux distributions. Especially when a lot of new failures actually do indicate bugs and portability issues.


It doesn't even work outside of open source. I am running a prerelease toolchain almost all the time on my computer. If the project at work turns on -Werror, I immediately turn it off and store away the change. Of course this means that I send in code fixes for things that don't reproduce on other people's machines yet, but I literally never receive pushback for this.


Supporting every Linux distribution and their small differences isn't free, and Linux distributions shipping things you haven't tested directly is also a way for users to get bitten by bugs or bad interactions, which they will then report to you directly anyway so you're responsible for it. It's complicated. It's happened plenty of times where e.g. I've run into an obscure and bad bug caused by a packaging issue, or a downstream library that wasn't tested -- or there's a developer who has to get involved with a specific distro team to solve bugs their users are reporting directly to them but that they can't reproduce or pinpoint, because the distro is different from their own environment. Sometimes these point out serious issues, but other times it can be a huge squeeze to only get a little juice.

For some things the tradeoffs are much less clear, open-source or not e.g. a complex multi-platform GUI application. If you're going to ship a Flatpak to Linux users for example, then the utility of allowing any random build environments is not so clear; users will be running your produced binaries anyway. These are the minority of cases, though. (No, maybe not every user wants a Flatpak, but the developers also need to make decisions that balance many needs, and not everything will be perfect.)

Half of the problem, of course, is C and C++'s lack of consistent build environments/build systems/build tooling, but that's a conversation for another day.

That said, I generally agree with you that if you want to be a Good Citizen in the general realm of open-source C and C++ code, you should not use -Werror by default, and you should try (to whatever reasonable extent) to allow and support dependencies your users have. And try to support sanitizers, custom CFLAGS/CXXFLAGS, allow PREFIX and DESTDIR installation options, obey the FHS, etc etc. A lot of things have consolidated in the Linux world over the years, so this isn't as bad as it used to be -- and sometimes really does find legitimate issues in your code, or even issues in other projects.



Again, you don't have to fix bugs that are reported, but treating it as invalid to use any compiler versions except for the exact ones that you use is just counterproductive.

The "utility" of allowing "any random build environment" is that those random build environments are the ones that exist on your user's computers, and absent a particularly good reason why it shouldn't work (like, your compiler is too old, or literally broken,) for the most part it should, and usually, it's not even that hard to make it work. Adopting practices like defaulting -Werror -Wall on and closing bugs as WONTFIX INVALID because it's not any of the blessed toolchains gains you... not sure. I guess piece of mind from having less open issues and one less flag in your CI? But it is sure to be very annoying to users who have fairly standard setups and are trying to build your software; it's pretty standard behavior to report your build failures upstream, because again, usually it does actually signal something wrong somewhere.

Developers are free to do whatever they want when releasing open source code. That doesn't mean that what they are doing is good or makes any sense. There are plenty of perfectly legal things that are utterly stupid to do, like that utterly bizarre spat between Home Assistant and NixOS.



C++ is super annoying in this way. Many other languages (e.g Rust) only have one compiler and good portability out of the box which completely avoids this problem. And other ecosystems that do have multiple implementation (e.g. JavaScript) seem to have much better compatibility/interop such that it's not typically a problem you have to spend much if any time on in practice.


I'm curious what sort of CPUs and OSes do those languages run on. C++ runs on all sorts of obscure real time OSes, all the standard mainstream ones as well as on embedded equipment and various CPUs, but a lot of that is possible because of the variety of compilers.


I’ve had rust projects with strict clippy rules break when rustc is upgraded.


I would say it's still worth having -Werror for some "official" CI build even if it is disabled by default.


I would do -Wall, -Wextra and -Werror. Again, mostly for my own sanity. But I'd wait to add -Werror until the warnings were all fixed, so regression testing would continue as the warnings got fixed. cppcheck and clang-tidy would also eventually halt the pipeline. And *san on the tests, compiled in both debug and O3 with a couple of compilers.


I think this depends on a bunch of stuff.

- Who are the consumers of the source code, i.e. who will ever check it out and build it? Sometimes, it's just one person. Sometimes, it's a team of engineers. In that case, -W -Werror is fine.

- How does a warning being reported make the engineers on the team feel? If the answer is, "Hold my beer for five minutes while I commit a fix", then -W -Werror might be the right call. I've been on projects like that and some of them had nontrivial source code consumers.

- How easy is it to hack the build system? Some projects have wonderfully laid out build systems. If that's the case and -W -Werror is the default, then it's not hard to go in there and change the default, if the -Werror creates problems.

- Does the project have a facility (in the build system) and policy (as a matter of process) to just simply add -Wno-blah-blah as the immediate fix for any new warning that arises? I've seen that, too.

(I'm using -Werror in some parts of a personal project. If you're a solo maintainer of a codebase that can be built that way, then it's worth it - IMO much lower cognitive load to never have non-error warnings. The choice of what to do when the compiler complains is a more straightforward choice.)



Changing other dependencies can also cause the build to break. The best thing to do is to use the dependencies the project specifies.


Technically changing literally anything, including the processor microarchitecture that the developer originally tested the code on, could easily cause a real-world breakage. That doesn't mean it should, though.

Most libraries not written by Google have some kind of backwards compatibility policy. This is for good reasons. For example, if Debian updates libpng because there's a new RCE, it's ideal if they can update every package to the same new version of libpng all at once. If we go to the extreme of "exact dependencies for every package", then this would actually mean that you have to update every dependent package to a new release that has the new version of libpng, all at the same time, across all supported versions of the distribution. Not to mention, imagine the number of duplicate libraries. Many Linux distros, including Debian, have adopted a policy of only having one version of any given library across the whole repo. As far as I understand, that even includes banning statically linked copies, requiring potentially invasive patching to make sure that downstream packages use the dynamically linked system version. And trust me, if they want to do this, they *will* do this. If they can do it for Chromium, they sure as hell can do it for literally any package.

There's a balance, of course. If a distro does invasive patching and it is problematic, I think most people will be reasonable about it and accept that they need to report the issue to their distribution instead. Distros generally do accept bugs for the packages that they manage, and honestly for most packages, by the time a bug gets to you, there is a pretty reasonable chance that it's actually a valid issue, so throwing away the issue simply because it came from someone running an "unofficial" build seems really counterproductive and definitely not in the spirit of open source.

Reproducibility is good for many reasons. I do not feel it is a good excuse to just throw away potentially valid bug reports though. It's not that maintainers are under any obligation to actually act on bug reports, or for that matter, even accept them at all in the first place, but if you do accept bugs, I think that "this is broken in new version of Clang" is a very good and useful bug report that likely signals a problem.



>For example, if Debian updates libpng because there's a new RCE, it's ideal if they can update every package to the same new version of libpng all at once.

If Debian is upgrading a dependency instead of a developer, then Debian should be ready to fix any bugs they introduce.

>then this would actually mean that you have to update every dependent package to a new release that has the new version of libpng, all at the same time, across all supported versions of the distribution

This is already how it works. All vulnerable programs make an update and try to hold off on releasing it until near an embargo date. You don't have to literally update them all at the same time. It's okay if some are updated at different times than others.

>Not to mention, imagine the number of duplicate libraries.

Duplicate libraries are not an issue.

>Many Linux distros, including Debian, have adopted a policy of only having one version of any given library across the whole repo.

This is a ridiculous policy to me as you are forcing programs to use dependencies they were not designed for. This is something that should be avoided as much as possible.

>by the time a bug gets to you, there is a pretty reasonable chance that it's actually a valid issue

That doesn't mean there isn't damage done. There are many people who consider kdenlive an unstable program that constantly crashes because of distros shipping it with the incorrect dependencies. This creates reputational damage.



> If Debian is upgrading a dependency instead of a developer, then Debian should be ready to fix any bugs they introduce.

That's what the Debian Bug Tracking System is for. However, if the package is actually broken, and it's because e.g. it uses the dependency improperly and broke because the update broke a bad assumption, then it would ideally be reported upstream.

> This is already how it works. All vulnerable programs make an update and try to hold off on releasing it until near an embargo date. You don't have to literally update them all at the same time. It's okay if some are updated at different times than others.

That's not how it works in the vast majority of Linux distributions, for many reasons, such as the common rule of having only one version, or the fact that Debian probably does not want to update Blender to a new major version because libpng bumped. That would just turn all supported branches of Debian effectively into a rolling release distro.

> Duplicate libraries are not an issue.

In your opinion, anyway. I don't really think that there's one way of thinking about this, but duplicate libraries certainly are an issue, whether you choose to address them or not.

> This is a ridiculous policy to me as you are forcing programs to use dependencies they were not designed for. This is something that should be avoided as much as possible.

Honestly, this whole tangent is pointless. Distributions like Debian have been operating like this for like 20+ years. It's dramatically too late to argue about it now, but if you're going to, this is not exactly the strongest argument.

Based on this logic, effectively programs are apparently usually designed for exactly one specific code snapshot in time of each of its dependencies.

So let's say I want to depend on two libraries, and both of them eventually depend on two different but compatible versions of a library, and only one of them can be loaded into the process space. Is this a made-up problem? No, this exact thing happens constantly, for example with libwayland.

Of course you can just pick any newer version of libwayland and it works absolutely perfectly fine, because that's why we have shared libraries and semver to begin with. We solved this problem absolutely eons ago. The solution isn't perfect, but it's not a shocking new thing, it's been the status quo for as long as I've been using Linux!

> That doesn't mean there isn't damage done. There are many people who consider kdenlive an unstable program that constantly crashes because of distros shipping it with the incorrect dependencies. This creates reputational damage.

If you want your software to work better on Linux distributions, you could always decide to take supporting them more seriously. If your program is segfaulting because of slightly different library versions, this is a serious problem. Note that Chromium is a vastly larger piece of software than Kdenlive, packaged downstream by many Linux distributions using this very same policy, and yet it is quite stable.

For particularly complex and large programs, at some point it becomes a matter of, OK, it's literally just going to crash sometimes, even if distributions don't package unintended versions of packages, how do we make it better? There are tons of avenues for this, like improving crash recovery, introducing fault isolation, and simply, being more defensive when calling into third party libraries in the first place (e.g. against unexpected output.)

Maintainers, of course, are free to complain about this situation, mark bugs as WONTFIX INVALID, whatever they want really, but it won't fix their problem. If you don't want downstreams, then fine: don't release open source code. If you don't want people to build your software outside of your exact specification because it might damage its reputation, then simply do not release code whose license is literally for the primary purpose of making what Linux distributions do possible. You of course give up access to copyleft code, and that's intended. That's the system working as intended.

I believe that ultimately releasing open source code does indeed not obligate you as a maintainer to do anything at all. You can do all manner of things, foul or otherwise, as you please. However, note that this relationship is mutual. When you release open source code, you relinquish yourself of liability and warranty, but you grant everyone else the right to modify, use and share that code under the terms of the license. Nowhere in the license does it say you can't modify it in specific ways that might damage your program's reputation, or even yours.



>That's what the Debian Bug Tracking System is for.

Software should be extensively tested and code review should be done before it gets shipped to users. Most users don't know about the Debian Bug Tracking system, but they do know about upstream.

>Honestly, this whole tangent is pointless. Distributions like Debian have been operating like this for like 20+ years. It's dramatically too late to argue about it now, but if you're going to, this is not exactly the strongest argument.

It's not too late, as evidenced by the growth of solutions like AppImage and Flatpak, which allow developers to avoid this.

>So let's say I want to depend on two libraries, and both of them eventually depend on two different but compatible versions of a library, and only one of them can be loaded into the process space. Is this a made-up problem? No, this exact thing happens constantly, for example with libwayland.

Multiple versions of a library can be loaded into the same address space. Developers can choose to have their libraries support a range of versions.

>that's why we have shared libraries and semver to begin with

Hyrum's Law. Semver doesn't prevent breakages on minor bumps.



> Software should be extensively tested and code review should be done before it gets shipped to users.

That's why distributions have multiple branches. Debian Unstable packages get promoted to Debian Testing, which get promoted to a stable Debian release. Distributions do bug tracking and testing.

> Most users don't know about the Debian Bug Tracking system, but they do know about upstream.

There are over 80,000 bugs in the Debian bug tracker. There are over 144,000 bugs in the Ubuntu bug tracker. It would suffice to say that a lot of users indeed know about upstream bug trackers.

I am not blaming anyone who did not know this. It's fully understandable. (And if you ask your users to please go report bugs to their distribution, I think most distributions will absolutely not blame you or get mad at you. I've seen it happen plenty of times.) But just FYI, this is literally one of the main reasons distributions exist in the first place. Most people do not want to be in charge of release engineering for an entire system's worth of packages. All distributions, Debian, Ubuntu, Arch, NixOS, etc. wind up needing THOUSANDS of at least temporarily downstream patches to make a system usable, because the programs and libraries in isolation are not designed for any specific distribution. Like, many of them don't have an exact build environment or runtime environment.

Flatpak solves this, right? Well yes, but actually no. When you target Flatpak, you pick a runtime. You don't get to decide the version of every library in the runtime unless you actually build your own runtime from scratch, which is actually ill-advised in most cases, since it's essentially just making a Linux distribution. And yeah. That's the thing about those Flatpak runtimes. They're effectively, Linux distributions!

So it's nice that Flatpak provides reproducibility, but it's absolutely the same concept as just testing your program on a distro's stable branch. Stable branches pretty much only apply security updates, so while it's not bit-for-bit reproducible, it's not very different in practice; Ubuntu Pro will flat out just default to automatically applying security updates for you, because the risk is essentially nil.

> It's not too late, as evidenced by the growth of solutions like AppImage and Flatpak, which allow developers to avoid this.

That's not what AppImage is for, AppImage is just meant to bring portable binaries to Linux. It is about developers being able to package their application into a single file, and then users being able to use that on whatever distribution they want. Flatpak is the same.

AppImage and Flatpak don't replace Linux distribution packaging, mainly because they literally can not. For one thing, apps still have interdependencies even if you containerize them. For another, neither AppImage nor Flatpak solve the problem of providing the base operating system for which they run under, both are pretty squarely aimed at providing a distribution platform specifically for applications the user would install. The distribution inevitably still has to do a lot of packaging of C and C++ projects no matter what happens.

I do not find AppImage or Flatpak to be bad developments, but they are not in the business of replacing distribution packaging. What it's doing instead is introducing multiple tiers of packaging. However, for now, both distribution methods are limited and not going to be desirable in all cases. A good example is something like OBS plugins. I'm sure Flatpak either has or will provide solutions for plugins, but today, plugins are very awkward for containerized applications.

> Multiple versions of a library can be loaded into the same address space. Developers can choose to have their libraries support a range of versions.

Sorry, but this is not necessarily correct. Some libraries can be loaded into the address space multiple times, however, this is not often the case for libraries that are not reentrant. For example, if your library has internal state that maintains a global connection pool, passing handles from one instance of the library to the other will not work. I use libwayland as an example because this is exactly what you do when you want to initialize a graphics context on a Wayland surface!

With static linking, this is complicated too. Your program only has one symbol table. If you try to statically link e.g. multiple versions of SDL, you will quickly find that the two versions will in fact conflict.

Dynamic linking makes it better, right? Well, not easily. We're talking about Linux, so we're talking about ELF platforms. The funny thing about ELF platforms is that the way the linker works, there is a global symbol table and the default behavior you get is that symbols are resolved globally and libraries load in a certain order. This behavior is good in some cases as it is how libpthreads replaces libc functionality to be thread-safe, in addition to implementing the pthreads APIs. However it's bad if you want multiple versions, as instead you will get mostly one version of a library. In some catastrophic cases, like having both GTK+2 and GTK3 in the same address space, it will just crash as you call a GTK+2 symbol that tries to access other symbols and winds up hitting a GTK3 symbol instead of what it expected. You CAN resolve this, but that's the most hilarious part: The only obvious way to fix this, to my knowledge, is to compile your dependencies with different flags, namely -Bsymbolic (iirc), and it may or may not even compile with these settings; they're likely to be unsupported by your dependencies, ironically. (Though maybe they would accept bug reports about it.) The only other way to do this that I am aware of is to replace the shared library calls with dlopen with RTLD_LOCAL. Neither of these options are ideal though, because they require invasive changes: in the former, in your dependencies, in the latter, in your own program. I could be missing something obvious, but this is my understanding!
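
The RTLD_LOCAL route, for reference (a POSIX sketch; the library and symbol names are invented):

  #include <dlfcn.h>
  #include <cstdio>

  int main() {
      // RTLD_LOCAL keeps this library's symbols out of the global
      // lookup scope, so another version elsewhere in the process
      // won't accidentally resolve against this copy.
      void* h = dlopen("libfoo.so.1", RTLD_NOW | RTLD_LOCAL);
      if (!h) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
      auto init = reinterpret_cast<int (*)()>(dlsym(h, "foo_init"));
      if (init) init();
      dlclose(h);
      return 0;
  }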

> Hyrum's Law. Semver doesn't prevent breakages on minor bumps.

Hyrum's law describes buggy code that either accidentally or intentionally violates contracts to depend on implementation details. Thankfully, people will, for free, report these bugs to you. It's legitimately a service, because chances are you will have to deal with these problems eventually, and "as soon as possible" is a great time.

Just leaving your dependencies out of date and not testing against newer versions ever will lead to ossification, especially if you continue to build more code on top of other flawed code.

Hyrum's law does not state that it is good that people depend on implementation details. It just states that people will. Also, it's not really true in practice, in the sense that not all implementation details will actually wind up being depended on. It's true in spherical cow land, but taking it to its "theoretical" extreme implies infinite time and infinite users. In the real world, libraries like SDL2 make potentially breaking changes all the time that never break anything. But even when people do experience breakages as a result of a change, sometimes it's good. Sometimes these breakages reveal actual bugs that were causing silent problems before they turned into loud problems. This is especially true for memory issues, but it's even true for other issues. For example, a change to the Go programming language recently fixed a lot of accidental bugs and broke, as far as anyone can tell, absolutely nobody. But it did lead to "breakages" anyways, in the form of code that used to be accidentally not actually doing the work it was intended to do, and now it is, and it turns out that code was broken the whole time. (The specific change is the semantics change to for loop binding, and I believe the example was a test suite that accidentally wasn't running most of the tests.)

Hyrum's law also describes a phenomenon that happens to libraries being depended on. For you as a user, you should want to avoid invoking Hyrum's law because it makes your life harder. And of course, statistically, even if you treat it as fact, the odds that your software will break due to an unintended API breakage are relatively low; it's just higher that across an entire distribution's worth of software something will go wrong. But for your libraries, they actually know that this problem exists and do their best to make it hard to rely on things outside the contract. Good C libraries use opaque pointers and carefully constrain the input domain on each of their APIs to try to expose as little unintended API surface area as humanly possible. This is a good thing, because again, Hyrum's law is an undesirable consequence!



IMO there is absolutely no reason to enable warnings in CI without -Werror. Nobody reads the logs of a successful build.

If some warnings are flaky then disable them specifically. In my experience most warnings in -Wall are OK and you can suppress the rare false positives in code. Don't suppress without a comment.

edit:

Having said that there are entirely valid reasons to not have -Werror outside of CI. It should be absolutely disabled by default if you distribute source.



This is the sort of situation where I'll consider progressive testing initially - i.e. write out the existing warnings to a file that you commit, add a test that fails if you get any that aren't in the file.

As you fix the inherited ones you can regenerate the file, hopefully smaller each time.

"If I don't have time to fix all of this -now-, I can at least make sure it can't get any -worse- in the mean time" is a very useful approach when automating the 'make sure' is something quick enough that you can find time for that.



-Wall and -Werror should be running on all developer machines and your CI machines.

If you are delivering source code to someone else (or getting source code from someone else that you build but do not otherwise work on), then you should ensure warnings are disabled for those builds. However, all developers and your CI system still need to run with all warnings possible.

The days when compilers put in warnings that were of questionable value are mostly gone. Today if you see a warning from your compiler it is almost always right.



The compiler and toolchain is a dependency like any other. Upgrading to a new compiler version is an engineering task just like upgrading a library version. It must be managed as such. If this leads to new errors, then this becomes part of the upgrade management. Likewise, since the code generator and optimizers have changed, this upgrade must be tested like any other new feature or fix. Create an appropriate topic/feature branch, and dig in.


Step 0: CI on every commit to build & run all tests / enforce that you never regress once you've finished a step.


> wrap your build environment with docker

For microservices it is fine, but you can't always deploy everything else with docker, especially for people who want to use your app inside a docker. Docker-in-docker is a situation that should never happen.

Containers are nice but they're a horrible way to pretend the problem doesn't exist.

Bundle all the dependencies and make sure it doesn't depend on 5 billion things being in /usr/lib and having the correct versions.



Not the OP, but I don’t think they meant that the build output is in a container. They meant that the thing you use to compile the code is in docker (and you just copy out the result). That would help ensure consistency of builds without having any effect on downstream users.


Exactly. The compiler and what ever dependencies you need _to build_ are bundled into a docker so that you don't need to worry about whatever random tools/libraries your coworkers have installed in their local environment.


I would not call that 'controversial'. On the internet, people call this behavior trolling for a reason. The punchline about rewriting the code in a different language gives an easy hint at where this is all going.

PS. I have been in the shoes of inheriting old projects before. And I hope I left them in a better state than they were before.



Great advice. Almost all of it applies to any programming language.


I'd swap 2 and 3. Getting CI, linting, auto-formatting, etc. going is a higher priority than tearing things out. Why? Because you don't know what to tear out yet or even the consequence of tearing them out. Linting (and other static analysis tools) also give you a lot of insight into where the program needs work.

Things that get flagged by a static analysis tool (today) will often be areas where you can tear out entire functions and maybe even classes and files because they'll be a re-creation of STL concepts. Like homegrown iterator libraries (with subtle problems) that can be replaced with the STL algorithms library, or homegrown smart pointers that can just be replaced with actual smart pointers, or replacing the C string functions with C++'s own string class (and related classes) and functions/methods.

But you won't see that as easily until you start scanning the code. And you won't be able to evaluate the consequences until you push towards more rapid test builds (at least) if not deployment.
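
A first scan can be as simple as this (a sketch; the check set and flags are illustrative, not a recommendation):

  # Run clang-tidy on one translation unit with the same flags the
  # real build uses; widen the check set as the noise gets fixed.
  clang-tidy src/foo.cpp -checks='-*,bugprone-*,modernize-*' \
    -- -std=c++17 -Iinclude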



Yeah, I've done a fair bit of agency work dropping in to rescue code bases, and the first thing I do is run unit tests and check coverage. I add basic smoke tests anywhere they're missing. This actually speeds me up, rather than slowing me down, because once I have reasonably good coverage I can move dramatically faster when refactoring. It's a small investment that pays off.


On the flip side, auto-formatting will trash your version history and impede analysis of "when and why was this line added".


I'm not hardcore on auto-formatters, but I think their impact on code history is negligible in the case of every legacy system I've worked on. The code history just isn't there. These aren't projects that used git until recently (if at all). Before that they used something else, but when they transitioned they didn't preserve the history. And that's if they used any version control system. I've tried to help teams whose idea of version control was emailing someone (they termed them "QA/CM") to make a read-only backup of the source directory every few months (usually at a critical review period in the project, so a lot of code was changed between these snapshots).

That said, sure, skip them if you're worried about the history getting messed up or use them more selectively.



>I think their impact on code history is negligible in the case of every legacy system I've worked on. The code history just isn't there.

Not sure if I agree here or not - whilst yes, the history isn't there, if it's a small enough team you'll have a good guess at who wrote it.

Definitely found I've learnt the style of colleagues, so I know who to ask just from the code outline.



Legacy systems that you inherit don't have people coming with them very often. That's part of the context of this. You often don't have people to trace it back to or at least not the people who actually wrote it (maybe someone who worked with them before they got laid off a decade ago), and reformatting the code is not going to make it any harder to get answers from people who aren't there.


I've been in situations where even without access to the people knowing which of them wrote something gives me a better idea of how to backwards infer what (and of course sadly occasionally 'if') they were thinking while writing the code.

Then again, I think most of the tells for that for me are around the sort of structure that would survive reformatting anyway.

(and, y'know, legacy stuff, everything's a bloody trade-off)



SVN was a thing by the mid-2000s, and history from that is easy to preserve in git. Just how old are the sourcebases in question? (Not to shoot the messenger; just like, wow.)

edit:typo



On the first large C++ project I worked on in the mid-1990s, version control was basically preserving a bunch of archived copies of the source tree. CVS was a thing but not on Windows, and SourceSafe was creating more problems than it was solving.


I kept regular tarballs of a project that used SourceSafe right near the start of my career, and found I was more likely to be able to find an intact copy of the right thing to diff against from my tarballs.

I think after a year or so I realised that even bothering to -try- to use SourceSafe was largely silly, got permission to stop, and installed a CVS server on a dev box for my own use.

(yes I know the VCS server shouldn't really be on the dev box I could potentially trash, I didn't have another machine handy and it was still a vast improvement)



Some of these systems dated back to the 1970s. The worst offenders were from the 1980s and 1990s though.

It's all about the team or organization and their laziness or non-laziness.



I maintain a C++ codebase that was originally written in 1996, and is mission critical for my organization. Originally maintained in Visual Sourcesafe, then in TFS source control, and now git. Some parts of it were rewritten (several times) in C#, but the core is still C++.

I was very worried when we transitioned to git that the history would not be preserved, and I tried to preserve it, but it proved too much hassle so I dropped it.

In fact that proved not to be a problem. Well, not a problem for me, since I remember all the history of the code and all the half-forgotten, half-baked features and why they are there. But if I'm gone, then yes, it's going to be a problem. It's in dire need of a rewrite, but this has been postponed again and again.



I've had issues doing decent copies from SVN to Git. They have different ideas about user identity, and about how fragmented it can be.


I looked at a C++ codebase from 1997 at a previous job - I don't know much about the history but comments in one of the old files tracked dates and changes to 2001. Not sure what happened after that but in 2017 someone copy-pasted the project from TFS to git and obliterated the prior history.


I've heard a lot of stories about mid-90s codebases for sure


You can make git blame ignore commits by listing them in a file (commonly named .git-blame-ignore-revs) and pointing the blame.ignoreRevsFile config at it.

This is assuming Git of course, which is not a given at all for the average legacy c++ codebase.



Good to know. Thanks for the tip!


You can instruct git to ignore specific commits for blame and diff commands.

See "git blame ignore revs file".

Intended use is exactly to ignore bulk changes like auto formatting.
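
For reference, a minimal sketch of the setup (available since Git 2.23; the file holds the full hashes of the bulk-reformat commits, and src/foo.cpp is just a placeholder):

  git config blame.ignoreRevsFile .git-blame-ignore-revs
  git blame -w src/foo.cpp   # -w additionally ignores whitespace-only changes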



+1

  man git-blame
  git help blame
https://git-scm.com/docs/git-blame


This is another reason why you should track important information in comments alongside the code instead of trusting VCS to preserve it in logs/commit messages, and to reject weird code missing comments from being merged.

Not saying that fixes decades of cruft, because you shouldn't change files without good reason and non-whitespace formatting is not a good reason, but I'm mentioning it because I've seen people naively believe bullshit like "code is self explanatory" and "the reason is in the commit message".

Just comment your code folks, this becomes less of a problem



How does reformatting trash the history? It's one extra commit..

I guess if it splits or combines lines that could cause some noise if you really want the history of a single line... But that happens all the time, and I don't see how it would really prevent understanding the history. You can always do a blame on a range of lines.

Maybe I'm missing something though, genuinely curious for a concrete example where reformatting makes it hard to understand history!



If you ask the IDE to show blame info next to each line, then a lot of lines will be from the big reformatting. Of course you can still dig in and retrieve the history, but it's an extra step then. Btw, Git has a way to make `git blame` avoid considering certain commits (blame.ignoreRevsFile). Maybe that works in IDEs too!


At the per-file level it's just one commit. It's not really a big deal.


clang-format can be applied to new changes only, for this very reason.

Adding it will remove white space nitpicking from code review, even if it isn't perfect.
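
If you want to try it, a minimal sketch: git-clang-format (shipped alongside clang-format) reformats only the lines that differ from a given commit, so untouched legacy lines stay untouched.

  git clang-format            # format only lines changed relative to HEAD
  git clang-format HEAD~1     # or relative to an older commit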



I believe you can configure `git blame` to skip a specific commit. But in my experience it doesn't matter anyway for two reasons:

1. You're going to reformat it eventually anyway. You're just delaying things. The best time to plant a tree, etc.

2. If it's an old codebase and you're trying to understand some bit of code you're almost always going to have to walk through about 5 commits to get to the original one anyway. One extra formatting commit doesn't really make any difference.



I would absolutely not recommend auto-formatting a legacy codebase. In my experience large C++ projects tend to have not only code generation scripts (python/perl/whatever) but also scripts that parse the code (usually to gather data for code generation). Auto formatting might break that. I have even seen some really cursed projects where the _users_ parsed public header files with rather fragile scripts.


I was listing the items in the original article's #3 and saying I'd move them up to #2 before I'd go about excising portions of the project, the original #2. I still stand by that. But you can read my other comment where I don't really defend auto-formatting to see that I don't care either way. I made it about four hours ago so maybe you missed it if you didn't refresh the page in the last few hours.


CI is different from the others, here! At minimum, building a "happy path(s)" test harness that can run with replicable results, and will run on every one of your commits, is a first step, and also helps to understand the codebase.

And you're jumping around - and you'll have to! - odds are you'll have a bunch of things changed locally, and might accidentally create a commit that doesn't separate out one concern from another. CI will be a godsend at that point.
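
To make "replicable results" concrete, the simplest possible happy-path harness is a golden-file diff; a minimal sketch (the binary and fixture names here are hypothetical):

  #!/bin/sh
  set -e
  # run the app on a known input and compare against a checked-in expected output
  ./build/app --input tests/basic.in > /tmp/basic.out
  diff -u tests/basic.expected /tmp/basic.out && echo "happy path OK"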



Fair point!


Nit: The post scopes "tearing things out" to dead code as guided by compiler warnings and unsupported architectures.

If going the route, I'd recommend commenting out the lines rather than removing them outright to simplify the diffs at least until you're ready to squash and merge the branch.



Better to use `#if` or `#ifdef` to prevent compilation. C & C++ don't support nested comments, so you can end up with existing comments in the code ending the comment block.


I think `#if` and `#ifdef` are not good ideas because they prevent the compiler from seeing them in the first place. A better solution is just `if (false)` which is nestable, and the code is still checked by the compiler so it won't bit rot.
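
For illustration, a tiny self-contained sketch of both approaches (legacy_path() is a hypothetical stand-in):

  #include <cstdio>

  void legacy_path() { std::puts("old behavior"); }  // hypothetical

  int main() {
  #if 0              // preprocessor: skipped entirely; comments inside are fine
      legacy_path();
  #endif
      if (false) {   // still parsed and type-checked, so it won't silently bit rot
          legacy_path();
      }
  }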


Not mentioned were code comprehension tools / techniques:

I used to use a tool called Source Navigator (written in Tcl/tk!) that was great at indexing code bases. You could then check the Call Hierarchy of the current method, for example, then use that to make UML Sequence Diagrams. A similar one called Source Insight shown below [1].

And oh, notes. Writing as if you're teaching someone is key.

Over the years, I got quite good at comprehending code, even code written by an entire team over years. For a brief period, I was the only person actively supporting and developing an algorithmic trading code base in Java that traded ~$200m per day on 4 or 5 exchanges. I had 35 MB of documentation on that, lol. Loved the responsibility (ignoring the key man risk :|). Honestly, there's a lot of overengineering and redundancy in most large code bases.

[1] References in "Source Insight" https://d4.alternativeto.net/6S4rr6_0rutCUWnpHNhVq7HMs8GTBs6...



> I used to use a tool called Source Navigator

I can't believe I'm finding someone in the wild that also has used Source Navigator.

My university forced this artifact on me in the computer architecture course because it has some arcane feature set + support for an ARM emulator that isn't found elsewhere. We used it for bare metal ARM assembly programming



>worry not, by adding std::cmake to the standard library and you’ll see how it’s absolutely a game changer

I'm pretty sure my stomach did somersaults on that.

But as for the advice:

>Get out the chainsaw and rip out everything that’s not absolutely required to provide the features your company/open source project is advertising and selling

I hear you, but this is incredibly dangerous. Might as well take that chainsaw to yourself if you want to try this.

It's dangerous for multiple reasons. Mainly it's a case of Chesterton's fence. Unless you fully understand why X was in the software and fully understand how the current version of the software is used, you cannot remove it. A worst case scenario would be that maybe a month or so later you make a release and the users find out an important feature is subtly broken. You'll spend days trying to track down exactly how it broke.

>Make the project enter the 21st century by adding CI, linters, fuzzing, auto-formatting, etc

It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

If it's an optional step that requires that they install something new (like commit hook) it's just not going to happen.

Linters also won't do you any good when you open the project and 2000+ warnings appear.



> It's a nice idea, but it's hard to do. One person is using VIM...

The things the author listed there are commonly not IDE integrated. I've never seen a C++ development environment where cpplint/clang-tidy and fuzzers are IDE integrated, they're too slow to run automatically on keystrokes. Auto-formatting is the only one that is sometimes integrated. All of this stuff you can do from the command line without caring about each user's chosen development environment. You should definitely at least try rather than giving up before you start just because you have two different text editors in use. This is C++; if your team won't install any tools, you're gonna have a bad time. Consider containerizing the tools so it's easier.



> I've never seen a C++ development environment where cpplint/clang-tidy and fuzzers are IDE integrated

CLion from JetBrains has clang-tidy integrated (real-time).



I assume it's clangd? It can be used from vim, vscode, ... etc as well and get a uniform IDE diagnostic experience across text editors.


In Emacs I have clangd and clang-tidy running on each key stroke!

The project size is probably a lot smaller than what most people are working on though, and I have a fast CPU and NVME disk, but it's definitely possible to do!

I'm not sure about the fuzzer part though.



Clangd will happily run clang-tidy as part of completion/compile-as-you-type/refactoring, independent of the editor/IDE of your choice.

I wouldn't call it fast, but it is quite usable.



>> It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

This is what's wrong with our industry, and it's no longer an acceptable answer. We're supposed to be fucking professional, and if a job needs to build a tool chain from the IDE up we need to learn to use it and live with it.

Built on my machine, with my IDE, the way I like it and it works is not software. It's arts and fucking crafts.



> every single carpenter in the world should use the exact same make and model of saw, for, uh, professionalism reasons


Picture the crew that shows up to stick frame your house.

First guy: hand saw and impact driver; cut and screw.
Second guy: power saw and hammer; cut and nail.
Third guy: safety glasses and a nail gun.
Fourth guy shows up with a compressor, asks where the power is (none) and if he can use someone's tools.

It would not work. You don't build a CNC production line with every CNC being unique. We don't let devs pick their production server OS, and we don't let them choose random languages. Tooling matters.



One person's saw literally triple checks the measurements before cutting, minimizes wastage, runs 3x faster, and is built by a company specializing in making saws.

The other saw was hand forged in a basement by the user, breaks every other day, and has a totally different blade width, and can only be used by the owner.



Actually yes they should


If you're saying everyone should agree on the same IDE and personal development toolset, I disagree, sort of.

The GP was suggesting the effort to add CI, linters, fuzzing, auto-formatting, etc was too hard. If that can be abandoned entirely, perhaps the legacy codebase isn't providing enough value, and the effort to maintain it would be better spent replacing it. But the implication is that the value outweighs the costs.

Put all the linters, fuzzing, and format checking in an automated build toolchain. Allow individuals to work how they want, except they can't break the build. Usually this will rein in the edge cases using inadequate tools. The "built on my machine, with my IDE, the way I like it and it works" is no longer the arbiter of correct, but neither does the organization have to deal with the endless yak shaving over brace style and tool choice.



> neither does the organization have to deal with the endless yak shaving over brace style and tool choice

I hear you, but an organization that fears this, instead of Just Pick Something And Deal With It, is an organization that probably doesn't have the right people in it to succeed at any task more arduous than that.



Conversely, an organization that imposes arbitrary choices and isn't capable of letting people use the tools they know best probably doesn't attract the best people. There are many different kinds of hammers, and making everyone who uses hammers use the same kind is, to say the least, counterproductive.


I get where you're coming from, but frankly: nah. If you are so in-the-rut that you can't switch, say, text editors or IDEs and are compelled to have an Incredibly Normal Day about it, you're probably not actually possessed of the plasticity to be somebody I want to work with.

I use vim, IntelliJ, Visual Studio, and VSCode at least once every two weeks apiece, and it's no skin off my back to switch. Do thou likewise.



Software is arts and crafts :)


It should be less popsicle sticks and paste and more "Arts and Crafts Movement".

For those that dont know, Arts and Crafts movement in the states is known for some pretty interesting pottery that was produced at scale.

https://excellentjourney.net/2015/03/04/art-fear-the-ceramic...



> It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

I must have missed the memo where I could just say no to basic things my boss requires of me. You know, the guy that pays my salary.



As others have mentioned, none of these things actually change your development workflow. But if they did, you do have the ability to say no. If your boss fails to understand that you have an environment that you're productive in, that sounds like a bad place to work.


Every company have their own workflow adapted to their tooling so that teams can work among themselves frictionless.

It's ok if you use your own tooling you are comfortable with, but you should adapt to their workflow, and the employer has no obligation to tweak their workflow to integrate your own, it's yours to adapt.



yikes


An optional step locally like pre-commit hooks should be backed up by a required step in the CI. In other words: the ability to run tests locally, lint, fuzz, format, verify Yaml format, check for missing EOF new lines, etc, should exist to help a developer prevent a CI failure before they push.

As far as linters causing thousands of warnings to appear on opening the project, the developer adding the linter should make sure that the linter returns no warnings before they merge that change. This can be accomplished by disabling the linter for some warnings, some files, making some fixes, or some combination of the above.
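
A minimal sketch of the local half of that: a pre-commit hook that refuses unformatted staged C++ files (assumes clang-format >= 10 on PATH; the file globs are illustrative):

  #!/bin/sh
  # .git/hooks/pre-commit
  files=$(git diff --cached --name-only --diff-filter=ACM -- '*.cpp' '*.h')
  [ -z "$files" ] && exit 0
  # --dry-run --Werror exits non-zero if any file would be reformatted
  clang-format --dry-run --Werror $files

The CI then runs the same check over the whole tree, so skipping the hook only moves the failure later.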



> It's a nice idea, but it's hard to do. One person is using VIM, another is using emacs, another is using QTCreator, another primarily edits in VSCode.. Trying to get everyone on the same page about all this is very, very hard.

Bullshit; all of these editors (and additionally CLion) are fairly easy to configure for this, with the possible exception of QtCreator (not a ton of experience on my end).

Just make it a CI requirement, and let everyone figure it out for their own tools. If you can't figure that out, you get to run it as a shell script before you do your PRs. If you can't figure that out, you probably shouldn't be on a legacy C++ project.



> One person is using VIM, ...

I don't get your point. You know you can autoformat outside editors right? Just configure pre-commit and run it in CI. It's trivial.

> If it's an optional step that requires that they install something new (like commit hook) it's just not going to happen.

It will because if they don't then their PRs will fail CI.

This is really basic stuff, but knowledge of how to do CI and infra right does seem to vary massively.



> It's dangerous for multiple reasons. Mainly it's a case of Chesterton's fence. Unless you fully understand why X was in the software...

If this is a function that no one links to, and your project does not mess with manual dynamic linking (or the function is not exposed), then it's pretty safe to remove it. If it's an internal utility which does not get packaged into the final release package, it is likely safe to remove too. If it's a program which does not compile because it requires Solaris STREAMS and your targets are Linux + macOS - kill it with fire.

(Of course removing function calls, or removing functionality that in-use code depends on, is dangerous. But there is plenty of stuff which has no connection to main code)
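
One low-effort way to get the toolchain itself to surface candidates (GNU toolchain; treat the output as hints only, since the linker sees just what survives compilation and won't account for dlopen/dlsym):

  g++ -ffunction-sections -fdata-sections main.cpp util.cpp \
      -Wl,--gc-sections -Wl,--print-gc-sections -o app

With each function in its own section, --print-gc-sections lists every unreferenced function the linker dropped.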



It's funny. My first step would be

  0. You reach out to the previous maintainers, visit them, buy them tea/beer and chat (eventually) about the codebase. Learned Wizards will teach you much.
But I didn't see that anywhere. I think the rest of the suggestions (like get it running across platform, get tests passing) are useful stress tests likely to lead you to robustness and understanding however.

But I'd def be going for that sweeet, sweet low-hangin' fruit of just talking to the ol' folks who came that way before. Haha :)



I wouldn't make it the first step. If you do, you will probably waste their time more than anything.

Try to work on it a little bit first, and once you get stuck in various places, now you can talk to the previous maintainers, it will be much more productive. They will also appreciate the effort.



Yeah, that's fair enough. I guess since we're already zero-indexed maybe my -1 step is Prep. Hahaha! :)


There’s a fine balance with no right or wrong answer. Previous maintainers will appreciate if you spent literally more than a second trying to understand before you reach out to them, but for your own sanity you should know when it’s time to stop and call for help.


I was once tasked with deploying a piece of software on a closed network (military), to run on a old custom OS - it wasn't a huge program, around 50k lines of code.

I did encounter a bunch of bugs and problems underway, and wanted to reach out to the devs that wrote it - as it was customer made for my employer.

Welp, turns out it was written by one guy/contractor, and that he had passed away a couple of years earlier.

At least in the defense industry you'll find these sort of things all the time. Lots of custom/one-off stuff, made for very specific systems. Especially on the hardware side it is not uncommon that the engineers that made the equipment are either long gone or retired.



Welp, you might have needed to consult a seance for that one.

Interesting point about the "author expiry" window. I was thinking about that today regarding something else:

Let's say in 20 years time, no database code has been updated for the last 20 years. And all the people who worked on it, can't remember anything about it. Yet, it still works.

That means that everyone who uses that database every day doesn't know how it works. They believe that it works, and this belief is widespread. And the database providing results to queries is a real thing. And it does work -- but nobody knows how.

This is common. I don't know in any detail how the MacBook I use works. But it does. I don't know how many things I use actually work. But they do work.

It seems the only difference, in the world of things that "work" even though most people who use them don't understand how, is that there are two classes of things: those for which the belief that they work is widespread, and those for which it is not. But in either case, they work.



Even more common in the world of industrial automation.

Lots of old early-gen PLCs from the 70s/80s still ticking, with no documentation and the techs/engineers/companies long gone.

We worked on one such PLC, around 3 decades old at that point, and it came down to probing I/O and reverse engineering the functionality.

But at some point, if there hasn't been enough legacy support, there comes a time where people just have to bite the bullet and re-build a system from the ground up - and integrate it in parallel with the old system running, until it can be removed completely.

Too bad many of the old and forgotten systems are still running and integral, so they get put inside a glass cage with "DON'T TOUCH!" warning sticker.



> It's funny. My first step would be

> 0. You reach out to the previous maintainers, visit them, buy them tea/beer and chat (eventually) about the codebase. Learned Wizards will teach you much.

Have you ever tried that? This is legacy code. Even if the handover was yesterday, they cannot tell you anything useful about what they did more than six months in the past.

And that's the best-case scenario. The common-case is "That's the way I got it when it was handed over to me" answer to every question you ask.



I have, and assuming your predecessor doesn't mind, you still get enough useful answers to make it worth it a lot of the time.

Especially if you manage to get to just chatting about the project - at some point they'll almost certainly go "oh! while I remember," followed by the answer to a question you didn't realise you should've asked.

The value is often in the commiseration rather than the interrogation, basically.



Maybe do a quick look at codebase first so you can identify biggest WTF's and ask about them.

After all, if you have inherited a codebase with no tests, with a build process which fails every other time, with unknown dependency info, and which can only be built on a single server with a severely outdated OS... are you sure the previous maintainer is a real wizard and all the problems are the result of not enough time? Or are they a "wizard" who kept things broken for job security and/or because they didn't want to learn new things?



Even if that person is not a great archwizard, they still have more experience in that project than you and you will probably learn some things that will make your life less miserable, because you will better understand what to expect and what kind of failed solutions have been tried before.


Yes! Good idea. Locking in at -1. Hahah! :)


    > You reach out to the previous maintainers, visit them
I could have brought them flowers, and shared a moment of silence contemplating eternity. I don't know if it would significantly have helped understanding the code base though..


IME, this only works if you can get regular help from them. A one-off won't help much at all.


> A one-off won't help much at all.

Monumentally disagree. One-off session with a guy who knows the codebase inside out can save you days of research work. Plus telling you all about the problematic/historical areas.



I'm just stating my experience. A single day, if they still have access to the codebase might be able to clear up some top-level concepts.

But the devil is in all the tiny details. What is this tiny correction factor that was added 20 years ago? Why was this value cut off to X decimals? Why didn't they just do Y here? Why do we override this default behavior?

It's tens of thousands of tiny questions like that which you can't ask until you're there.



I don't understand what you're saying. Clearly both types of meetings (one-off vs recurring) would be helpful. The one-off may save you days/weeks of research, but it seems like you're not satisfied with that unless you can answer every single minor question you might have across the entire codebase.


I'm saying that if you're maintaining a code base for years, a single day's explanations won't do much of anything. It's a drop in the bucket.

It's not a bad thing, and it's certainly good to do, but it's not a solution to the problem.



If your granularity for a task is measured in years then you have a much different and harder problem.

Effectively everything becomes a "drop in the bucket".



Having a one-day intro might save you two weeks of the 6 - 36 month task of getting to know the code base.

Like ... it doesn't help that much. Mainly it saves some frustration if the build system is insane, or the source code is spread out across emails, a hard drive, and a floppy in a drawer, or the current source state is broken and needs to be reverted.

But if the original author is there to do a handover, the chances are the company is properly run and the work will be a breeze anyway, because the code is good and well structured.



We'll just have to accept that we live in different worlds. I still consider what you described a massive save.


Exactly. Someone to guide you in the right path. Gonna help a lot! Hahaha :)


Yeah you need to cultivate those relationships. But with a willing partner that first session will take you from 0 to 1 :)


A lot of the time it takes you from 0 to 0.1, but (a) every little helps (b) if you take them out for lunch or a post-work beer or whatever it can build a relationship where you can ask follow up questions via email.

Ideal is if they have a life such that something morally equivalent to "how about we meet up and I'll pay for the food/beer" is a viable thing to suggest later.

Oh, and always remember - the way to a geek's schedule is often through their wife. If you get a chance to meet their partner, TAKE IT and be on your best behaviour, if $partner likes you then you have massively increased chances of making useful things happen later.



Yeah, that's a good point. But I think more often it's closer to 1 because (I guess this depends on our definition of 0 and 1 haha! :)) a) you get the benefit of their conceptual models for thinking about the codebase, which is hard won and highly useful, and b) you grok any practical pitfalls or things to remember when running, building, testing, that may be simple enough, but in the space of all possibilities are hard to come by without knowing them.

So basically you get to go from: 0 - can do nothing at all; to 1 - being able to feel confident diving in and looking around, framed with the downloaded conceptual models, knowing how to hit the ground trotting, if not running.

To me, 0.1 is more like you slugged it out for a couple hours poking around by yourself, and now you know a handful of confusing things with low confidence.

Chatting to one who has been there, done that lets you have more confidence, and know more actionable stuff, which is highly useful. I suppose it depends on your conversation, listening and comprehension skills tho! Hahaha! :)

But seriously not everybody learns well in that situation. I just happen to. I'd much rather talk it over with someone and absorb, than watch a video or read a tutorial.

Finally -- haha! :) -- I guess there are some geeks to whose schedule the way is through their husband or whatevs haha! :)



I've always found discussing why former employees left a project incredibly enlightening. They will usually explain the reality behind the PR given they are no longer involved in the politics. Most importantly they will often tell you your best case future with a firm.

Normally, employment agreements specifically restrict contact with former staff, or discussions of sensitive matters like compensation packages.

C++ is like any other language in that it will often take 3 times longer to understand something than to re-implement the same. If you are lucky, then everything is a minimal API lib and you get repetitive examples of the use cases... but the cooperative OSS breadcrumb model almost never happens in commercial shops...

Legacy code bases can be hell to work with, as you end up with the responsibility for 14 years of IT kludges. Also, the opinions from entrenched lamers on what productivity means will be painful at first.

Usually, with C++ it can become its own project specific language variant (STL or Boost may help wrangle the chaos).

You have my sympathy, but no checklist can help with naive design inertia. Have a wonderful day. =)



Uh, yeah, that's a great idea, too! That's the big question. What happened to the previous team? Can give you org insights as well as code ones. Info on company strategy, priorities, work cadence. Actually a pretty good open ended conversation opener! Hahaha! :)


Note that depending on your jurisdiction discussion of compensation may be a right protected by law.


Contract law has weird consequences in different places.

Indeed, if the legal encumbrance is not legal, then it's often unenforceable.

Talking with your own lawyers before doing something silly is a good habit. ;-)



You mean the guys the company laid off last week?


Easier said than done

My step 0 would be: run it through a UML tool to get a class diagram and other diagrams.

This will help you a lot.

> Get the tests passing on your machine

Tests? On a C++ codebase? I like your optimism, heh



No one has time to maintain the UML in production.

You may luck out with auto-generated documentation, but few use these tools properly (javadoc or doxygen). =)



The GP said nothing about keeping and maintaining it, only generating it. Use it to understand the codebase, then archive it or throw it out.


Exactly. You inherited 500k SLOC of C++ that has been growing since 1985. You don't know the interconnections between the classes that have accumulated in that time. It was also developed by multiple teams, which likely took very different approaches to OO during these past nearly 40 years. The UML diagrams won't tell you everything, but they will tell you things like the inheritance hierarchy (if this was a 1990s C++ project it's probably pretty nasty), what classes are referenced by others via member variables, etc. This can be hugely informative when you want to transform a program into something saner.


I always interpreted most polymorphism as sloppy context-specific state-machine embedding, and fundamentally an unmaintainable abomination from OOP paradigms.

OO requires a lot of planning to get right (again, no shop will give your team time to do this properly), and in practice it usually degenerates into spiral development rather quickly.

500k lines is not that bad if most of it is encapsulated libraries... =)



I’m not sure why there’s so much focus on refactoring or improving it. When a feature needs to be added that can just be tacked onto the code, do it without touching anything else.

If it’s a big enough change, export whatever you need out of the legacy code (by calling an external function/introducing a network layer/pulling the exact same code out into a library/other assorted ways of separating code) and do the rest in a fresh environment.

I wouldn’t try to do any major refactor unless several people are going to work on the code in the future and the code needs to have certain assumptions and standards so it is easy for the group to work together on it.



The post argues against major refactors. The incremental suggestions it gives progressively make the code easier to work with. What you suggest works until it doesn't -- something suddenly breaks when you make a change and there's so much disorganized stuff that you can't pinpoint the cause for much longer than necessary. The OP is basically arguing for decluttering in order to be able to do changes easier, while still maintaining cohesion and avoiding a major rewrite.


The right answer depends on the future. I've worked on C++ code where the replacement was already in the market but we had to do a couple more releases of the old code. Sometimes it is adding the same feature to both versions. There is a big difference in how you treat code that you know the last release is coming soon and code where you expect to maintain and add features for a few more decades.


Yes, you have to anticipate the future (or, even better, your manager/boss already has expectations you can adopt to begin with) and then choose the right way to tackle the changes required. That's why I laid out 3 possible cases, the last of which points out that I prefer to refactor primarily when there's a lot of work incoming on the codebase. Personally, I don't see much value in refactoring code significantly if you alone are going to be editing it, because the cost of refactoring plus the cost of editing in the refactored codebase is often more than just eating the higher cost of editing in the pre-refactored codebase, and you don't reap the scaling benefits of refactoring as much. However, like I mentioned in the above paragraph, it depends. In the end it's all about managing the debt to get the most out of it in a _relatively_ fixed time period.


>"When a feature needs to be added that can just be tacked onto the code, do it without touching anything else."

In few lucky cases. In real life new feature is most likely change in behavior of already existing one and suddenly you have to do some heavy refactoring in numerous places.



if you're going to own it for the foreseeable future. then own it. learn it, refactor it, test the hell of out of it. otherwise you're never going to be able to debug or extend it.

one thing I always do is throwaway major refactors. its the fastest way for me to learn what the structure is, what depends on what, and what's really kinky. and I might just learn enough to do it for real should it become necessary.



> throwaway major refactors

Thank you for providing me a term for this! I indeed learned a lot from doing this because some things can only be understood by hitting them with a hammer, putting them together again and observing where that doesn't work.



absolutely. the best is when you spend all this time trying to figure out what this awful and convoluted thing is. and you finally just take it out to see what happens, and the answer is .. nothing

its software. we should take full of advantage of its plasticity.



This thread has lots of good advice. I'll add some of mine, not limited to C/C++. If you have the luxury of using a VCS, make full use of its value. Many teams only use it as a tool merely for collaboration. A VCS can be more than that. Pull the history, then build a simple database. It doesn't have to be an RDB (it's helpful though); a simple JSON file or even a spreadsheet file is a good starter. There is so much valuable information to be fetched with just a simple data-driven approach, almost immediately.

  * You can find out the most relevant files/functions for your upcoming works. If some functions/files have been frequently changed, then it's going to be the hot spot for your works. Focus on them to improve your quality of life. If you want to introduce unit tests? Then focus on the hot spot. Suffer from lots of merge conflicts? The same.
  * You can also figure out correlation among the project and its source files. Some seemingly distant files are frequently changed together? Those might suggest an implicit structure that not might be clear from the code itself. This kind of information from external contexts can be useful to understand the bird's eye view.
  * Real ownership models of each module can be inferred from the history. Having a clear ownership model helps, especially if you want to introduce some form of code review. If some code/data/module seems to have unclear ownership? That might be a signal for refactoring needs.
  * Specific to C/C++ contexts, build time improvements could be focused on important modules, in a data driven way. Incremental build time matters a lot. Break down frequently changed modules rather than blindly removing dependencies on random files. You can even combine this with header dependency to score the module with the real build time impact. 
There could be so many other things if you can integrate other development tools with the VCS. In the era of LLMs, I guess we could even try to feed the project history and metadata to a model and ask for some interesting insights, though I haven't tried this. It might need some dedicated model engineering if we want to do this without a huge context window, but my gut tells me this is worth a try.
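
For the first point (hot spots), a minimal sketch using nothing but git and coreutils, ranking files by how often they have appeared in a commit:

  git log --pretty=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn | head -20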


Nice ideas! Do you have any tips for software to help automate some of those analyses?


> 3. Make the project enter the 21st century by adding CI, linters, fuzzing, auto-formatting, etc

I would break this down:

a) CI - Ensure not just you can build this, but it can be built elsewhere too. This should prevent compile-based regressions.

b) Compiler warnings and static analysers - They are likely both smarter than you. When it says "warning, you're doing weird things with a pointer and it scares me", it's a good indication you should go check it out.

c) Unit testing - Set up a series of tests for important parts of the code to ensure it performs precisely the task you expect it to, all the way down to the low level. There's a really good chance it doesn't, and you need to understand why. Fixing something could cause something else to blow up as it was written around this bugged code. You also end up with a series of regression tests for the most important code.

n) Auto-formatting - Not a priority. You should adopt the same style as the original maintainer.

> 5. If you can, contemplate rewrite some parts in a memory safe language

The last step of an inherited C++ codebase is to rewrite it in a memory safe language? A few reasons why this probably won't work:

1. Getting resources to do additional work on something that isn't broken can be difficult.

2. Rather than just needing knowledge in C++, you now also need knowledge in an additional language too.

3. Your testing potentially becomes more complex.

4. Your project likely won't lend itself to being written in multiple languages, due to memory/performance constraints. The problem must be significantly hard, or you would have just written it from scratch yourself.

5. You have chosen to inherit a legacy codebase rather than write something from scratch. It's an admittance that you don't have some resource (time/money/knowledge/etc) to do so.



> The last step of an inherited C++ codebase is to rewrite it in a memory safe language

Simply getting rid of any actually memory unsafe C++ and enforcing guidelines will do this for you in the C++ codebase.

"Rewrite it in X" only adds complexity because it's the flavour of the month as you said in your comment.

The author is already doing the work of rewriting large chunks of the codebase in C++; they may as well follow and implement a more restrictive subset of the language. I find High Integrity C++ to be good; if I could get my hands on the latest MISRA standard, that would likely be good as well. These may not be "required", but they specify what is enforced. So instead of having to reskill your entire dev team on a new language which has many, many sharp edges of its own, how about just having your dev team use the language they already know and enforce guidelines to avoid known footguns.



How would you get rid of any memory unsafe C++? Isn’t that just another way of saying “do not make mistakes”?


The same way you do it in Rust: use wrappers for all memory allocation.

C++ has had RAII forever, and since C++11 (it's 2024 now, btw) it has had actually good wrappers: Box = std::unique_ptr, and the ref-counted version = std::shared_ptr.

Do the other things he already said to do, i.e. clang-tidy with the correct rules will warn/error on any raw pointer usage. You don't need to "not make mistakes": if you never use raw pointers, other than in specific places where you've told the linter it's fine because you checked them, then by default it will be memory safe. Does that sound familiar?
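
A tiny sketch of what that looks like in practice (Widget is a hypothetical payload type):

  #include <memory>

  struct Widget { int x = 0; };  // hypothetical

  int main() {
      auto owned  = std::make_unique<Widget>();  // Box-like: sole owner
      auto shared = std::make_shared<Widget>();  // Rc/Arc-like: reference counted
      owned->x = 1;
      shared->x = 2;
  }  // no delete anywhere; destructors free both automatically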

"But we don't need a linter in Rust!" It's just built into the LLVM frontend called the rust compiler. If it bugs you so much build a custom executable that runs the linter then the c++ compiler and call it your own internal compiler. First line of code can be "#!/bin/sh"...

If you say you aren't talking about Rust but one of the GC languages, then I agree with you: write it in that other language. But then the correct solution is a rewrite anyway, since the software was never written in the right language to start with. Rewriting something in Rust is likely the same amount of complexity as fixing the C++ code, especially if you first need to learn Rust well. There's a reason I am not exactly pro-Rust, and yes, I have tried to use it in a proper complex project: all deadlines were missed, and we ended up writing it in modern, safe C++ instead, purely because the language didn't force us to make up some obscure abstraction to appease the borrow checker.



> So what do I recommend? Well, the good old git submodules and compiling from source approach.

It is strange that the author complains so much about automating BOMs, package versioning, dependency sources, etc, and then proceeds to suggest git submodules as superior to package managers.

The author needs to try vcpkg before making these criticisms; almost all of them are straightforwardly satisfied with vcpkg, barring a few sharp edges (updating dependencies is a little harder than with git submodules, but that's IMO a feature and not a bug). Dependencies are built in individual sandboxes which are then installed to a specified directory. vcpkg can set internal repositories as the registry instead of the official one, thus maintaining the 'vendored in' aspect. vcpkg can chainload toolchains to compile everything with a fixed set of flags, and it allows users to specify per-port customisations.

These are useful abstractions and it's why package managers are so popular, rather than having everyone deal with veritable bedsheets' worth of strings containing compile flags, macros, warnings, etc.



I enjoyed the article and learned something. But I've been wondering: When people say "rewrite in a memory-safe language", what languages are they suggesting? Is this author rewriting parts in Go, Java, C#? Or is it just a smirky, plausibly deniable way of saying to rewrite it in Rust?


Author here, thanks! A second article will cover this, but the bottom line is that it entirely depends on the team and the constraints e.g. is a GC an option (then Go is a good option), is security the highest priority, etc.

I’d say that most C++ developers will generally have an easy time using Rust and will get equivalent performance.

But sometimes the project did not have a good reason to be in C++ in the first place and I’ve seen successful rewrites in Java for example.

Apple is rewriting some C++ code in Swift, etc. So, the language the team/company is comfortable with is a good rule of thumb.



Makes sense, thanks!


So you saw a post about C++ that didn't mention "Rust" once and mentioned "memory safe" languages, of which there are dozens, and yet you found a way to shoehorn in a dismissive comment about a meme. Nice.

We've reached the stage of the rewrite-in-Rust meme where, for lack of being able to complain about it directly (since it wasn't brought up!), people question whether the author is a nefarious crypto-Rust programmer.



(author actually shows up to advocate for rust)


Author actually replies that Go, Java, Rust, Swift are options depending on the context, and that general considerations like security are relevant as well. But whatever helps you grind your axe!


The article doesn't mention anything about global variables, but reducing/eliminating them would be a high priority for me.

The approach I've taken is, when you do work on a function and find that it uses a global variable, try to add the GV as a function parameter (and update the calling sites). Even if it's just a pointer to the global variable, you now have another function that is more easily testable. Eventually you can get to the point where the GV can be trivially changed to a local variable somewhere appropriate.
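
A minimal before/after sketch of that migration (all names here are hypothetical):

  static int g_retry_count = 3;  // the existing global

  // after: the global becomes an explicit parameter, so the function is testable
  int send_with_retries(int payload, int retry_count) {
      int sent = 0;
      for (int i = 0; i < retry_count; ++i) sent += payload;  // stand-in logic
      return sent;
  }

  // transitional shim: existing call sites keep working while they are migrated
  int send_legacy(int payload) { return send_with_retries(payload, g_retry_count); }

Once every caller passes the value explicitly, g_retry_count can become a local in whatever function actually owns it.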



I've had to do this several times in the past. Honestly, my best advice would probably be make several backups, then to do as little as possible. If you need to make a small change, fine. Bigger changes? Consider if you can't do the bulk of the work in a technology or stack you understand and only make a small change to the legacy code base.

Most of the time I spend with C++ code revolves around figuring out compile/link errors. Heaven forbid you need to deal with non-portable `make` files that for some reason work on the old box, but not yours... Oh, and I hope you have a ton of spare space, because for some reason building a 500k exe takes 4GB.

Keep in mind, this advice only applies to inherited C++ code bases. If you've written your own or are working on an actively maintained project these are non-issues. Sort-of.



Been there, done that. Don't be a code beauty queen. Make it compile and make it run on your machine. Study the basic control-flow graph starting from the entry point and see the relations between source files. Debug it with step-into and see how deep you go. Only then can you gradually start seeing the big picture and any potential improvements.


In my experience it takes at least a year straight working with code before you can form an opinion on if it is beautiful or not. People who have not worked in a code base for that long do not understand what is a beautiful design corrupted by the real world vs what is ugly code. Most code started out with a beautiful design but the real world forced ugly on it - you might be able to improve this a little with full rewrite but the real world will still force a lot of ugly on you. However some code really is bad.


Absolutely. Read the code. Step through with a debugger. Fix obvious bugs. If it’s legacy and somebody is still paying to have it worked on, it must mostly work. Changing things for “cleanliness and modernization” is likely to break it.


> Fix obvious bugs.

Be careful about that. Hyrum's Law and all.



Should have been clearer. You’ve probably been put on the project because something isn’t working. Fix the simplest, most obvious of these. Fixing a bug is a good way to learn.


> You’ve probably been put on the project because something isn’t working.

Perhaps if it's a change requested by the organization or the users. Just don't go "fixing" things that look like bugs without knowing if it's really a bug or expected behavior.



> Get out the chainsaw and rip out everything that’s not absolutely required to provide the features your company/open source project is advertising and selling

Except every legacy C++ codebase I've worked on is decades old. Just enumerating the different "features" is a fool's errand. Because of reshuffling and process changes, even marketing doesn't have a complete list of our "features". And even it there was a complete list of features, we have too many customers that rely on spacebar heating[0] to just remove code that we think doesn't map to a feature.

That's if we can even tease apart which bits of code map to a feature. It's not like we only added brand new code for each feature. We also relied on and modified existing code. The only code that's "safe" to remove is dead code, and sometimes that's not as dead as you might think.

Even if we had a list of features and even if code mapped cleanly to features, the idea of removing all code not related to "features your company is advertising or selling" is absurd. Sometimes a feature is so widely used that you don't advertise it anymore. It's just there. Should Microsoft remove boldface text from Word because they're not actively advertising it?

The only way this makes sense is if the author and I have wildly different ideas about what "legacy" means.

[0] https://xkcd.com/1172/



> > Get out the chainsaw and rip out everything that’s not absolutely required to provide the features your company/open source project is advertising and selling

> Except every legacy C++ codebase I've worked on is decades old. Just enumerating the different "features" is a fool's errand. Because of reshuffling and process changes, even marketing doesn't have a complete list of our "features".

Yeah, this struck me also, and your post should be modded up more. Anyone with significant experience in development knows what "Legacy" means.

Regardless of the language, after a specific point in a product's lifetime you cannot "know" all the features. Just not possible, no matter how well you think you documented it.

In an old product, every single line of code is there because of a reason that is not in the docs. Some examples that I've seen:

1. Using `int8_t` because at some point we integrated a precompiled library that was compiled with signed char, and we want warnings to pop up when we mix signs.

2. Wrote our own stripped-down SSL library because OpenSSL was not EMV certified at the time and did not come with the devkit. Now callers depend on a feature in our own library.

3. Client calls our DLL with Windows-specific UTF-16 strings. That's why that function has three variants that take 3 different types of strings.

4. This library can't be compiled with any gcc/glibc newer than X.Y, because the compiled library is loaded with `dlopen` in some environments.

5. We have our own 'safe' versions of string functions, because MSVC which takes the same parameter types for those functions assigns different meanings to the `size` parameter.

6. Converting fixed-precision floats to an int, performing the additions, and then converting back for the final division is faster and more accurate, but the test suite at the client expects the cumulative floating point errors, so the test will fail.

Not to mention uncountable "marketing said this is not offered as a feature, but client depends on it" things.



hard agree

removing a feature is possibly the most politically intractable thing you can try to do with a legacy codebase, almost never worth trying



This is generally the same path that LibreOffice followed. Works reasonably well.

We built our own find-dead-code tool, because the extant ones were imprecise, and boy oh boy did they find lots of dead stuff. And more dead stuff. And more dead stuff. Like peeling an onion, it went on for quite a while. But totally worth it in the end, made various improvements much easier.



Besides what everyone else told you make sure you are making at least 250k/y


Rewriting is questionable. Joel Spolsky has a famous blog post about this from two decades ago that's still relevant today [1].

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...



A good read. We recently did "Rewrite in a memory safe language?" successfully. It was something that shouldn't have been written in C++ in the first place (it was never performance sensitive).


Probably not a project spanning more than 3 decades of development and millions of lines of code?


Do you have a public write-up (blog post)? If yes, you should post it on HN. It would probably generate lots of interesting conversation.


Would you mind sharing what language you used?


If you have a codebase with lots and lots of tests, you are not in a bad place. Remember, legacy means a codebase that works, and that has solved and still solves problems over decades. In a sense, a successful software project implies it will be marked as legacy. Always prefer legacy over hype.


A software project being successful doesn’t make the experience of working on it any better. I’d prefer hype if that means I get to avoid suffering with a three decade old C++ codebase, even if it’s not as successful.


This is pretty great advice for any legacy code project. Even outside of C++ there is a huge number of code bases out there that do not compile/run on a dev machine without tons of work. I once worked on a Java project where, due to some weird dependencies, the dev mode was to run a JUnit test which started Spring and went into an infinite loop. Getting a standard run to work helped a ton.


The difference between greenfield and legacy code is just a few years. So learn to work with legacy code and how to make it better over time.


Is it worth getting more into C++ in 2024? Lots of interesting jobs in finance require it but it seems almost impossible to get hired without prior experience (with C++ and in finance).


Yes.

I switched from Python to C++ because Cython, Numba, etc. just weren't cutting it for my CPU-intensive research needs (program synthesis), and I've never looked back.



My question isn't whether it's a good fit for a specific project, I'm more interested in whether it's a good career choice e.g. can you get a job using C++ without C++ experience; how realistic is it to ramp up on it quickly; whether you're likely to end up with some gnarly legacy codebase as described in the OP; is it worth pursuing this direction at all.


Modern C++ is the language of choice for high-performance, high-scale data-intensive applications and will remain so for the foreseeable future. This is a class of application for which it is uniquely suited (C and Rust both have significant limitations in this domain). There are other domains like gaming that are also heavily into C++. Avoiding legacy C++ codebases is more about choosing where you work carefully.

It goes without saying that if you don't like the kinds of applications where C++ excels then it may not be a good career choice because it is not a general purpose language in practice.



> Modern C++ is the language of choice for high-performance, high-scale data-intensive applications

> C and Rust both have significant limitations in this domain

Rust? Please provide concrete examples. I don't believe it.



C++ is really a language that you want to specialize in and cultivate years of deep expertise with, rather than having it as one tool in your belt like you can with other languages.

That's certainly a choice you can make, and modern C++ is generally a pretty good experience to work with. I would hope that there's not a ton of active C++ projects which are still mostly using the pre-2011 standard, but who knows.



This exactly. It’s a blessing and a curse, because I’d love to move to a “better” language like Rust or even Zig. But with 20+ years of C++ experience I feel like I’d be throwing away too much to avoid C++ completely. Also agreed that modern C++ is pretty decent. Lamenting that I’m back in a codebase that started before C++11 vs my previous job that was greenfield C++14/17.


I would be very surprised if most people actually choose to develop in C++. It's a very good language choice for many domains, and I suspect interest and expertise in those domains drives people to C++ more than a desire to program in C++.


Did you see yesterday's article about the White House Office of the National Cyber Director (ONCD) advising developers to dump C, C++, and other languages with memory-safety issues?


I still think knowing C++ is pretty valuable to someone's career (at least over the next 10 - 15 years) if they're looking to work in fields that traditionally use C++ but might be transitioning away from it.

The obvious comparison is Rust. There are way more C++ jobs out there than Rust jobs. And even if I'm hiring for a team developing something in Rust, I'd generally prefer candidates with similar C++ experience and a basic understanding of Rust over candidates with a strong knowledge of Rust and no domain experience. Modern C++ and Rust aren't _that_ dissimilar, and a lot of ideas and techniques carry over from C++ to Rust.

Even if the DoD recommends that contractors stop using C++ and tech / finance are moving away from it, I'd say we're still years away from the point where Rust catches up to C++ in terms of job opportunities. If your main goal is employment in a certain industry, you'll probably have an easier time getting your foot in the door with C++ than Rust. Both paths are viable but the Rust path would be much harder IMO.



Yes, and at the same time I’m seeing ads for jobs that pay more than double what I make that require C++.


We’re still in for another 20 years of hardcore veteran Cxx programmers insisting that either the memory safety issue is overblown or just a theoretical issue if you are experienced enough/use a new enough edition of the language.


The C++ committee is looking hard at how to make C++ memory safe. If you use modern C++ you are already reasonably memory safe - the trick is how do we force developers to not access raw memory (no new/malloc, use vector not arrays...). There are some things that seem like they will come soon.

Of course if you really need that non-memory safe stuff - which all your existing code does - then you can't take advantage of it. However you can migrate your C++ to modern C++ and add those features to your code. This is probably easier than migrating to something like Rust (Rust cannot work with C++ unless you stick with the C subset from what I can tell) since you can work in small chunks at a time in at least some situations.
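
As a concrete taste of "use vector not arrays", a tiny sketch:

  #include <vector>

  int main() {
      std::vector<int> v(16);  // owning, sized, no manual new/delete
      v.at(20) = 1;            // .at() throws std::out_of_range instead of corrupting memory
  }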



IME, C++ was easier to ramp up on than TypeScript. C++ is still a lingua franca in many domains, e.g., robotics, games, finance.


Finance? No, most of it was rewritten in the 2000s in Java or .NET. Sure, a bunch of HNers will reply here that they work on high frequency market making systems that use C++, but they are an extreme minority in the industry at this point.


Depends on the industry you are interested in entering.

My myopic view of the world has seen the general trend from C to C++ for realtime embedded applications. For example: in the Automotive Industry all the interesting automotive features are written in C++.



Great, now they can all inherit from “HopeIDontCrashToday”


This is excellent advice, especially the list of what not to do. I don’t think it’s just C++, it’s working with any legacy code base. You gotta approach it on its own terms, and analyze and fully understand what’s happening before you start changing things.

I observed from afar when the Gwydion Dylan folks (the Dylan successor to the CMU CL compiler) inherited Harlequin’s Dylan compiler and IDE and decided to switch to that going forward: https://opendylan.org. The work (done out in the open in public mailing lists and IRC) is a very nicely done case study in taking a large existing code base developed by someone else, studying it, and refactoring it to bring it incrementally into the present. They started with retooling the build system and documenting the internals. Then over time they addressed major pain points, like creating an LLVM backend to avoid the need to maintain custom code generators.



I just went through this very same dance with an old project, smart, which evaluates string matching algorithms. Faster strstr(). From 2013. It was in better shape than zlib, but still.

Their shell build script was called makefile, kid you not. So first create proper dependency management: a GNUmakefile. A BSD makefile would have been prettier, but not many are used to that. Then dos2unix, chmod -x $(find . -name '*.c' -o -name '*.h'), and clang-format -i, all in separate commits.

Turns out there was a .h file that was not a real header, but some custom list of algorithm states, which clang-format broke. Don't do that. Either keep it a proper header file, or rename it to .lst or some such.

Fix all the warnings - hundreds of them. Check with sanitizers. Check the tests, disable broken algorithms, and mark them as such.
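For anyone who hasn't run the sanitizers on an old codebase: even a toy overflow like this (a hypothetical example, not code from smart) is reported immediately when built with clang++ -g -fsanitize=address:

    #include <cstdlib>
    #include <cstring>

    int main() {
        char* buf = static_cast<char*>(std::malloc(8));
        std::strcpy(buf, "123456789");  // writes 10 bytes into an 8-byte block;
                                        // ASan aborts with heap-buffer-overflow
        std::free(buf);
        return 0;
    }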

Improve the codebase. There are lots of hints of planned features; write them. Simplify the state handling. Improve the tests.

Add make check and make lint targets. Check all the linter warnings.

Add CI, starting with Linux, Windows (MinGW), macOS and aarch64. Turns out the code was Linux x64 only, ha. Make it compatible with SSE feature checks and work around the Windows quirks.

Waiting for GH Actions sucks, so write Dockerfiles and qemu drivers into your makefile. Maybe automake would have been a better idea after all. Or even proper autoconf.

Find the missing algorithms described elsewhere. Add them. Check their limitations.

Reproducible builds? Not for this one, sorry; that's a luxury. Rather run clang-tidy, and add fuzzing (see the harness sketch below the repo link).

https://github.com/rurban/smart
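As a rough sketch of what the "add fuzzing" step can look like for a string matcher, here is a libFuzzer harness; the search() stand-in is hypothetical (a real harness would call the project's entry point), built with clang++ -g -fsanitize=fuzzer,address:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Hypothetical stand-in for the matcher under test.
    static bool search(const char* n, size_t nl, const char* h, size_t hl) {
        if (nl == 0 || nl > hl) return false;
        for (size_t i = 0; i + nl <= hl; ++i)
            if (std::memcmp(h + i, n, nl) == 0) return true;
        return false;
    }

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
        if (size < 2) return 0;
        // Derive a needle/haystack split from the first input byte.
        size_t split = data[0] % (size - 1);
        const char* bytes = reinterpret_cast<const char*>(data + 1);
        search(bytes, split, bytes, size - 1);  // crashes surface via ASan
        return 0;
    }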



This article (and admittedly most comments here) doesn't emphasize the value of a comprehensive e2e test suite enough.

So much talk about change and large LoC deltas without first capturing the expected behavior of the system.
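One cheap way to start capturing that behavior is a golden-master style test: record today's output once, then assert every change preserves it. A minimal sketch (legacy_tool and the fixture paths are hypothetical):

    #include <cassert>
    #include <cstdlib>
    #include <fstream>
    #include <sstream>
    #include <string>

    static std::string slurp(const std::string& path) {
        std::ifstream in(path);
        std::ostringstream ss;
        ss << in.rdbuf();
        return ss.str();
    }

    int main() {
        // Run the legacy binary on a known input...
        int rc = std::system("./legacy_tool < fixtures/input.txt > /tmp/actual.txt");
        assert(rc == 0);
        // ...and compare against the recorded expected output.
        assert(slurp("/tmp/actual.txt") == slurp("fixtures/expected.txt"));
        return 0;
    }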



Despite being framed as something for legacy C/C++ codebases, this is pretty good advice for setting up testing and CI automation around any project.

I recently started on a new Rust project, and despite not having to worry about things like sanitizers as much, I followed a similar approach: getting it to compile locally, getting it to compile in a docker container, and setting up automated CI/CD against all PRs.

Although I would order the steps as 1, 3, 4, 2. Don't get out the chainsaw until you have CI/CD tests evaluating your code changes as you go.



I think the very best thing one can do is reduce the amount of variation you have to support. The burden of change is thus vastly reduced and the number of possible avenues for improvement explodes.

We could have left customers with old operating systems on the older versions of the product. A lot of them never upgraded anyhow. We absolutely destroyed our productivity by not making this kind of decision. We also really hurt ourselves by supporting Windows - as soon as there are 2 or more completely different compilers things turn to **t. I'm not even sure we made much money from it.

Given the ability to use new tools (clang, gcc and others) that are only available on newer operating systems we could have done amazing things. All those address sanitizers etc would have been wonderful and I would like to have done some automated refactoring which I know clang has tools for.

Most of the problems were just with understanding the minds of the developers - they were doing something difficult and at a level of complexity that somewhat overmatched the problem most of the time but the complexity was there to handle the edge cases. I wanted to go around adding comments to the files and classes as I understood bits of it. I was working with one of the original developers who was of course not at all interested in anyone understanding it or making it clearer and this kind of effort tended to get shot down.

If you don't have good tests you're dead in the water. I have twice inherited python projects without tests at all and those were a complete nightmare until I added some. One was a long-running build process in which unit tests were only partially helpful. Until I came up with a fake android source tree that could build in under a minute I was extremely handicapped. Once I had that everything started to get much better.

My favorite game ... is an open source C++ thing called warzone2100 - no tests. It's not easy to make changes with confidence. I imagine to myself that one day my contribution will be to add some. The problem is that I cannot imagine the current developers taking all that kindly to it. Some people get to competence in a codebase and leave it at that.



I think the approach suggested in 'Working Effectively with Legacy Code' (https://www.oreilly.com/library/view/working-effectively-wit...) is the right one. It's all about testing, and the confidence to make changes.

So buy the book, read it, apply the ideas.
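For reference, the book's central trick is the "seam": introduce an interface at a hard dependency so the legacy class becomes testable without its real collaborator. A minimal sketch, with illustrative names:

    #include <string>

    struct MailSender {                       // the seam
        virtual void send(const std::string& msg) = 0;
        virtual ~MailSender() = default;
    };

    struct FakeMailSender : MailSender {      // test double
        std::string last;
        void send(const std::string& msg) override { last = msg; }
    };

    class BillingJob {
        MailSender& mail_;                    // was a hard-coded SMTP call
    public:
        explicit BillingJob(MailSender& mail) : mail_(mail) {}
        void run() { mail_.send("invoice sent"); }
    };

    int main() {
        FakeMailSender fake;
        BillingJob job(fake);
        job.run();                            // exercised without a mail server
        return fake.last == "invoice sent" ? 0 : 1;
    }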





I like to use cppdepend to navigate a large and unfamiliar codebase: https://www.cppdepend.com. The interactive dependency graph, the editor integration for jumping back and forth between diagrams and actual code, and the many logical constructs in the source view certainly accelerate getting a quick sense of the layering and a bit of the architecture of the project.


Read it. A little every day until you've passed your eyes over all of it.

Make notes about mysterious things. Check them off once you've figured them out.

Try to find the 'business case' database. You will fail; nobody has one. So make one, while you're reading the code.


