"dut" displays a tree of the biggest things under your current directory, and it also shows the size of hard-links under each directory as well. The hard-link tallying was inspired by ncdu[3], but I don't like how unintuitive the readout is. Anyone have ideas for a better format?
There's installation instructions in the README. dut is a single source file, so you only need to download it and copy-paste the compiler command, and then copy somewhere on your path like /usr/local/bin.
I went through a few different approaches writing it, and you can see most of them in the git history. At the core of the program is a datastructure that holds the directories that still need to be traversed, and binary heaps to hold statted files and directories. I had started off using C++ std::queues with mutexes, but the performance was awful, so I took it as a learning opportunity and wrote all the datastructures from scratch. That was the hardest part of the program to get right.
These are the other techniques I used to improve performance:
* Using fstatat(2) with the parent directory's fd instead of lstat(2) with an absolute path. (10-15% performance increase)
* Using statx(2) instead of fstatat. (perf showed fstatat running statx code in the kernel). (10% performance increase)
* Using getdents(2) to get directory contents instead of opendir/readdir/closedir. (also around 10%)
* Limiting inter-thread communication. I originally had fs-traversal results accumulated in a shared binary heap, but giving each thread a binary-heap and then merging them all at the end was faster.
I couldn't find any information online about fstatat and statx being significantly faster than plain old stat, so maybe this info will help someone in the future.
[1]: https://github.com/KSXGitHub/parallel-disk-usage
[2]: https://github.com/bootandy/dust
[3]: https://dev.yorhel.nl/doc/ncdu2, see "Shared Links"