Ask HN: Why some languages use 1 byte for boolean type

tlb · 2025-03-20T11:43:29 1742471009

C and C++ also use 8 bits in most cases. One reason is to support pointers, ie `bool *`.

You can get single-bit bools in a C++ struct, with eg

  struct foo {
    bool a:1;
    bool b:1;
    ...
  };

but you can't take a pointer to such a member.
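
A minimal sketch of that restriction. The types `Packed` and `Plain` and both helper functions are hypothetical, made up for illustration; the size comparison reflects what common compilers do, since bit-field layout is implementation-defined.

```cpp
// Hypothetical example types, not taken from any real codebase.
struct Packed {
    bool a : 1;  // each member occupies a single bit
    bool b : 1;
    bool c : 1;
};

struct Plain {
    bool a;  // each member is a separately addressable object
    bool b;
    bool c;
};

// An ordinary bool member has an address, so `bool*` works.
constexpr bool write_through_pointer() {
    Plain p{false, false, false};
    bool* pb = &p.b;
    *pb = true;
    return p.b;
}

// A bit-field member can only be reached by name.
constexpr bool write_by_name() {
    Packed q{false, false, false};
    // bool* pb = &q.b;  // ill-formed: cannot take the address of a bit-field
    q.b = true;
    return q.b && !q.a && !q.c;
}

// Layout is implementation-defined, so only a weak bound is checked here.
static_assert(sizeof(Packed) <= sizeof(Plain),
              "bit-fields are expected not to take more room");
```

On mainstream desktop compilers `sizeof(Packed)` is typically 1 and `sizeof(Plain)` is 3, but neither value is promised by the language.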

PaulHoule · 2025-03-20T12:08:52 1742472532

The old PDP-10 had 36-bit words but you could write pointers to any range of bits inside a word. On that machine you could write a pointer to a single bit.

I was thinking about a fantasy computer that could run under JavaScript and have 48 bit words (fit in a double) and pointers like the PDP-10 but that could hang off one word into the next word. [1] First I was thinking “screw C” but came to the conclusion that supporting C would not be so hard after all with that kind of pointer.

[1] Think of a Chinese home computer from an alternate timeline where a strange mainframe had a baby with a NeoGeo arcade machine.

unwind · 2025-03-20T12:19:24 1742473164

Just for completeness' sake: bit fields are a feature in plain old C too, and the syntax is the same [1].

Their most "obnoxious" feature is that the layout is implementation defined, so they are not very portable between compilers and/or architectures.

Often used in an embedded setting to model hardware registers, when you can know/control what compiler implementation is used.

[1]: https://en.wikipedia.org/wiki/Bit_field#C_programming_langua...

Edit: more with the words.

mmaniac · 2025-03-20T12:33:34 1742474014

Isn't struct layout in C implementation defined in general?

C itself doesn't specify any ABI. A given platform simply uses one as a matter of convention.

rcxdude · 2025-03-20T12:21:05 1742473265

See also why the `std::vector<bool>` specialisation (which uses a packed internal representation) is widely considered a mistake.
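
A short sketch of one commonly cited problem with `std::vector<bool>`: because a bit has no address, `operator[]` returns a proxy object rather than `bool&`, which breaks generic code. The helper `first_element` is hypothetical.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// For an ordinary element type, operator[] yields a real reference.
static_assert(
    std::is_same<decltype(std::declval<std::vector<char>&>()[0]), char&>::value,
    "vector<char>::operator[] returns char&");

// The packed specialisation hands back std::vector<bool>::reference instead.
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
    "vector<bool>::operator[] does not return bool&");
static_assert(
    std::is_same<decltype(std::declval<std::vector<bool>&>()[0]),
                 std::vector<bool>::reference>::value,
    "vector<bool>::operator[] returns the proxy type");

// Hypothetical generic helper: fine for most T, but instantiating it with
// T = bool fails to compile, because the proxy temporary cannot bind to T&.
template <typename T>
T& first_element(std::vector<T>& v) {
    return v[0];
}
```

The same proxy is why `auto x = v[0];` on a `std::vector<bool>` gives a reference-like object rather than a copy of the value.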

Veliladon · 2025-03-20T12:19:40 1742473180

I think the better question is why use a single bit or byte for a single bool when sum types exist?

You're pulling in an entire cache line (64 bytes) with any load. Why would you not just turn the bool into a sum type that carries its payload with it? That way you can actually use the rest of the cache line instead of loading 64 bytes to work with a single bit, throwing the other 511 bits in the trash, and then doing another load on top for the data.
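
A rough sketch of that idea, using `std::optional` (C++17) as a stand-in for a sum type. The comment names no language or type, so this choice and the functions `parse_digit` and `digit_or` are assumptions made for illustration.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result type: the "is there a value?" flag and the value are
// stored together in one object, so checking the flag and reading the
// payload touch the same small region of memory.
constexpr std::optional<std::uint32_t> parse_digit(char c) {
    if (c >= '0' && c <= '9') {
        return static_cast<std::uint32_t>(c - '0');
    }
    return std::nullopt;  // the "false" case carries no payload
}

// Consume flag and payload in one step.
constexpr std::uint32_t digit_or(char c, std::uint32_t fallback) {
    return parse_digit(c).value_or(fallback);
}
```

Whether this beats a separate bool depends on the access pattern; the sketch only shows the flag and its data being kept together.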

It's even worse when you do multithreading with packed bools, because threads can keep trashing each other's cache lines, forcing them to wait for the load from L3 or, worse, DRAM.
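
A sketch of the usual workaround for that effect (often called false sharing): pad each flag out to its own cache line, trading space for speed. The 64-byte line size is the figure from the comment above and is an assumption about the target; all type names are hypothetical.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines. Real hardware varies.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread flag, aligned so that each one owns a full line.
struct alignas(kCacheLine) PaddedFlag {
    bool value;
};

// Dense layout: eight flags in adjacent bytes, so writers on different
// threads are likely to contend for the same line.
struct DenseFlags {
    bool value[8];
};

// Padded layout: eight flags on eight separate lines.
struct PaddedFlags {
    PaddedFlag value[8];
};

static_assert(sizeof(PaddedFlag) == kCacheLine, "one flag per cache line");
static_assert(alignof(PaddedFlag) == kCacheLine, "flag starts on a line boundary");
```

The padded layout uses 512 bytes for eight booleans, which is the opposite end of the trade-off from one bit per bool.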

fatuna · 2025-03-20T12:15:15 1742472915

Because engineering is all about making choices. Making a boolean 1 bit would be space efficient. However, memory is being read at least 1 byte at a time. If you want 1 bit of that byte, that's an extra instruction.

So storing a boolean in one byte is more speed efficient!

(In C you can store a boolean in one bit if, for example, you need to store a great number of booleans and memory size is more important than speed.)

PaulHoule · 2025-03-20T11:58:31 1742471911

It would be slower to get or set one bit in a byte than to treat the whole byte as a bit. Pipelining could hide the read overhead but memory operations are slow and setting one bit would mean reading the byte, doing an AND or OR and then writing the byte back.

ahoka · 2025-03-20T12:08:13 1742472493

The smallest load/store the original Alpha ISA supported was 32 bits. :-)

jnwatson · 2025-03-20T12:14:30 1742472870

Likewise with the Texas Instruments C40 DSP. sizeof(char)==sizeof(int) in that case, both representing 32-bit values.
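
A small sketch of why that works out in C and C++: `sizeof` counts in units of `char`, and `CHAR_BIT` says how many bits that unit has. The helper `bits_in` is hypothetical.

```cpp
#include <climits>
#include <cstddef>

// sizeof(char) is 1 on every platform, by definition.
static_assert(sizeof(char) == 1, "guaranteed by the language");

// CHAR_BIT is at least 8. On a word-addressed target like the one described
// above it would be 32, which is how sizeof(char) == sizeof(int) can hold.
static_assert(CHAR_BIT >= 8, "guaranteed by the language");

// Hypothetical helper: storage size of T in bits.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// A standalone bool is an object, so it occupies at least one whole char.
static_assert(bits_in<bool>() >= CHAR_BIT, "bool cannot be smaller than char");
```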

cmrdporcupine · 2025-03-20T12:12:34 1742472754

As others have said: The way CPUs fetch and access memory is in larger word sizes. Even a single 8-bit byte can be inefficient depending on how that byte aligns in the larger structure and which ISA you're talking about.

Memory is cheap, so. shrug

In embedded this can be a very different story though. There we are often working with tiny memories and lower clock speeds and the concern is packing everything in tighter.

mytailorisrich · 2025-03-20T12:06:35 1742472395

As hinted by another commenter this is because a byte is usually the smallest unit addressable in memory so using a whole byte even if the type does not require it simplifies everything.

muth02446 · 2025-03-20T12:13:36 1742472816

There are also atomicity issues.
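
One reading of that remark, since the comment does not elaborate: when several flags share a byte, a plain `flags |= mask` is a separate load, OR and store, so two threads updating different bits can lose each other's writes. A minimal sketch with hypothetical helper names, using atomic read-modify-write operations to avoid that:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for eight flags packed into one atomic byte.
// fetch_or / fetch_and perform the whole read-modify-write as one atomic
// step, so concurrent updates to different bits are not lost.
inline void set_flag(std::atomic<std::uint8_t>& flags, unsigned index) {
    assert(index < 8);
    flags.fetch_or(static_cast<std::uint8_t>(1u << index));
}

inline void clear_flag(std::atomic<std::uint8_t>& flags, unsigned index) {
    assert(index < 8);
    flags.fetch_and(static_cast<std::uint8_t>(~(1u << index)));
}

inline bool test_flag(const std::atomic<std::uint8_t>& flags, unsigned index) {
    assert(index < 8);
    return ((flags.load() >> index) & 1u) != 0;
}
```

With one byte per bool, each flag is its own memory location and plain stores from different threads to different flags do not interfere with each other.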

mschuster91 · 2025-03-20T12:16:58 1742473018

The minimum access in modern computers is 1 byte of 8 bits, which means that to losslessly change the value of a 1-bit bool you need three instructions:

1. load byte at $address into $register

2. use whatever native instructions there are to change just that single bit in $register - in the worst case you need multiple of them

3. write byte from $register into $address

In contrast, setting a byte-sized bool needs no load at all: it's either two instructions (set register to 0 / 1, write register to RAM) or, on platforms with a "store immediate" (= value hardcoded in the machine code) instruction, even one (write value directly to RAM).
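
The two cases can be sketched as follows. All function names are hypothetical, and the instruction counts in the comments describe the logical steps, not what a particular compiler or CPU will actually emit.

```cpp
#include <cstdint>

// Compute the new byte value with bit `index` (0..7) set or cleared.
constexpr std::uint8_t with_bit(std::uint8_t byte, unsigned index, bool value) {
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << index);
    return value ? static_cast<std::uint8_t>(byte | mask)
                 : static_cast<std::uint8_t>(byte & ~mask);
}

// Read a single bit back out of a packed byte.
constexpr bool get_bit(std::uint8_t byte, unsigned index) {
    return ((byte >> index) & 1u) != 0;
}

// Packed bool: the other seven bits must survive, so the old byte is needed.
inline void set_packed(std::uint8_t* address, unsigned index, bool value) {
    std::uint8_t reg = *address;        // 1. load byte
    reg = with_bit(reg, index, value);  // 2. change just that bit
    *address = reg;                     // 3. write byte back
}

// Byte-sized bool: nothing else lives in the byte, so a plain store is enough.
inline void set_unpacked(bool* address, bool value) {
    *address = value;
}
```

Some instruction sets have dedicated bit-set or bit-test instructions that shorten the packed path, so the gap between the two is platform-dependent.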

Bit-packing structures (another poster showed an example here in the thread) used to be done pretty much everywhere and you'll notice it if you deal with reverse engineering code even from the Windows 98 era... but in anything more modern it's not needed because computers nowadays have more than 640 KB of RAM [1].

By the way, accessing / modifying small values in large(r) containers is a common problem in computing in general, so if it interests you, you might want to look into "SSD write amplification", "SMR HDD write amplification" or "flash wear leveling".

[1] https://lunduke.locals.com/post/5488507/myth-bill-gates-said...
