(comments)

Original link: https://news.ycombinator.com/item?id=43421815

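The Hacker News discussion explores why some languages use a full byte (8 bits) for the boolean type rather than a single bit. Using only one bit would save space, but it is usually slower because of how CPUs access memory. Memory is read in larger chunks (at least one byte), so accessing a single bit requires extra instructions to isolate and modify it, which involves a read-modify-write operation. Bit fields in structs do allow booleans to be packed into single bits, but their layout is not portable and you cannot take a pointer to them. In addition, loading even a single bit pulls in an entire cache line, so a sum type that carries its payload alongside the flag may make better use of memory. Ultimately, the choice is a trade-off between memory usage, speed, and the target platform; on modern systems with plenty of memory, the simpler and faster byte-based representation is usually preferred. Atomicity is also a consideration.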


  • Original
    Ask HN: Why some languages use 1 byte for boolean type
    5 points by Genius_um 55 minutes ago | 14 comments
    Some programming languages, like D, use 8 bits for their boolean type. Why don't they use 1 bit?










    C and C++ also use 8 bits in most cases. One reason is to support pointers, i.e. `bool *`.

    You can get single-bit bools in a C++ struct with, e.g.,

      struct foo {
        bool a:1;   // a single bit
        bool b:1;   // packed next to a, usually in the same byte
        // ... more one-bit members
      };
    
    but you can't take a pointer to such a member.
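A small C sketch contrasting the two layouts (C bit fields use the same syntax, with `bool` from <stdbool.h>). It assumes a mainstream compiler such as GCC or Clang, where `bool` is one byte; bit-field layout is implementation-defined, so the packed struct's exact size may differ elsewhere. The struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Eight ordinary bools: each member is a separately addressable byte. */
struct flags_bytes {
    bool a, b, c, d, e, f, g, h;
};

/* Eight one-bit bit fields: usually packed into a single byte, although
   the exact layout is implementation-defined. */
struct flags_bits {
    bool a : 1, b : 1, c : 1, d : 1, e : 1, f : 1, g : 1, h : 1;
};

/* Taking the address of an ordinary bool member is fine... */
bool *first_flag(struct flags_bytes *f) {
    return &f->a;
}

/* ...whereas &bits->a on a bit-field member is rejected at compile time,
   because a single bit has no address of its own. */

size_t bytes_layout_size(void) { return sizeof(struct flags_bytes); }
size_t bits_layout_size(void)  { return sizeof(struct flags_bits); }
```

On such a compiler the first struct takes 8 bytes and the second typically fits in 1, which is exactly the space-versus-addressability trade-off described above.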


    The old PDP-10 had 36-bit words, but you could write pointers to any range of bits inside a word. On that machine you could write a pointer to a single bit.

    I was thinking about a fantasy computer that could run under JavaScript, with 48-bit words (which fit in a double) and pointers like the PDP-10's, but able to hang off one word into the next word. [1] At first I was thinking “screw C”, but came to the conclusion that supporting C would not be so hard after all with that kind of pointer.

    [1] Think of a Chinese home computer from an alternate timeline where a strange mainframe had a baby with a NeoGeo arcade machine.



    Just for completeness' sake: bit fields are a feature in plain old C too, and the syntax is the same [1].

    Their most "obnoxious" feature is that the layout is implementation-defined, so they are not very portable between compilers and/or architectures.

    They are often used in embedded settings to model hardware registers, where you know or control which compiler implementation is used.

    [1]: https://en.wikipedia.org/wiki/Bit_field#C_programming_langua...

    Edit: more with the words.
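Where a predictable layout matters (as with hardware registers), a common portable alternative to bit fields is explicit masks and shifts, so bit positions are fixed by the code rather than by the compiler. This C sketch uses an entirely hypothetical 32-bit control register with an enable flag in bit 0 and a 4-bit mode field in bits 4-7; all names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout, fixed by these constants rather than by
   the compiler's bit-field rules: bit 0 = enable, bits 4..7 = mode. */
#define CTRL_ENABLE     (UINT32_C(1) << 0)
#define CTRL_MODE_SHIFT 4
#define CTRL_MODE_MASK  (UINT32_C(0xF) << CTRL_MODE_SHIFT)

static inline bool ctrl_enabled(uint32_t reg) {
    return (reg & CTRL_ENABLE) != 0;
}

static inline uint32_t ctrl_set_enabled(uint32_t reg, bool on) {
    return on ? (reg | CTRL_ENABLE) : (reg & ~CTRL_ENABLE);
}

static inline unsigned ctrl_mode(uint32_t reg) {
    return (unsigned)((reg & CTRL_MODE_MASK) >> CTRL_MODE_SHIFT);
}

/* Clear the old mode bits, then insert the new value (truncated to 4 bits). */
static inline uint32_t ctrl_set_mode(uint32_t reg, unsigned mode) {
    return (reg & ~CTRL_MODE_MASK) |
           (((uint32_t)mode << CTRL_MODE_SHIFT) & CTRL_MODE_MASK);
}
```

On real hardware the value would typically be read and written through a `volatile uint32_t *` (for example `*reg = ctrl_set_mode(*reg, 5);`, with `reg` a hypothetical register pointer), which is itself a read-modify-write of the whole register.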



    Isn't struct layout in C implementation-defined in general?

    C itself doesn't specify any ABI. A given platform simply uses one as a matter of convention.



    See also why the vector<bool> specialisation (which uses a packed internal representation) is widely considered a mistake.
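To see what a packed representation involves, here is a C sketch in the spirit of `vector<bool>` (C has no `vector`, so this is just a byte array with one flag per bit; the names are hypothetical). Assuming 8-bit bytes and a 1-byte `bool`, 1000 flags take 125 bytes instead of 1000, but every read needs an index split, a shift and a mask, and there is no addressable element to hand out as a `bool *`. The latter is why `vector<bool>` has to return a proxy object instead of a real reference.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes needed to pack n flags at one bit each (rounding up). */
#define BITSET_BYTES(n) (((n) + CHAR_BIT - 1) / CHAR_BIT)

/* Read flag i from a packed array: locate its byte, shift the bit
   down to position 0, and mask off the neighbouring flags. */
bool bitset_get(const unsigned char *set, size_t i) {
    return (set[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}

/* The unpacked equivalent is a plain array index, and &flags[i] is a
   valid bool * that can be handed to other code. */
bool bytes_get(const bool *flags, size_t i) {
    return flags[i];
}
```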


    I think the better question is why use a single bit or byte for a single bool when sum types exist?

    You're pulling in an entire cache line (64 bytes) with any load. Why would you not just turn the bool into a sum type that carries the payload with it? That way you can actually use the rest of the cache line instead of loading 64 bytes to work with a single bit, throwing the other 511 bits in the trash, and then doing another load on top for the data.

    It's even worse when you do multithreading with packed bools, because threads can keep thrashing each other's cache lines, forcing them to wait for the load from L3, or worse, DRAM.
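Two C sketches for this comment's points, neither taken from the thread. First, C has no built-in sum types, but a tagged union is the usual stand-in: the tag plays the role of the bool and sits right next to its payload. Second, a common mitigation for the multithreading problem (false sharing) is padding each thread's flag out to its own cache line; this assumes a 64-byte line, which is typical on x86-64 but not universal. All type names (`value`, `packed_flags`, `padded_flag`, and so on) are hypothetical.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* --- Sum type: keep the "bool" next to the data it guards. ---
   The tag replaces a separate flag, so checking it and reading the
   payload touch the same small struct. */
typedef enum { VAL_NONE, VAL_INT, VAL_FLOAT } val_kind;

typedef struct {
    val_kind kind;  /* which alternative is active */
    union {
        int64_t i;  /* valid when kind == VAL_INT */
        double f;   /* valid when kind == VAL_FLOAT */
    } as;
} value;

value value_none(void)      { value v = { .kind = VAL_NONE };             return v; }
value value_int(int64_t i)  { value v = { .kind = VAL_INT,   .as.i = i }; return v; }
value value_float(double f) { value v = { .kind = VAL_FLOAT, .as.f = f }; return v; }

double value_or(value v, double fallback) {
    switch (v.kind) {
    case VAL_INT:   return (double)v.as.i;
    case VAL_FLOAT: return v.as.f;
    default:        return fallback;
    }
}

/* --- False sharing: give each thread's flag its own cache line. --- */
#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define NUM_WORKERS 4

/* Adjacent flags usually share one cache line, so a write by one
   thread invalidates that line in the other cores' caches. */
typedef struct {
    bool done[NUM_WORKERS];
} packed_flags;

/* Each flag is aligned to, and therefore padded out to, a full line. */
typedef struct {
    alignas(CACHE_LINE) bool done;
} padded_flag;

typedef struct {
    padded_flag worker[NUM_WORKERS];
} padded_flags;
```

The padding trades memory (64 bytes per flag here) for the absence of cross-core invalidations, the opposite of the bit-packing trade-off.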



    Because engineering is all about making choices. Making a boolean 1 bit would be space-efficient. However, memory is read at least 1 byte at a time. If you want 1 bit of that byte, that's an extra instruction.

    So storing a boolean in one byte is more speed-efficient!

    (In C you can store a boolean in one bit, for example if you need to store a great number of booleans and memory size is more important than speed.)



    It would be slower to get or set one bit in a byte than to treat the whole byte as a bit. Pipelining could hide the read overhead, but memory operations are slow, and setting one bit would mean reading the byte, doing an AND or OR, and then writing the byte back.


    The smallest load/store the original Alpha ISA supported was 32 bits. :-)


    Likewise with the Texas Instruments C40 DSP. sizeof(char)==sizeof(int) in that case, both representing 32-bit values.


    As others have said, CPUs fetch and access memory in larger word sizes. Even a single 8-bit byte can be inefficient, depending on how that byte is aligned within the larger structure and which ISA you're talking about.

    Memory is cheap, so... shrug.

    In embedded systems this can be a very different story, though. There we often work with tiny memories and lower clock speeds, and the concern is packing everything more tightly.



    As hinted by another commenter, this is because a byte is usually the smallest addressable unit of memory, so using a whole byte even when the type does not require it simplifies everything.


    There are also atomicity issues.
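A minimal C11 sketch of the atomicity point, assuming flags are packed one per bit into a shared byte: setting a bit is a read-modify-write, so two threads setting different bits of the same byte can lose one of the updates unless the whole operation is atomic. `atomic_fetch_or` and `atomic_fetch_and` from <stdatomic.h> perform it as one indivisible step; the `flag_byte` type and helper names are hypothetical. By contrast, separate `bool` objects are distinct memory locations in the C11 memory model, so plain stores to different ones do not interfere.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared byte holding eight one-bit flags. */
typedef atomic_uchar flag_byte;

/* Set one bit as a single indivisible read-modify-write, so a concurrent
   update to a different bit of the same byte cannot be lost. */
void flag_set(flag_byte *byte, unsigned bit) {
    atomic_fetch_or(byte, (unsigned char)(1u << bit));
}

/* Clear one bit atomically by ANDing with the inverted mask. */
void flag_clear(flag_byte *byte, unsigned bit) {
    atomic_fetch_and(byte, (unsigned char)~(1u << bit));
}

bool flag_get(flag_byte *byte, unsigned bit) {
    return (atomic_load(byte) >> bit) & 1u;
}
```

Atomic read-modify-write operations are typically more expensive than a plain byte store, which is another cost of packing shared flags into bits.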


    The minimum access in modern computers is 1 byte of 8 bits, which means that to change the value of a 1-bit bool without clobbering the other bits in its byte, you need three instructions:

    1. load byte at $address into $register

    2. use whatever native instructions there are to change just that single bit in $register - in the worst case you need several of them

    3. write byte from $register into $address

    In contrast, with one byte per bool it's either two instructions (set a register to 0 / 1, write the register to RAM) or even one, on platforms that have a "store immediate" instruction (the value is hardcoded in the instruction encoding) that writes the value directly to RAM.

    Bit-packing structures (another poster showed an example here in the thread) used to be done pretty much everywhere, and you'll notice it if you reverse engineer code even from the Windows 98 era... but in anything more modern it's not needed, because computers nowadays have more than 640 KB of RAM [1].

    By the way, accessing / modifying small values in large(r) containers is a common problem in computing in general, so if it interests you, you might want to look into "SSD write amplification", "SMR HDD write amplification" or "flash wear leveling".

    [1] https://lunduke.locals.com/post/5488507/myth-bill-gates-said...
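The three-step sequence described in this comment can be written out in C. This is a sketch with hypothetical function names, assuming one flag per bit in the packed case. An optimizing compiler may fold the steps together (on x86, for instance, into a single OR with a memory operand), but the CPU still has to read the byte before it can write it back.

```c
#include <stdbool.h>

/* One bool per byte: the update is a single store. */
void set_byte_bool(bool *flag, bool value) {
    *flag = value;
}

/* One bool per bit: the same update needs a load, a modify and a store,
   and the modify step must leave the other bits of the byte untouched. */
void set_bit_bool(unsigned char *byte, unsigned bit, bool value) {
    unsigned char mask = (unsigned char)(1u << bit);
    unsigned char tmp = *byte;                      /* 1. load the byte */
    tmp = value ? (unsigned char)(tmp | mask)       /* 2. set the bit... */
                : (unsigned char)(tmp & ~mask);     /*    ...or clear it */
    *byte = tmp;                                    /* 3. write the byte back */
}
```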






