Decimal floating point
DEC64: Decimal Floating Point (2020)

Original link: https://www.crockford.com/dec64.html

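## DEC64: A New Number Type

DEC64 is a 64-bit number type intended as a general-purpose solution for financial and scientific applications, potentially replacing separate integer and floating point types. It precisely represents decimal fractions with up to 16 decimal places and handles values from 1.0E-127 to 3.6028797018963967E+143.

Internally, DEC64 uses a 56-bit coefficient and an 8-bit exponent. It provides fast performance on integer values and simplifies conversion to and from decimal strings, avoiding the inaccuracies inherent in binary floating point. Notably, it has 255 representations of zero and a "not a number" (nan) value.

It can be implemented efficiently in both hardware and software, with a fast path for addition when the exponents are equal. A reference implementation is available on GitHub. The design draws on historical systems such as the EDSAC and the Burroughs 5000, aiming for simplicity and accuracy by using a decimal approach rather than the traditional binary floating point model.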

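## Hacker News Discussion: DEC64 Floating Point

A recent Hacker News discussion revisited Douglas Crockford's DEC64 decimal floating point format (2020), prompting debate about floating point standards and their complexity. Users noted that earlier discussions of the topic date back to 2014.

The core question was whether a simplified, single number type like DEC64 could be beneficial, especially in languages where type ambiguity leads to errors. Although DEC64 aims for 16 decimal digits of precision, commenters were concerned about its lack of normalization, which makes comparison difficult, and about potential performance problems.

Many commenters defended the IEEE 754 standard, acknowledging its complexity but emphasizing its wide adoption and successful trade-offs. Some pointed out that IEEE 754 is often not implemented fully correctly, and that financial calculations often *do* use floating point together with error mitigation strategies. Others stressed the importance of choosing the right number type for the task: for some applications, fixed point or arbitrary precision may be preferable.

Ultimately, the discussion highlighted the inherent challenges of representing real numbers in finite systems and the diverse needs of different computing domains.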

Original article
DEC64

Overview

DEC64 is a number type. It can precisely represent decimal fractions with 16 decimal places, which makes it very well suited to all applications that are concerned with money. It can represent values as gargantuan as 3.6028797018963967E+143 or as measly as 1.0E-127, which makes it well suited to most scientific applications. It can provide very fast performance on integer values, eliminating the performance justification for a separate int type and avoiding the terrible errors that can result from int truncation.

DEC64 is intended to be the only number type in the next generation of application programming languages.

DEC64 represents numbers as 64 bit values composed of 2 two’s complement components: a 56 bit coefficient and an 8 bit exponent. The coefficient is in the high order end, and the exponent is in the low order end.

bits 63..8: coefficient (56 bits)
bits 7..0: exponent (8 bits)

The coefficient is an integer in the range -36_028_797_018_963_968 thru 36_028_797_018_963_967. The exponent is in the range -127 thru 127. Numbers may not use an exponent of -128. The value of a number is obtained from this formula:

value = coefficient * 10^exponent
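
The layout and formula above can be modelled in a few lines of Python. This is only an illustrative sketch, not the reference implementation: the names `dec64_pack`, `dec64_unpack`, and `dec64_value` are hypothetical, the 64-bit pattern is held as an unsigned Python integer, and `Fraction` is used so that the computed value stays exact.

```python
from fractions import Fraction

MASK64 = (1 << 64) - 1  # keep results within 64 bits


def dec64_pack(coefficient, exponent):
    """Pack a coefficient and exponent into a 64-bit pattern (hypothetical helper)."""
    if not -(1 << 55) <= coefficient < (1 << 55):
        raise ValueError("coefficient does not fit in 56 bits")
    if not -127 <= exponent <= 127:
        raise ValueError("exponent must be in the range -127..127")
    # Coefficient in the high 56 bits, two's complement exponent in the low 8.
    return ((coefficient << 8) | (exponent & 0xFF)) & MASK64


def dec64_unpack(bits):
    """Split a 64-bit pattern into a signed (coefficient, exponent) pair."""
    exponent = bits & 0xFF
    if exponent >= 0x80:            # sign-extend the 8-bit exponent
        exponent -= 0x100
    coefficient = bits >> 8
    if coefficient >= 1 << 55:      # sign-extend the 56-bit coefficient
        coefficient -= 1 << 56
    return coefficient, exponent


def dec64_value(bits):
    """Exact value of a pattern: coefficient * 10^exponent."""
    coefficient, exponent = dec64_unpack(bits)
    return coefficient * Fraction(10) ** exponent
```

For example, `dec64_value(dec64_pack(125, -2))` evaluates to `Fraction(5, 4)`, that is, 1.25 with no rounding error.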

Normalization is not required, and is usually not desired. Integers can have an exponent of 0 as long as the coefficient is less than 36 quadrillion. Addition of numbers with equal exponents could be performed in a single machine cycle.

There are 255 possible representations of zero. They are all considered to be equal.

There is a special value called nan that has a coefficient of 0 and an exponent of -128. The result of division by zero is nan. nan is also the result of operations that produce results that are too large to be represented. nan is equal to itself.

When an arithmetic operation has an input with an exponent of -128, the result will be nan. Applications are free to use the coefficient as they wish when the exponent is -128, since in that case the coefficient has no arithmetic significance. One possible use is to store object pointers in the coefficient.
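
As a rough illustration of the rules for zero and nan, the two checks below look only at the bit fields described above. The function names are hypothetical and are not taken from the reference implementation, and the pattern is assumed to be an unsigned 64-bit Python integer.

```python
def dec64_is_nan(bits):
    """True when the exponent byte is -128 (0x80), whatever the coefficient holds."""
    return (bits & 0xFF) == 0x80


def dec64_is_zero(bits):
    """True when the coefficient is 0 and the exponent is a legal one (-127..127)."""
    return (bits >> 8) == 0 and not dec64_is_nan(bits)


# With a zero coefficient there are 256 possible exponent bytes;
# one of them (0x80) is nan, which leaves 255 representations of zero.
zero_count = sum(dec64_is_zero(exponent_byte) for exponent_byte in range(256))
```

Under these assumptions `zero_count` is 255, and a pattern such as `(12345 << 8) | 0x80` still counts as nan because only the exponent byte is examined.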

DEC64 can be implemented efficiently in hardware or software.

Conversion to and from textual representations is simple and straightforward and free of the complexities that binary floating formats must wrestle with to minimize the inevitable errors caused by the fundamental incompatibility of the binary and decimal systems. DEC64 instead uses an internal representation that is very compatible with the e notation.

To convert an int to DEC64, shift it left 8 bits. To unpack a coefficient, shift it right 8 bits with sign extension. The exponent can be unpacked at no cost on x64 architecture because the least significant byte can be accessed directly.
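
A minimal Python sketch of these shifts follows. Python integers are unbounded, so the code masks to 64 bits and reinterprets the pattern as signed before shifting; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def dec64_from_int(n):
    """Convert an integer to a DEC64 pattern by shifting it left 8 bits."""
    if not -(1 << 55) <= n < (1 << 55):
        raise ValueError("integer does not fit in a 56-bit coefficient")
    return (n << 8) & MASK64          # the exponent byte is left as 0


def dec64_coefficient(bits):
    """Unpack the coefficient: shift right 8 bits with sign extension."""
    signed = bits - (1 << 64) if bits & (1 << 63) else bits
    return signed >> 8                # Python's >> is an arithmetic shift


def dec64_exponent(bits):
    """Unpack the exponent: the least significant byte, read as signed."""
    low = bits & 0xFF
    return low - 0x100 if low & 0x80 else low
```

For instance, `dec64_from_int(42)` is `42 << 8`, and `dec64_coefficient(dec64_from_int(-7))` gives back `-7`.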

There is a fast path for addition of integers in a software implementation that takes only 5 instructions (7 on RISC-V) whilst also providing for not-a-number and overflow protection.

x64:

; Add rdx to rax.

    mov     cl,al              ; load the exponent of rax into cl
    or      cl,dl              ; 'or' the two exponents together
    jnz     slow_path          ; if both exponents are zero, take the fast path
    add     rax,rdx            ; add the coefficients together
    jo      overflow           ; if there was no overflow, we are done

ARM64:

; Add x1 to x0.

    orr     x2, x0, x1         ; 'or' the two numbers together
    ands    xzr, x2, #255      ; examine the exponent part
    b.ne    slow_path          ; if both exponents are zero, take the fast path
    adds    x0, x0, x1         ; add the coefficients together
    b.vs    overflow           ; if there was no overflow, we are done

RISC-V64:

; Add x11 to x10.

    or      x12, x10, x11      ; 'or' the two numbers together
    andi    x12, x12, 255      ; isolate the exponent part
    bne     x12, x0, slow_path ; if both exponents are zero, take the fast path
    slti    x12, x10, 0        ; x12 is 1 if augend is negative
    add     x10, x10, x11      ; add the coefficients together
    slt     x13, x10, x11      ; x13 is 1 if the sum is less than the addend
    bne     x12, x13, overflow ; if there was no overflow, we are done
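
The same fast path can be modelled in Python to make the logic easier to follow. This is a sketch rather than a translation of the reference code: `dec64_add_fast` and `to_signed64` are hypothetical names, and returning `None` stands in for the branches to `slow_path` and `overflow`.

```python
MASK64 = (1 << 64) - 1


def to_signed64(bits):
    """Reinterpret an unsigned 64-bit pattern as a two's complement integer."""
    return bits - (1 << 64) if bits & (1 << 63) else bits


def dec64_add_fast(a, b):
    """Add two DEC64 patterns when both exponents are zero.

    Returns the sum as a 64-bit pattern, or None when the general
    (slow) path or overflow handling would be needed instead.
    """
    # 'or' the two numbers and look at the exponent byte. Any nonzero bit
    # means at least one exponent is not 0, which also catches nan (0x80).
    if (a | b) & 0xFF:
        return None
    # Both low bytes are zero, so adding the whole words adds the
    # coefficients and leaves the exponent byte of the sum at zero.
    total = to_signed64(a) + to_signed64(b)
    if not -(1 << 63) <= total < (1 << 63):
        return None                   # signed 64-bit overflow
    return total & MASK64
```

With both operands holding plain integers, `dec64_add_fast(2 << 8, 3 << 8)` returns `5 << 8`. Adding 1 to the largest coefficient, or passing an operand with a nonzero exponent, returns `None`.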

The fast path for addition in a hardware implementation should take only 1 cycle when the two exponents are equal to each other and there is no overflow. The fast path for multiplication in hardware takes the time it takes to do a 56*56 bit signed multiply when there is no overflow.

A reference implementation is available for Intel/AMD x64 and ARM64 at https://github.com/douglascrockford/DEC64. It provides the DEC64 elementary operations. Vadim Pisarevsky has prepared a C++ version that can be found at https://github.com/vpisarev/DEC64/tree/alt.

Conversion between DEC64 and strings is trivially easy. This is demonstrated by dec64.string.
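
To illustrate why the conversion is simple, the sketch below maps decimal text to a coefficient and exponent pair and back. It does not use or reproduce the dec64.string API: the function names are hypothetical, and Python's standard `decimal` module is used only to split the text into digits and an exponent.

```python
from decimal import Decimal


def parse_decimal_text(text):
    """Turn text such as '-12.50' or '1e3' into a (coefficient, exponent) pair."""
    number = Decimal(text)
    if not number.is_finite():
        raise ValueError("only finite decimal text is handled in this sketch")
    sign, digits, exponent = number.as_tuple()
    coefficient = int("".join(str(digit) for digit in digits))
    if sign:
        coefficient = -coefficient
    if not -(1 << 55) <= coefficient < (1 << 55) or not -127 <= exponent <= 127:
        raise ValueError("value is outside the DEC64 range")
    return coefficient, exponent


def format_e_notation(coefficient, exponent):
    """Write a (coefficient, exponent) pair directly in e notation."""
    return f"{coefficient}e{exponent}"
```

The digits of the text become the coefficient unchanged and the position of the decimal point becomes the exponent, so `parse_decimal_text("0.1")` gives `(1, -1)` and `format_e_notation(1, -1)` gives `"1e-1"`. No binary rounding step is involved in either direction.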

The elementary functions (sine, log, sqrt, etc) are demonstrated by dec64.math.

Motivation

The idea of using powers of ten instead of powers of two is not new. For example,

Floating point subroutines and interpretive systems for early machines were coded by D. J. Wheeler and others, and the first publication of such routines was in The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill (Reading, Mass.: Addison-Wesley, 1951), subroutines A1-A11, pages 35-37 and 105-117. It is interesting to note that floating decimal subroutines are described here, although a binary computer was being used; in other words, the numbers were represented as 10^e f, not 2^e f, and therefore the scaling operations required multiplication or division by 10.

The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition by Donald Knuth (Addison-Wesley, 1998), page 226.

The book Knuth cited may have been the first software book. It described some of the libraries and conventions of Maurice Wilkes’s EDSAC, one of the first generation of Von Neumann machines. Some of its subroutines used a numeric format that was very similar to DEC64.

Floating point was so important that support for it was moved into hardware for better performance. This led to the development of binary floating point because a shift could be implemented much more easily than a divide by 10. It was discovered that, by biasing the exponent and moving it to the position just after the sign bit, floating point numbers could be compared with integer opcodes, a nifty optimization. It was also discovered that, because normalization always left a 1 bit in the most significant position of the significand, that bit could be omitted, providing an additional bit of significance.

The Burroughs 5000 series had a floating point format in which an exponent of zero allowed the mantissa to be treated as an ordinary integer. DEC64 incorporates that brilliant idea.

Languages for scientific computing like FORTRAN provided multiple floating point types such as REAL and DOUBLE PRECISION as well as INTEGER, often also in multiple sizes. This was to allow programmers to reduce program size and running time. This convention was adopted by later languages like C and Java. In modern systems, this sort of memory saving is pointless. By giving programmers a choice of number types, programmers are required to waste their time making choices that don’t matter. Even worse, making a bad choice can lead to a loss of accuracy or destructive bugs. This is a bad practice that is very deeply ingrained.

Binary floating point trades away familiarity and decimal compatibility for performance. This made it unsuitable for business languages like COBOL. Decimal fractions cannot be represented accurately in binary floating point, which is a problem for programs that interact with humans, and is dangerous in programs that manipulate money. Exactness is required, so most business processing used BCD (Binary Coded Decimal) in which each digit is encoded in 4 bits. That created some inefficiency, but benefited from allowing a shift by 4 bits in place of the more complex divide by 10. For a time, mainframes could be ordered with optional floating point units for scientific computing, and optional BCD units for business computing.

The BASIC language eliminated much of the complexity of FORTRAN by having a single number type. This simplified the programming model and avoided a class of errors caused by selection of the wrong type. The efficiencies that could have been gained from having numerous number types proved to be insignificant.

Business Basic was a dialect of BASIC that was developed by Basic/Four Corporation for its small business minicomputers. It used decimal floating point, much like the EDSAC, so the language could be used for both scientific and business applications. Business Basic could do everything that BASIC could do, and it could also handle money.
