Parsing Integers in C

Original link: https://daniel.haxx.se/blog/2025/11/13/parsing-integers-in-c/

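Standard C library functions such as `atoi` and `strtol` often cause problems when converting strings to integers because they are too lenient: they silently accept invalid input, lack robust error handling (overflow detection, for example), and have platform-dependent size limits (especially when `long` is involved). This can lead to security vulnerabilities and unreliable parsing. The curl project addressed these problems by developing its own parsing functions, notably `curlx_str_number()`. The function prioritizes strictness: it requires exact input, enforces a maximum value, detects overflow, and *always* uses 64-bit integers. It does not allow leading whitespace or explicit signs (+/-), forcing the parsing code to handle them explicitly when needed. Although it may be slightly slower, `curlx_str_number()` offers significant advantages in code clarity, security, and reliability. As of November 2025, curl has removed every use of the weaker standard functions (`atoi` and the `strtol` variants) in favor of its own stricter parsing approach. These `curlx` functions are shared between the libcurl library and the command-line tool, which reduces code duplication.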

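A Hacker News discussion revolves around the "robustness principle", the idea that systems should tolerate unexpected input. It was prompted by a blog post about integer parsing in C and by the curl project's decision to prioritize strict adherence to specifications over leniently accepting varied input. Many commenters *oppose* the robustness principle, especially where security is concerned. They argue that accepting any data that is not perfectly formatted can lead to vulnerabilities and unpredictable system interactions. A key point is that the principle, while useful in the early internet, is less necessary today, and that clear, strict input validation is preferable. Others note that the principle is "locally optimal" for preventing crashes but harmful to a reliable ecosystem as a whole. Several stress the dangers of blindly accepting input and the importance of clearly defined specifications. Alternatives such as `std::from_chars` in C++ are suggested for efficient, strict parsing. The discussion also covers quirks of C's `strtoul` function and the complications of handling signed integers as unsigned values.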

Original article

The standard libc API set provides multiple functions that convert ASCII numbers to integers.

They are handy and easy to use, but also error-prone and quite lenient in what they accept and silently just swallow.

atoi

atoi() is perhaps the most common and basic one. It converts from a string to signed integer. There is also the companion atol() which instead converts to a long.

Some problems these have include that they return 0 instead of an error, that they have no checks for under or overflow and in the atol() case there’s this challenge that long has different sizes on different platforms. So neither of them can reliably be used for 64-bit numbers. They also don’t say where the number ended.
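To make these problems concrete, here is a minimal sketch of how atoi() treats a few inputs. The helper names `show_atoi` and `atoi_demo` are made up for this illustration; only atoi() itself comes from the standard library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print what atoi() makes of an input string. */
void show_atoi(const char *input)
{
  printf("atoi(\"%s\") = %d\n", input, atoi(input));
}

/* Hypothetical demo: inputs that a strict parser would tell apart. */
void atoi_demo(void)
{
  show_atoi("0");      /* a genuine zero */
  show_atoi("abc");    /* not a number at all, yet also returns 0 */
  show_atoi("12abc");  /* trailing junk is silently dropped, returns 12 */
  show_atoi("  +7");   /* leading whitespace and a sign are accepted */
}
```

The caller cannot distinguish the first two cases from the return value alone, and nothing reports where parsing stopped in the third.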

Using these functions opens up your parser to not detect and handle errors or weird input. We write better and stricter parsers when we avoid these functions.

strtol

This function, along with its siblings strtoul() and strtoll() etc, is more capable. They have overflow detection and they can detect errors – like if there is no digit at all to parse.

However, these functions too happily swallow leading whitespace, and they allow a + or - in front of the number. The long versions of these functions have the problem that long is not universally 64-bit, and the long long version has the problem that it is not universally available.
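A small sketch of that leniency, using two hypothetical check functions invented for this example. The second one shows a well-known quirk: strtoul() accepts a minus sign and negates the result in the unsigned return type, so "-1" comes back as ULONG_MAX.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 when strtol() accepts leading
   whitespace and a '+' sign and consumes the whole string. */
int strtol_accepts_padding(void)
{
  char *end;
  long v = strtol(" \t+42", &end, 10);
  return v == 42 && *end == '\0';
}

/* Hypothetical check: returns 1 when strtoul() turns "-1" into
   ULONG_MAX instead of rejecting the sign. */
int strtoul_wraps_negative(void)
{
  char *end;
  unsigned long v = strtoul("-1", &end, 10);
  return v == ULONG_MAX && *end == '\0';
}
```

Both functions return 1 on a conforming C library, which means a parser built directly on these calls accepts input that a strict protocol grammar would likely reject.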

The overflow and underflow detection with these functions is quite quirky, involves errno and forces us to spend multiple extra lines of conditions on every invocation just to be sure we catch those.
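The extra conditions look roughly like the sketch below. `parse_long_checked` is a hypothetical wrapper written for this example, not a function from curl or libc, and it still inherits strtol()'s acceptance of leading whitespace and signs.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns 0 on success, -1 on any failure. */
int parse_long_checked(const char *s, long *out)
{
  char *end;
  long v;

  errno = 0;               /* strtol() sets errno but never clears it */
  v = strtol(s, &end, 10);

  if(end == s)
    return -1;             /* no digits were consumed */
  if(errno == ERANGE)
    return -1;             /* result was clamped to LONG_MIN or LONG_MAX */
  if(*end != '\0')
    return -1;             /* trailing junk after the number */

  *out = v;
  return 0;
}
```

Three separate checks are needed around a single call, and forgetting to reset errno beforehand can make a stale ERANGE from earlier code look like an overflow.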

curl code

I think we in the curl project, as well as more or less the entire world, have learned through the years that it is usually better to be strict when parsing protocols and data, rather than be lenient and try to accept many things and guess what it otherwise maybe meant.

As a direct result of this we make sure that curl parses and interprets data exactly as that data is meant to look and we error out as soon as we detect the data to be wrong. For security and for solid functionality, providing syntactically incorrect data is not accepted.

This also implies that all number parsing has to be exact, must handle overflows and maximum allowed values correctly and conveniently, and must detect errors. Our number parsing always supports up to 64-bit numbers.

strparse

I have previously blogged about how we have implemented our own set of parsing functions in curl, and these also include number parsing.

curlx_str_number() is the most commonly used of the ones we have created. It parses a string and stores the value in a 64-bit variable (which in curl code is always present and always 64-bit). It also has a max value argument so that it returns an error if the number is too large. And it of course also errors out on overflows etc.

This function of ours does not allow any leading whitespace and certainly no prefixing pluses or minuses. If they should be allowed, the surrounding parsing code needs to explicitly allow them.
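The behavior described here can be sketched as follows. This is not curl's implementation: `strict_str_number`, `parse_signed` and the `NUM_*` codes are hypothetical names invented for the illustration, and `int64_t` stands in for curl's own 64-bit type. The first function rejects anything that does not start with a digit and refuses values above `max`; the second shows a caller that explicitly opts in to a leading minus.

```c
#include <stdint.h>

/* Hypothetical return codes for this sketch. */
enum { NUM_OK = 0, NUM_NO_DIGIT, NUM_TOO_BIG, NUM_TRAILING };

/* Hypothetical strict parser: digits only, no whitespace, no sign.
   On success stores the value in *nump and advances *linep past it. */
int strict_str_number(const char **linep, int64_t *nump, int64_t max)
{
  const char *p = *linep;
  int64_t num = 0;

  if(*p < '0' || *p > '9')
    return NUM_NO_DIGIT;
  while(*p >= '0' && *p <= '9') {
    int d = *p - '0';
    /* refuse before num * 10 + d could exceed max (or overflow) */
    if(d > max || num > (max - d) / 10)
      return NUM_TOO_BIG;
    num = num * 10 + d;
    p++;
  }
  *nump = num;
  *linep = p;
  return NUM_OK;
}

/* Hypothetical caller that chooses to allow one leading '-'. */
int parse_signed(const char *s, int64_t *out)
{
  int negative = (*s == '-');
  int rc;

  if(negative)
    s++;
  rc = strict_str_number(&s, out, INT64_MAX);
  if(rc != NUM_OK)
    return rc;
  if(*s != '\0')
    return NUM_TRAILING;   /* something follows the digits */
  if(negative)
    *out = -*out;
  return NUM_OK;
}
```

Because the range check happens before each multiply, the accumulator never exceeds `max`, so there is no separate overflow case to handle. Passing a pointer to the string pointer also tells the caller exactly where the number ended, for example at the '/' in "8080/path".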

The curlx_str_number function is most probably a little slower than the functions it replaces, but I don’t think the difference is huge and the convenience and the added strictness are much welcomed. We write better code and parsers this way. More secure. (curlx_str_number source code)

History

As of yesterday, November 12 2025, all of those weak function calls have been wiped out from the curl source code. The drop seen in early 2025 was when we got rid of all strtol() variations. Yesterday we finally got rid of the last atoi() calls.

(Daily updated version of the graph.)

curlx

The function mentioned above uses a ‘curlx’ prefix. We use this prefix in curl code for functions that exist in libcurl source code but that can be used by the curl tool as well, sharing the same code without them being offered by the libcurl API.

We do this to reduce code duplication and share code between the library and the command line tool.
