Rustls Server-Side Performance

Original link: https://www.memorysafety.org/blog/rustls-server-perf/

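Rustls is a memory-safe, high-performance TLS implementation that has recently made significant performance gains, especially in high-utilization server scenarios. Funded in part by the ISRG, Rustls aims to provide a safer alternative to OpenSSL, which is widely used but prone to vulnerabilities. The latest optimization work focused on minimizing the impact of the session resumption store on the performance of individual connections. Testing shows that Rustls performance scales almost linearly with the number of available cores, with minimal degradation under high load, similar to BoringSSL. Key improvements in Rustls 0.23.17 include replacing the mutex used for ticket encryption key rollover with an RwLock, which limits contention to the short periods when a key is rolled over. In addition, the default number of tickets sent was reduced from 4 to 2, matching OpenSSL/BoringSSL, which reduces CPU and bandwidth usage. Benchmarks show that Rustls is competitive, with significantly lower server latency than OpenSSL, making it a strong choice for high-performance TLS server applications.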

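This Hacker News thread discusses the article on Rustls server-side performance improvements. Commenters are generally positive, highlighting Rustls's safety properties and performance gains. One user asks for more detailed benchmark information, in particular on handshake performance differences and comparisons across OpenSSL versions, and mentions HAProxy's recent findings about SSL stacks. Another user is excited about Rustls's potential in production, stressing that its codebase is implicitly safer than projects such as BoringSSL. There is also a brief discussion of using crossbeam-epoch as a potential optimization for encryption key rollover, along with a link to HAProxy's article on the state of TLS stacks, which suggests that OpenSSL is in decline and that the Rustls C bindings are not yet ready for production. Overall, commenters are optimistic about the future of Rustls and its advantages in safety and performance.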

Original article

In past years, the Rustls project has been happy to receive substantial investments from the ISRG. One of our goals has been to improve performance without compromising on safety. We last posted about our performance improvements in October of 2024, and we're back to talk about another round of improvements.

What is Rustls?

Rustls is a memory-safe TLS implementation with a focus on performance. It is production ready and used in a wide range of applications. You can read more about its history on Wikipedia.

It comes with a C API and FIPS support so that we can bring both memory safety and performance to a broad range of existing programs. This is important because OpenSSL and its derivatives, widely used across the Internet, have a long history of memory safety vulnerabilities with more being found this year. It's time for the Internet to move away from C-based TLS.

On the server

In our previous post we looked at handshake latency and traffic throughput for connections on the client and the server. While clients will usually have a small number of connections active at any time, TLS servers generally want to optimize for high utilization, supporting as many connections as possible at the same time. TLS server connections usually share a reference to a backing store, which can be used to resume sessions across connections for a substantial latency improvement in connection setup. Our goal is then to minimize the slowdown that sharing the resumption store imposes on individual connections.

We first validated the assumption that turning off resumption would allow linear scaling:

As our testing showed, Rustls manages to avoid any impact from scaling in this case, up to the 80 cores offered by the Ampere ARM hardware used in this test. This is similar to BoringSSL, which shows no impact -- although it spends more time per handshake. OpenSSL handshake latency deteriorates as it scales, although comparing OpenSSL versions shows that its development team have made strides to improve this, as well.

Resumption mechanisms

TLS supports two different resumption strategies:

  • Stateful resumption stores resumption state on the server in some kind of map (or database). The key into this map is sent across the wire. Because the key is relatively compact, this uses less bandwidth and therefore slightly reduces latency. On the other hand, it is harder to scale efficiently when multiple servers are serving the same potentially resuming clients.

  • Stateless resumption sends encrypted resumption state to the client. This is easy to horizontally scale because there is no server-side state, but the resumption state is a good deal larger, with an associated increase in bandwidth used (and the associated latency impact).
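To make the stateful case concrete, here is a minimal sketch of a server-side resumption map shared by all connections. `SessionState` and `SessionStore` are hypothetical names chosen for illustration and are not Rustls types; real resumption state holds more than a single secret. Only the short session ID would travel over the wire, while the state itself stays in this process, which is also why sharing it across multiple servers takes extra work.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical resumption state (illustration only, not a Rustls type).
#[derive(Clone, Debug, PartialEq)]
struct SessionState {
    secret: Vec<u8>,
}

/// Hypothetical in-memory store for stateful resumption.
/// The map lives on the server; clients only ever see the key.
struct SessionStore {
    sessions: Mutex<HashMap<Vec<u8>, SessionState>>,
}

impl SessionStore {
    fn new() -> Self {
        Self {
            sessions: Mutex::new(HashMap::new()),
        }
    }

    /// Stores state under the compact session ID that is sent to the client.
    fn put(&self, id: Vec<u8>, state: SessionState) {
        self.sessions.lock().unwrap().insert(id, state);
    }

    /// Removes and returns the state for `id`, so each ID resumes at most once.
    fn take(&self, id: &[u8]) -> Option<SessionState> {
        self.sessions.lock().unwrap().remove(id)
    }
}
```

In the stateless variant there is no such map: the server would instead encrypt the state and hand the resulting blob to the client.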

The resumption state that is sent to a client is commonly called a "ticket". Ticket encryption keys must be regularly rolled over because a key compromise destroys the security of all past and future tickets. In order to enable key rollover while supporting multiple concurrent sessions, Rustls 0.23.16 and earlier wrapped the encryption key in a mutex, which resulted in substantial contention as the number of concurrent server connection handshakes increased. In Rustls 0.23.17, we started using an RwLock instead, which limits contention to the short period when a key rollover happens (by default, every 6 hours).
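The locking pattern described above can be sketched with the standard library alone. This is not the Rustls implementation: `TicketKey` and `RotatingKey` are hypothetical types, and the key is reduced to a generation counter so the example stays self-contained. The idea is that every handshake takes a shared read lock, and the exclusive write lock is only taken once the key's lifetime has elapsed.

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a ticket encryption key (not a Rustls type).
struct TicketKey {
    generation: u64,
    created: Instant,
}

/// Hypothetical holder for a key that is rolled over after `lifetime`.
struct RotatingKey {
    current: RwLock<TicketKey>,
    lifetime: Duration,
}

impl RotatingKey {
    fn new(lifetime: Duration) -> Self {
        Self {
            current: RwLock::new(TicketKey {
                generation: 0,
                created: Instant::now(),
            }),
            lifetime,
        }
    }

    /// Returns the generation of the key to use at `now`, rolling over if needed.
    fn key_generation(&self, now: Instant) -> u64 {
        // Fast path: any number of handshakes can hold the read lock at once.
        {
            let key = self.current.read().unwrap();
            if now.saturating_duration_since(key.created) < self.lifetime {
                return key.generation;
            }
        }
        // Slow path: the key has expired, so take the exclusive write lock.
        let mut key = self.current.write().unwrap();
        // Re-check, because another thread may have rolled over while we waited.
        if now.saturating_duration_since(key.created) >= self.lifetime {
            let next = key.generation + 1;
            *key = TicketKey {
                generation: next,
                created: now,
            };
        }
        key.generation
    }
}
```

With a mutex in place of the RwLock, the fast path would serialize every handshake; here writers only block readers during the rollover itself.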

Finally, we made another change in Rustls 0.23.17 to reduce the number of tickets sent by default when stateless resumption is enabled from 4 to 2, to align with the OpenSSL/BoringSSL default. This leads to doing less work both in terms of CPU time (encryption) and bandwidth used.

Handshake latency distribution

Apart from specific resumption concerns, we also compared Rustls to other TLS implementations in terms of the latency distribution experienced on the server: not just looking at the average latency, but also at tail latency (in this case, P90 and P99). Rustls does quite well here:

While this chart shows full TLS 1.3 handshakes in particular, similar results were observed for other scenarios.
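For readers unfamiliar with percentile metrics, the sketch below shows one common way to compute P90 and P99 from a set of latency samples, using the nearest-rank method. The function name and the choice of method are assumptions made for illustration; the benchmark tooling behind the results above may compute percentiles differently.

```rust
/// Nearest-rank percentile of latency samples (for example, in microseconds).
/// `p` is a whole-number percentile in 1..=100.
/// Returns None for an empty sample set or an out-of-range `p`.
fn percentile(samples: &[u64], p: usize) -> Option<u64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Smallest rank such that at least p% of the samples are at or below it:
    // ceil(p * n / 100), computed with integer arithmetic.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// Arithmetic mean of the samples, rounded down; None if there are no samples.
fn mean(samples: &[u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<u64>() / samples.len() as u64)
}
```

A single slow handshake among ten samples barely moves the median but dominates P99, which is why looking only at averages can hide tail behavior.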

Conclusion

Current versions of Rustls show competitive performance when processing many connections at the same time on a server. Rustls servers scale almost linearly with the number of cores available, and server latency for the core TLS handshake handling is roughly 2x lower than OpenSSL in our benchmarks.
