Mediocre Engineer's Guide to HTTPS

Original link: https://devonperoutky.super.site/blog-posts/mediocre-engineers-guide-to-https

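The Internet is a global network of interconnected computer systems. Its name comes from its role of connecting different networks. It is a decentralized system: data travels across many links in small units called packets, and redundant paths let traffic route around failures. Each layer of the communication process provides a distinct function, so the protocol at one layer can be replaced without affecting the others.

An HTTP request starts at the application layer, on the client (usually a web browser). The request consists of an initial line containing the HTTP method, the requested resource, and the protocol version, followed by headers and an optional message body. For example, a GET request for the file index.html looks like this: GET /index.html HTTP/1.1. Several stages must happen before the request reaches the target server:

1. **Domain Name System** (**DNS**) resolution: This step converts a human-readable domain name (such as example.com) into a numeric IP address (such as 93.184.216.34). The client sends a query to a DNS server. That server returns the IP address, either from its cache or by asking higher-level DNS servers.
2. **TCP handshake**: Once the client has the server's IP address, it establishes a reliable TCP connection with the remote host. Through an exchange of synchronization messages, the two sides agree on basic parameters such as initial sequence numbers and window sizes. Once the connection is established, the client sends the complete HTTP request.
3. **Data transmission across networks**: The request crosses many interconnected networks, which route its packets to the destination network. Several mechanisms cooperate to deliver packets across this distributed infrastructure. Routing algorithms choose the path. The Address Resolution Protocol (ARP) maps IP addresses to hardware addresses on each local link. The Internet Control Message Protocol (ICMP) reports errors along the way.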

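When you try to visit a website, many things can make the request succeed or fail. The culprit could be your local computer, the WiFi connection, the cabling, your Internet Service Provider (ISP), or the website itself. Because of how packets are routed, it can be hard to tell which one is at fault. The path from your device to the website crosses multiple hops, including your local network, your ISP, and the public internet. Pinning down the root cause requires understanding that whole system, which also makes reliable diagnostic tools hard to build. Misreading the evidence leads to wrong conclusions and wasted troubleshooting effort. Utilities such as mtr (My TraceRoute) help by reporting the performance of each hop along the route. This exposes points of congestion or high latency, so you can narrow the problem down to your own network or to a provider's. ICMP responses from individual hops give further clues about where connectivity breaks down, although interpreting them takes networking knowledge and experience. Diagnostic tools and techniques keep improving, which makes network infrastructure easier to manage and maintain.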
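Under the hood, mtr (like traceroute) discovers the hops by sending probes with increasing TTL (time-to-live) values. Every router decrements a packet's TTL. The router where it hits zero drops the packet and answers with an ICMP Time Exceeded message, which reveals its address. The toy simulation below models only that idea. The path and its addresses are hypothetical (taken from the RFC 5737 documentation ranges), and real mtr also measures loss and latency for each hop.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical path from a laptop to a server (documentation addresses only).
PATH = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.80"]
DESTINATION = PATH[-1]


@dataclass
class Reply:
    source: str  # address of the host that answered
    kind: str    # "time-exceeded" from a router, "echo-reply" from the target


def send_probe(ttl: int) -> Reply:
    """Forward one probe hop by hop, decrementing its TTL at each router."""
    for hop in PATH:
        if hop == DESTINATION:
            return Reply(hop, "echo-reply")
        ttl -= 1
        if ttl == 0:
            # The router that sees the TTL reach zero drops the probe and
            # reports back with ICMP Time Exceeded, exposing its address.
            return Reply(hop, "time-exceeded")
    raise RuntimeError("destination not on path")


def trace(max_hops: int = 30) -> List[str]:
    """Discover the path by probing with TTL = 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(ttl)
        hops.append(reply.source)
        if reply.kind == "echo-reply":
            break
    return hops


print(trace())
```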

Original text

As a mediocre engineer, I took Internet and HTTPS communication for granted and never dove any deeper. Today we’re improving as engineers and learning a rough overview of how internet communication works, specifically focusing on HTTP and TLS.

The Internet is “just” a network of interconnected computer networks. The term "Internet" is short for "internetwork," a network between networks. It operates as a packet-switched mesh network with best-effort delivery, meaning there are no guarantees on whether a packet will be delivered or how long it will take. The internet appears to operate so smoothly (at least from a technical perspective) because of the layers of abstraction that handle retries, ordering, deduplication, security and so many other things behind the scenes. These layers let us developers focus on the application layer (aka writing HTTP requests from San Francisco for $300K/year).

Each layer provides certain functionalities, which can be fulfilled by different protocols. Such modularization makes it possible to replace the protocol on one layer without affecting the protocols on the other layers.

Here’s a simple table of the layers.

We’ll go over these layers in more depth later, but first, let’s see them in action.

Here is the path of an HTTP request through these layers (skipping the physical layer for brevity).

1. Sender Makes a Request

The process begins at the Application layer, where the client (usually a web browser) constructs an HTTP request. HTTP/1.1 is a text-based protocol, meaning that all of this data is sent as plain text over the wire. HTTP/2 and HTTP/3 use binary framing but carry the same methods, headers, and bodies.

The first line (the request line) includes:

HTTP method (GET, POST, etc.)
Requested resource (example: /index.html)
Protocol version (example: HTTP/1.1)

The remainder of the HTTP message contains headers in a key: value format, followed by a blank line and an optional message body.

Example: HTTP Request

GET /index.html HTTP/1.1
Host: www.example.com
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
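Because the request is plain text, building one by hand is just string formatting. Below is a minimal sketch. `build_request` is a hypothetical helper. Real clients such as browsers or Python's `http.client` also handle encoding, connection reuse, and many more headers.

```python
from typing import Dict, Optional


def build_request(method: str, path: str, host: str,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request into the raw bytes sent over the wire."""
    # Request line first, then the Host header (mandatory in HTTP/1.1).
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Every line ends with CRLF; an empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request("GET", "/index.html", "www.example.com", {"Accept": "text/html"})
print(request.decode("ascii"))
```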

2. DNS Lookup:

The Domain Name System (DNS) translates the human-readable domain name (www.example.com) into an IP address (e.g., 93.184.216.34). The client queries DNS servers to resolve the domain name to its corresponding IP address. The query passes through multiple servers until it reaches the authoritative server, which holds the record mapping the domain name to the IP address. At a very high level, the three components are:

Stub resolvers, which live on the client machine and route the request to the appropriate recursive resolver (explained next).
Recursive resolvers, which receive requests from the stub resolver and query authoritative servers to resolve the domain name, often caching the result. Your Internet Service Provider (ISP) typically provides a recursive resolver, or you may use a public one like Google DNS (8.8.8.8).
Authoritative servers, which contain the actual DNS records (like A, MX, CNAME, etc.) for a domain and respond to queries with the information in those records. Authoritative servers are the final source of truth for domain name data.

When a client issues a request for a resource using a domain name, the stub resolver on your computer sends a query to a recursive resolver to resolve the domain name.

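The recursive resolver queries authoritative DNS servers as needed to resolve the domain name to an IP address. Unless the answer is already cached, it starts at a root server. The root server refers it to the servers for the top-level domain (.com), which in turn refer it to the domain's authoritative server.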
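The walk the recursive resolver performs can be sketched with a toy, in-memory model. The server names, the referral table and the `resolve` helper below are all hypothetical. A real resolver sends DNS messages over the network (usually UDP port 53) and honours each record's TTL when caching.

```python
from typing import Dict, List

# Hypothetical DNS hierarchy: each server either holds the answer (an A record)
# or refers the resolver to a server responsible for a more specific zone.
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"example.com.": "example-auth"}},
    "example-auth": {"records": {"www.example.com.": "93.184.216.34"}},
}

cache: Dict[str, str] = {}
queries_sent: List[str] = []  # which servers the resolver had to contact


def resolve(name: str) -> str:
    """Walk root -> TLD -> authoritative, caching the final answer."""
    if not name.endswith("."):
        name += "."  # use the fully qualified form
    if name in cache:
        return cache[name]
    server = "root"
    while True:
        queries_sent.append(server)
        data = SERVERS[server]
        if name in data.get("records", {}):
            cache[name] = data["records"][name]
            return cache[name]
        # Follow the referral whose zone contains the queried name.
        for zone, next_server in data.get("referrals", {}).items():
            if name == zone or name.endswith("." + zone):
                server = next_server
                break
        else:
            raise LookupError(f"NXDOMAIN: {name}")


print(resolve("www.example.com"))  # first lookup walks the hierarchy
print(resolve("www.example.com"))  # second lookup is answered from the cache
print(queries_sent)
```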

3. TCP Handshake:

Now that we have the IP address of the server, the client can begin transmitting the HTTP request, and we move down to the Transport Layer. There are two primary protocols for the transport layer: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

💡

TCP is a connection-oriented protocol that ensures reliable, ordered, and error-checked data delivery between applications.

UDP is a connectionless protocol that provides fast, low-overhead data transmission without guaranteeing delivery or ordering. Beyond a basic checksum, it does no error recovery.

As of 2024, TCP is the main protocol for managing data transport across the internet. UDP is less commonly used. It typically serves DNS lookups and real-time applications like streaming or video calls, where low latency is crucial and occasional packet loss is acceptable. UDP also carries QUIC, the transport underneath HTTP/3. Now back to the topic at hand.

Once the client has obtained the IP address, it initiates a TCP connection with the server on port 80 (the standard port for HTTP). This involves a three-way handshake:

SYN: The client sends a SYN (synchronize) packet to the server to request a connection.
SYN-ACK: The server responds with a SYN-ACK (synchronize-acknowledge) packet, acknowledging the client's SYN and sending its own.
ACK: The client sends an ACK (acknowledge) packet back to the server, establishing a reliable connection.
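The handshake is also where each side picks its initial sequence number, which then numbers every byte it sends. Below is a small model of the three segments. The `Segment` class and `three_way_handshake` function are illustrative, not an OS API. In practice the kernel does all of this when a program calls `connect()`.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the SYN, SYN-ACK and ACK segments with their seq/ack numbers."""
    syn = Segment("SYN", seq=client_isn)
    # A SYN consumes one sequence number, so the server acknowledges ISN + 1
    # and announces its own initial sequence number in the same segment.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_SPACE)
    # The client acknowledges the server's SYN the same way.
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


# Real TCP stacks choose hard-to-guess initial sequence numbers (ISNs).
for segment in three_way_handshake(random.getrandbits(32), random.getrandbits(32)):
    print(segment)
```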

4. Transmit HTTP Request

With the TCP connection in place, the client sends the actual HTTP request. As mentioned, HTTP is a text-based protocol, so the request headers and the body (if any) are sent as plain text.
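Steps 3 and 4 can be reproduced on a single machine with Python's standard `socket` module. This sketch assumes loopback networking is available. It uses an OS-assigned port instead of port 80 so it runs without special privileges. The kernel performs the three-way handshake inside `create_connection()`, and the program only ever sees the plain-text bytes.

```python
import socket
import threading
from typing import List, Tuple

# Fixed reply for the toy server; "Connection: close" means the server hangs
# up after responding, so the client can read until end-of-stream.
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nhi"


def serve_once(listener: socket.socket, received: List[bytes]) -> None:
    """Accept one connection, record the raw request bytes, send RESPONSE."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # read until the blank line ending the headers
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        received.append(data)
        conn.sendall(RESPONSE)


def fetch_over_loopback() -> Tuple[bytes, bytes]:
    received: List[bytes] = []
    # Port 0 lets the OS pick a free port; create_server() binds and listens.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener, received))
        server.start()
        # The kernel performs SYN / SYN-ACK / ACK while connecting here.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n")
            reply = b""
            while chunk := client.recv(4096):  # empty bytes means the server closed
                reply += chunk
        server.join()
    return received[0], reply


request_seen_by_server, reply = fetch_over_loopback()
print(request_seen_by_server.decode())
print(reply.decode())
```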

5. Packets routed across Internet to Server

⚠️⚠️⚠️⚠️⚠️ We’re going deep here ⚠️⚠️⚠️⚠️⚠️

When a client sends a request, the data packets don't travel directly to the server. Instead, they follow a path through various network devices, primarily routers, which determine the best route for the packets to reach the server's network gateway. From there, the link layer comes into play.

Step-by-step explanation of how text makes it across the internet

Initial Transmission:

The client's device encapsulates the HTTP request data into TCP segments and then into IP packets. Each IP packet is then wrapped in a frame suitable for the Link Layer (e.g., an Ethernet frame on a wired connection). The frame adds a link-layer header and trailer around the packet.
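Encapsulation means each layer treats everything from the layer above as an opaque payload and wraps it in its own header. The sketch below uses made-up, human-readable headers as stand-ins for the real binary TCP, IP and Ethernet formats, and it skips Ethernet's trailing checksum. What matters here is the wrap-and-unwrap order.

```python
# Hypothetical, simplified headers for each layer, innermost first.
LAYERS = [
    ("TCP", "src_port=51514 dst_port=80"),
    ("IP", "src=192.0.2.10 dst=93.184.216.34"),
    ("ETH", "src=aa:aa:aa:aa:aa:aa dst=bb:bb:bb:bb:bb:bb"),
]


def encapsulate(http_request: bytes) -> bytes:
    """Sender side: TCP wraps the request, IP wraps TCP, Ethernet wraps IP."""
    data = http_request
    for name, header in LAYERS:
        data = f"[{name} {header}]".encode() + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: strip the outermost header first (Ethernet, IP, TCP)."""
    data = frame
    for name, _ in reversed(LAYERS):
        prefix = f"[{name} ".encode()
        if not data.startswith(prefix):
            raise ValueError(f"expected a {name} header")
        data = data[data.index(b"]") + 1:]
    return data


request = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
frame = encapsulate(request)
print(frame)
```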

Local Network:

The frames are transmitted over the local network to the client's router. The Link Layer handles the communication within this local network, ensuring the frames reach the router.

Local Router Processing:

The router receives the frames, strips off the Link Layer headers, and processes the IP packets. The router examines the destination IP address in the packets and determines the next hop on the path to the server.
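How does a router "determine the next hop"? Routers look up the destination in a routing table using longest-prefix match. Among the entries whose prefix contains the address, the most specific one wins, and `0.0.0.0/0` acts as the catch-all default route. Here is a sketch with a hypothetical home-router table, using Python's `ipaddress` module:

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "deliver locally on the LAN"),
    (ipaddress.ip_network("10.0.0.0/8"), "VPN gateway 192.168.1.50"),
    (ipaddress.ip_network("0.0.0.0/0"), "ISP gateway 203.0.113.1"),  # default route
]


def next_hop(destination: str) -> str:
    """Among matching routes, pick the one with the longest prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The default route matches everything, so `matches` is never empty.
    _, best_hop = max(matches, key=lambda route: route[0].prefixlen)
    return best_hop


print(next_hop("93.184.216.34"))
```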

Routing Across Networks:

The router forwards the packets to the next network, often through one or more intermediary routers. Each intermediary router repeats the process: receiving the packets, determining the next hop, and forwarding them.

Final Network:

Eventually, the packets reach a router on the same network as the destination server. This router performs the final routing decision and sends the packets to the appropriate local device (the server).

Server Reception:

The server's router forwards the packets over the local network segment to the server. The Link Layer ensures the frames are correctly transmitted to the server's network interface. (It has been doing this for every machine-to-machine hop the whole time.)

Server Processing:

The server receives the frames, extracts the IP packets, and processes the encapsulated TCP segments to reconstruct the original HTTP request. The server then generates an HTTP response and the process reverses to send the response back to the client.

⁉️

The process of sending packets across the internet (the Network Layer) is used for essentially all communication over the internet. That means it was also at work in all the earlier steps (resolving the domain name, the TCP handshake, etc.). There's only so much that can be explained at once.

6. Server Response

The server receives the HTTP request and processes it. After processing the request, the server sends an HTTP response back to the client. The response includes:

Protocol (the HTTP version being used)
Status information (the HTTP status code, like 200, 404, etc.)
Response headers (like request headers, but describing the response)
Requested content/body (the actual content, such as the HTML of the requested page or JSON data)

HTTP/1.1 200 OK
Date: Fri, 26 May 2023 10:00:00 GMT
Server: Apache/2.4.41 (Ubuntu)
Content-Type: text/html
Content-Length: 3456

<!DOCTYPE html>
<html>
<head>
    <title>Example Page</title>
</head>
<body>
    <h1>Hello, world!</h1>
</body>
</html>

You may have seen something like this when debugging requests.
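Reading a response is the mirror image of building a request: split the head from the body at the first blank line, then split the status line and the headers. Below is a minimal sketch. `parse_response` is a hypothetical helper that ignores details such as chunked transfer encoding and repeated headers.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into status-line parts, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.4.41 (Ubuntu)\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 22\r\n"
    b"\r\n"
    b"<h1>Hello, world!</h1>"
)
version, status, reason, headers, body = parse_response(sample)
print(status, reason, headers["content-type"])
# Content-Length counts the bytes in the body.
print(int(headers["content-length"]) == len(body))
```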

7. Content Rendering:

The client receives the HTTP response and processes it. The browser interprets the HTML and renders the content on the screen. If the response includes additional resources (e.g., images, CSS, JavaScript), the browser will make further HTTP requests to fetch these resources, following the same process.

So now that we’ve gotten a basic HTTP request out of the way, there’s only one problem. It’s not secure at all. Anyone listening on the connection can view 100% of the data being passed back and forth. Additionally, someone could pretend to be the server and trick the client into sending valuable information. That’s where the Security Layer comes into play.

Little Layer Review

While we’re here, let’s do a brief review of the layers and their purpose, while we introduce the Security Layer.

Application Layer: Where applications create and communicate user data. This is the layer you have interacted with the most. Uses transport layer services for reliable or unreliable data transmission. Protocols include HTTP, FTP, SSH, SMTP. Uses ports to address processes/services.
Security Layer: Ensures secure communication by providing encryption, authentication, and data integrity. Common protocols include TLS (Transport Layer Security) and its predecessor SSL (Secure Sockets Layer). This layer protects data in transit and verifies the identity of the communicating parties.
Transport Layer: Manages host-to-host communications, providing channels for application data. Includes:

UDP: Unreliable, connectionless datagram service.
TCP: Reliable, connection-oriented service with flow control and connection establishment.

Network Layer: Responsible for exchanging packets across network boundaries via routing the packets through various intermediate routers. Primary protocol: Internet Protocol (IP).
Link Layer: Manages local network communications without routers. Defines local network topology and interfaces for transmitting datagrams to neighboring hosts.

Specifically pay attention to the Security Layer, as that layer is the defining difference between an HTTP request (which we just covered) and an HTTPS request (~86% of the current internet and growing).

HTTPS is HTTP with encryption and verification. While there are multiple ways of securing HTTP communication over the internet, the current implementation everyone uses is Transport Layer Security (TLS).

TLS is how the client verifies the server's identity (and, optionally, the server verifies the client's). It also ensures that all payloads are encrypted so that only the two parties can decrypt them. The TLS handshake process, specifically, determines how the client and server will exchange encryption and verification keys. Once the keys have been exchanged, the client and server communicate using HTTP as normal, and use the keys to encrypt and verify messages.

The flow of an HTTPS request is the same as the HTTP request we covered previously, with the addition of a Security Layer between the Application Layer and the Transport Layer. The TLS handshake itself runs over the TCP connection, which for HTTPS goes to port 443 instead of 80.

TLS Handshake

The TLS handshake lets the client and server agree on a few different aspects of the communication: specifically, the collection of algorithms that will be used for verifying, compressing, and encrypting messages.

🔒

This collection of algorithms is referred to as a cipher suite. Strictly speaking, the compression algorithm is negotiated separately and isn't part of the cipher suite. TLS-level compression is also disabled in practice because of the CRIME attack. For brevity, I’ll refer to the full collection as the cipher suite going forward.

By agreeing on all these algorithms, exchanging random seeds, and using the server’s SSL certificate, the client and server can generate a symmetric key. That key is used to encrypt and verify the messages passed back and forth. The certificate contains the server's public key; the matching private key never leaves the server. This process of agreeing on cipher suites and distributing the necessary information (seeds and SSL cert) is referred to as the TLS handshake.

Note: All communication happens over TCP; the blue steps indicate the TCP handshake and the yellow steps indicate the TLS handshake.

TLS Handshake

Client Hello

The client will send a “Client Hello” message over the TCP connection. It specifies the cipher suites the client supports, the TLS version it supports, and a random number (called the Client Random).

Server Hello

The server will respond with a “Server Hello” message containing the chosen TLS version, the chosen cipher suite, and its own random number (the Server Random).
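The Hello exchange is a negotiation: the server picks a protocol version and a cipher suite that both sides support. Below is a minimal sketch of that choice. It assumes the server uses its own preference order, which is a common configuration. The offered lists, version labels and the `server_hello` function are hypothetical, though the cipher suite names are real IANA names.

```python
import secrets
from typing import Dict

# What this hypothetical client offers in its Client Hello (preferred first).
client_hello = {
    "versions": ["TLSv1.2", "TLSv1.1"],
    "cipher_suites": [
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_RSA_WITH_AES_128_GCM_SHA256",
    ],
    "random": secrets.token_bytes(32),  # the 32-byte Client Random
}

# What this hypothetical server accepts, most preferred first.
SERVER_VERSIONS = ["TLSv1.2"]
SERVER_SUITES = [
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_128_GCM_SHA256",
]


def server_hello(hello: Dict) -> Dict:
    """Choose a shared version and cipher suite, and add a Server Random."""
    version = next((v for v in hello["versions"] if v in SERVER_VERSIONS), None)
    suite = next((s for s in SERVER_SUITES if s in hello["cipher_suites"]), None)
    if version is None or suite is None:
        raise ConnectionError("handshake_failure: no common parameters")
    return {"version": version, "cipher_suite": suite,
            "random": secrets.token_bytes(32)}


reply = server_hello(client_hello)
print(reply["version"], reply["cipher_suite"])
```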

Certificate Verification

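The client verifies the server’s SSL certificate. It checks three things: the certificate is signed (directly or through intermediate certificates) by a Certificate Authority the client already trusts, it hasn't expired, and it matches the domain name. The client then takes the server’s public key from the certificate.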
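You rarely implement this step yourself, because the TLS library does it during the handshake. For example, with Python's standard `ssl` module, the default client context loads the platform's trusted root certificates and turns on both chain verification and hostname checking. The commented-out lines show a real connection and need network access.

```python
import ssl

# Default client-side settings: load the platform's trusted root CAs,
# require a valid certificate chain, and check the name on the certificate.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)

# Usage (needs network access). Verification happens during the handshake
# inside wrap_socket() and raises ssl.SSLCertVerificationError on failure:
#
#   import socket
#   with socket.create_connection(("www.example.com", 443)) as tcp:
#       with context.wrap_socket(tcp, server_hostname="www.example.com") as tls:
#           print(tls.version(), tls.getpeercert()["subject"])
```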

Premaster Secret Generation

The client generates a premaster secret, encrypts it with the server’s public key, and sends it to the server.

Decryption

The server decrypts the premaster secret using its private key.
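These two steps work because of RSA's asymmetry: a value encrypted with the public key can only be recovered with the private key. Below is a deliberately tiny and insecure textbook-RSA sketch. The primes 61 and 53 are a classic teaching example. Real TLS 1.2 uses keys of 2048 bits or more with PKCS#1 v1.5 padding, and a 48-byte premaster secret.

```python
import secrets
from typing import Tuple

# Toy key pair (never use numbers this small for anything real).
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent: modular inverse (Python 3.8+)

public_key: Tuple[int, int] = (e, n)   # would come from the certificate
private_key: Tuple[int, int] = (d, n)  # stays on the server


def apply_key(value: int, key: Tuple[int, int]) -> int:
    """Textbook RSA: encryption and decryption are both modular exponentiation."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


# Client: pick a (toy) premaster secret and encrypt it with the public key.
premaster = secrets.randbelow(n)
ciphertext = apply_key(premaster, public_key)

# Server: only the private key recovers it.
recovered = apply_key(ciphertext, private_key)
print(premaster == recovered)
```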

Session Key Creation

Both client and server use the client random, server random, and premaster secret to create session keys.
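In TLS 1.2 (RFC 5246), this step uses a pseudorandom function (PRF) built from HMAC. The premaster secret and the two randoms produce a 48-byte master secret. That secret is then expanded into a "key block", which is sliced into keys. The sketch below uses only `hashlib` and `hmac`. It assumes a SHA-256 cipher suite with AES-128-GCM (such as `TLS_RSA_WITH_AES_128_GCM_SHA256`). The random inputs are generated locally as stand-ins for real handshake values.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 from RFC 5246 section 5: chained HMACs expanded to `length` bytes."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF for SHA-256 suites: P_SHA256(secret, label + seed)."""
    return p_sha256(secret, label + seed, length)


def derive_keys(premaster: bytes, client_random: bytes,
                server_random: bytes) -> Dict[str, bytes]:
    master = prf(premaster, b"master secret", client_random + server_random, 48)
    # Key expansion uses the randoms in the opposite order.
    block = prf(master, b"key expansion", server_random + client_random, 40)
    # AES-128-GCM layout: no MAC keys, 16-byte keys, 4-byte implicit IVs.
    return {
        "master_secret": master,
        "client_write_key": block[0:16],
        "server_write_key": block[16:32],
        "client_write_iv": block[32:36],
        "server_write_iv": block[36:40],
    }


client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)
premaster = secrets.token_bytes(48)

# Client and server each run the same derivation independently.
client_keys = derive_keys(premaster, client_random, server_random)
server_keys = derive_keys(premaster, client_random, server_random)
print(client_keys == server_keys)
```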

Client Ready

The client sends a "finished" message encrypted with a session key.

Server Ready

The server sends a "finished" message encrypted with a session key.
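The Finished messages make the handshake tamper-evident. Each side sends a MAC, keyed with handshake secrets, over a hash of every handshake message it saw. If an attacker modified anything in transit (for example, by removing strong cipher suites from the Client Hello), the values won't match and the connection is aborted. This sketch is modeled loosely on TLS 1.3, which computes HMAC(finished_key, transcript hash). The key and message bytes are hypothetical stand-ins, and TLS 1.2 uses its PRF instead of a plain HMAC.

```python
import hashlib
import hmac
import secrets
from typing import List

# Stand-in for a key both sides derived during the handshake (hypothetical;
# TLS 1.3 derives a dedicated finished_key with HKDF).
finished_key = secrets.token_bytes(32)


def verify_data(key: bytes, transcript: List[bytes]) -> bytes:
    """MAC over the hash of every handshake message, in order."""
    transcript_hash = hashlib.sha256(b"".join(transcript)).digest()
    return hmac.new(key, transcript_hash, hashlib.sha256).digest()


# Hypothetical handshake messages, as each side saw them.
client_view = [b"ClientHello: suites=AES_256_GCM,AES_128_GCM",
               b"ServerHello: AES_256_GCM"]
server_view = list(client_view)

client_finished = verify_data(finished_key, client_view)
ok = hmac.compare_digest(client_finished, verify_data(finished_key, server_view))

# A man-in-the-middle that edited the Client Hello changes what the server
# saw, so the Finished values no longer agree.
tampered_view = [b"ClientHello: suites=WEAK_CIPHER", client_view[1]]
tampered_ok = hmac.compare_digest(client_finished,
                                  verify_data(finished_key, tampered_view))
print(ok, tampered_ok)
```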

Secure HTTP Communication

The session keys are used for secure symmetric encryption, ensuring both parties can now communicate securely.

Boom. That’s the TLS handshake, except for one more thing, and that is….

Everything you’ve learned here is a lie.

The process we just described is the RSA key exchange used in TLS 1.2 and earlier, which is outdated compared to the modern TLS 1.3.

It’s still a great place to start, because it introduces the concepts of what needs to be agreed upon for secure server <> client communication.

The current version of TLS (1.3) drops RSA key exchange and various other cipher suites for security reasons. RSA can still be used for certificate signatures. TLS 1.3 is more opinionated and allows significantly fewer options, which makes it simpler, more secure, and faster. However, the components and concepts are very much the same. You still have a TLS handshake that agrees on server authentication and key exchange, with the goal of generating a symmetric encryption key to secure the data exchanged via TCP. (One difference: TLS 1.3 removes compression entirely.)

Along with RSA key exchange, TLS 1.3 drops other cipher suites and parameters that are vulnerable to attack. It also shortens the TLS handshake from two round trips to one, making a TLS 1.3 handshake both faster and more secure.

The basic steps of a TLS 1.3 handshake are:

Client hello: The client sends a client hello message with the protocol version, the client random, and a list of cipher suites. Because support for insecure cipher suites has been removed from TLS 1.3, the number of possible cipher suites is vastly reduced. The client hello also includes the parameters that will be used for calculating the premaster secret. Essentially, the client is assuming that it knows the server’s preferred key exchange method (which, due to the simplified list of cipher suites, it probably does). This cuts down the overall length of the handshake — one of the important differences between TLS 1.3 handshakes and TLS 1.0, 1.1, and 1.2 handshakes.
Server generates master secret: At this point, the server has received the client random and the client's parameters and cipher suites. It already has the server random, since it can generate that on its own. Therefore, the server can create the master secret.
Server hello and "Finished": The server hello includes the server’s certificate, digital signature, server random, and chosen cipher suite. Because it already has the master secret, it also sends a "Finished" message.
Final steps and client "Finished": Client verifies signature and certificate, generates master secret, and sends "Finished" message.
Secure symmetric encryption achieved
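The key-exchange parameters the client sends up front are Diffie-Hellman "key shares". Each side sends a public value and combines the other side's value with its own private value. Both arrive at the same secret without it ever crossing the wire. Below is a toy finite-field sketch. TLS 1.3 actually uses groups such as X25519 or finite fields of 2048 bits or more (RFC 7919); the 23-element group here is for illustration only. Because both private values are ephemeral, a later leak of the server's certificate key doesn't expose recorded sessions. That is one reason RSA key exchange was dropped.

```python
import secrets
from typing import Tuple

P = 23  # toy prime modulus
G = 5   # generates the whole multiplicative group mod 23


def keypair() -> Tuple[int, int]:
    """Return (private value, public key share)."""
    private = secrets.randbelow(P - 2) + 1  # 1 .. P-2
    public = pow(G, private, P)             # the share that goes on the wire
    return private, public


# The client's share rides in the Client Hello, the server's in the Server Hello.
client_private, client_share = keypair()
server_private, server_share = keypair()

# Each side combines its own private value with the other side's share.
client_secret = pow(server_share, client_private, P)
server_secret = pow(client_share, server_private, P)
print(client_secret == server_secret)
```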

There you go. Go out and ace your technical interviews now.

If you want to read more posts like these, you can subscribe.

In addition to writing mediocre technical blog posts, I also offer consultancy services and run a development agency. I have built a lot of things, including:

…an RAG AI chatbot and search tool for corporate knowledge bases - acquired by Brex

…distributed Python and Scala services at Twilio and Valon

…award-winning Military Recall App chosen by SAIC for the US Department of Defense

I’ve also helped lead teams at some of these elite startups. If you are looking for software development services or consultation for a project, I might be able to help. Feel free to reach out by email.