Original link: https://news.ycombinator.com/item?id=40089609

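This thread discusses Multipath TCP (MPTCP), a technology intended to improve Internet connectivity by enabling multiple connections between a single client and a server, thereby increasing overall bandwidth. The author mentions personal experience in European countries such as Germany and Switzerland, where some carriers offer better coverage at a higher cost. They point out that supporting MPTCP in large load-balanced systems is challenging, because of problems such as consistently assigning both client streams to the same MPTCP termination point and potential contention on higher-bandwidth flows. The text suggests that MPTCP could solve real user problems, particularly those related to network instability and congestion. The author cites similar approaches implemented by large companies such as Google in QUIC and QUIC connection migration. The conclusion is that MPTCP could be beneficial, but its wide adoption faces obstacles, including the complexity of implementing it in large load-balancing setups and the reluctance of middlebox and load balancer vendors to adopt it.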

I first heard about MPTCP back in 2013.

It made so much sense back then, when mobile apps were not that robust to networks changing. I assumed it was going to get adopted in no time, given how much of a UX improvement it would have been back in the day.

It's incredibly depressing that this gained barely any traction in the last 10 years, and kernel options are appearing just recently, after everyone has wrapped their HTTP calls in multiple retry handlers, and mobile operating systems have abstracted network connectivity to the point where it feels more like you are using zeromq rather than tcp.

I wanted to like it, and Apple included it in iOS, but supporting it on real servers was going to be too hard...

When I was deployed on FreeBSD with no load balancers, there weren't recent patches. And even if there were, I'd need to do some serious work to avoid advertising the private network ips as alternates...

When I was on Linux behind a load balancer, it's too complex to get the streams to the right place. And the load balancer doesn't want to do it anyway.

Processing two streams together involves a lot of complexity in a high throughput code path. It's a lot of risk, and you've got to reboot for changes.

And then you do all that work and it only benefits iOS users, who tend to be on better networks anyway.

> A U.S. analysis of Wi-Fi and mobile Internet usage across unique smartphones on the iOS and Android platforms reveals that 71 percent of all unique iPhones used both mobile and Wi-Fi networks to connect to the Internet, while only 32 percent of unique Android mobile phones used both types of connections. A further analysis of this pattern of behavior in the U.K. shows consistent results, as 87 percent of unique iPhones used both mobile and Wi-Fi networks for web access compared to a lower 57 percent of Android phones.

https://www.comscore.com/lat/Prensa-y-Eventos/Infographics/i...

Lol, have you ever been to Europe? iPhones are definitely considered premium and there definitely are networks that are more expensive but offer better reception. In Germany, that would be Telekom, in Switzerland, it's Swisscom.

Yes used to live in Germany. I was talking mainly about the US though.

iPhone isn't always 'premium', since they have their version of cheap phones as well. Point is cell network service quality is independent from phone quality.

It sounds like this would have taken off if it were added to various managed cloud load balancers based on what you're saying.

The only question I have is if it opens up a different can of worms even if you've got a magic box terminating layer 7 for you or not. Never dug deep enough into mptcp myself to know.

I think it's a no brainer if it's no effort or small effort (set a socket option on the client, somehow)... but it's a big effort to support it in a large load balancing situation.

If you balance your load balancers with ECMP, I don't know if you can get two client streams to the same mptcp terminating place.

If you've optimized the heck out of your tcp flows, this throws a wrench in there, because the second stream is likely to get hashed into a different nic queue, and then you have communication between cpus to move forward on the logical stream.

It would have been really handy though, and solve real issues with real users.

Edit to add: it could also solve some issues I saw on private networking / interserver networking, although the contention would be a much bigger problem on higher bandwidth streams. On networks with link aggregation, there are many paths from one host to another, but path selection is usually by hashing the connection 5-tuple {src ip, dst ip, protocol, src port, dst port}, so a long running tcp connection remains on the same path for the duration. If a path segment has high loss/corruption or is congested, MPTCP could help if you had an extra connection that hit a different path. Otherwise, you need to find the segment and get network operations to fix it. It's not easy to figure that out (I had to write a tool to sample and find port combinations with trouble, and then a patch for mtr to run a trace with fixed ports), and then you still need to reconnect your affected tcp sockets unless you can get a quick response from net ops. (Sometimes they can check error stats once the right devices are pointed out to them; replacing a cable/fiber often helps, and disconnecting it during investigation can help the traffic flow across the redundant links.)
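To make the 5-tuple point concrete, here is a rough Python sketch of hash-based path selection and of the "sample source ports to see which path they land on" idea. Everything in it is an assumption for illustration: real switches use vendor-specific hash functions rather than SHA-256, and the addresses, port range, and function names are made up.

```python
import hashlib

TCP = 6  # IP protocol number for TCP


def pick_path(src_ip, dst_ip, protocol, src_port, dst_port, n_paths):
    """Map a connection 5-tuple to one of n_paths equal-cost links.

    SHA-256 stands in for whatever hash the real hardware uses.
    """
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths


def ports_by_path(src_ip, dst_ip, dst_port, n_paths, candidate_ports):
    """Group candidate source ports by the path they would hash onto."""
    groups = {}
    for port in candidate_ports:
        path = pick_path(src_ip, dst_ip, TCP, port, dst_port, n_paths)
        groups.setdefault(path, []).append(port)
    return groups


if __name__ == "__main__":
    # Hypothetical hosts with 4 aggregated links between them.
    groups = ports_by_path("10.0.0.1", "10.0.0.2", 443, 4, range(40000, 40016))
    for path, ports in sorted(groups.items()):
        print(f"path {path}: source ports {ports}")
```

A given 5-tuple always lands on the same path, which is why a long-lived connection stays stuck on a bad segment, while a second connection from a different source port may hash elsewhere.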

> If you balance your load balancers with ECMP, I don't know if you can get two client streams to the same mptcp terminating place.

At Google, we do something similar with QUIC and connection migration. Our mechanism for ensuring these hit the same backend is Maglev [0], where we use the QUIC connection ID for hashing purposes in software. (Our routers still mostly use ECMP based on the 5-tuple, so being able to consistently hash to the same backend across multiple LB instances is crucial.)

> if a path segment has high loss/corruption or is congested, MPTCP could help if you had an extra connection that hit a different path.

Incidentally, we also have a family of internal mechanisms that do this, although we don't rely on MPTCP. (We instead twiddle some other bits in the packet that we make sure our routers use for hashing, at least for RPCs between prod machines.) This inspired some of the connection migration work in our QUIC implementation [1], wherein we can migrate to a different ephemeral port if we detect issues with the current path. This works shockingly often for routing around network problems.

[0] https://research.google/pubs/maglev-a-fast-and-reliable-soft...

[1] https://github.com/google/quiche/blob/main/quiche/quic/core/...
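For anyone wondering how hashing on a connection ID keeps a migrating flow on the same backend, here is a simplified Python sketch of a Maglev-style lookup table, loosely following the table-population scheme described in the published Maglev paper. It is not Google's implementation: the hash function, table size, backend names, and connection ID value are all assumptions chosen for illustration.

```python
import hashlib


def _h(data: str, salt: str) -> int:
    """Deterministic 64-bit hash; SHA-256 is a stand-in for the real hashes."""
    digest = hashlib.sha256(f"{salt}:{data}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends, size=251):
    """Populate a Maglev-style lookup table. `size` should be prime."""
    if not backends:
        raise ValueError("need at least one backend")
    # Each backend gets its own permutation of slots: offset + j * skip.
    offsets = [_h(b, "offset") % size for b in backends]
    skips = [_h(b, "skip") % (size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)
    table = [None] * size
    filled = 0
    while True:
        # Backends take turns claiming their next preferred free slot.
        for i, backend in enumerate(backends):
            slot = (offsets[i] + next_idx[i] * skips[i]) % size
            while table[slot] is not None:
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % size
            table[slot] = backend
            next_idx[i] += 1
            filled += 1
            if filled == size:
                return table


def backend_for(table, connection_id: bytes):
    """Pick a backend from the connection ID only, ignoring the 5-tuple."""
    return table[_h(connection_id.hex(), "conn") % len(table)]


if __name__ == "__main__":
    backends = ["backend-a", "backend-b", "backend-c"]  # hypothetical names
    table = build_table(backends)
    cid = bytes.fromhex("0123456789abcdef")  # hypothetical connection ID
    print(backend_for(table, cid))
```

Because every load balancer instance builds the same table from the same backend list, and the lookup key is the connection ID rather than the addresses and ports, a client that migrates to a new ephemeral port or network should still reach the same backend in this model.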

> You might also be interested in SCTP[1] from the year 2000, which also hasn't gotten any traction so far.

Probably partly because middleware boxes (e.g., firewalls) either didn't/don't support it and/or rules were written to only support "TCP" (as opposed to 'stream') or "UDP" (as opposed to 'dgram'; see also "DCCP").

Certainly that's a part, but it didn't help that SCTP has some fundamental low-level flaws.

Given that TCP also has at least one unfixable flaw, the only recommendation I can make is to use something UDP-based - which, to make sure you don't stomp on everybody else's traffic, means use the only popular one: QUIC (the layer beneath HTTP/3).

The protocol is specified by a byte in the IP packet; how many middleware boxes block everything except for ICMP, TCP, and UDP? What is the probability that a packet with that byte set to something unexpected actually gets from source to destination?

> The protocol is specified by a byte in the IP packet; how many middleware boxes block everything except for ICMP, TCP, and UDP?

Most firewalls are default deny out of the box and you have to allow things through. How many folks bother opening up SCTP/DCCP/etc?

SCTP can run over UDP. It's part of the spec.

Now we have HTTP3 which runs over UDP - where there is a will, there is a way.

Perhaps SCTP was ahead of its time.

The “funny” thing is that HTTP/3 really, really looks like a transport protocol encapsulated into… UDP. Exactly because many middleboxes block anything that’s not a very well known protocol.

I see it as depressing that this is gaining traction it doesn't deserve. TCP doesn't need one hack at a time and then to make us choose combinations that sort of work in half the use cases in the modern world, it needs to be replaced with SCTP.

I was excited about it because we were working on delivery robots and I wanted a good solution for instant failover given 2 cellular modems.

We ended up going with PepLink's SpeedFusion to save engineering time. But the license was costly. I really hope for a free solution in the future for 2 cellular networks and <50ms failover.

Multipath UDP + OpenVPN would also probably be a viable solution.

I created something like what you're describing with the addition of P2P communication using NAT traversal (https://www.hyperpath.ie)

It will connect your devices in a P2P Mesh VPN and allow them to send and receive data using multiple links (e.g. multiple 5G or 5G + Satellite).

It is significantly cheaper than Peplink's license, less latency and no bandwidth / data limits.

You need to bring your own hardware though. Like a Raspberry Pi with 3 USB 4G/5G dongles.

Hehe, I also worked on a delivery robot with exactly the same problem. We ended up licensing Phantom Auto. Expensive and ... not particularly amazing.

I don't know which makes me sadder-- IPv4 only having a 32-bit address space or TCP using the source and destination IP addresses in the connection tuple. That's one of those "if I had a time machine" kind of things-- I'd go back and have Cerf and Kahn change both of those items.

If TCP had a protocol specific identifier for connections (a couple of 32-bit values, for example-- a client nonce and server nonce) rather than using the source/destination IP addresses multi-homed hosts and seamless transition between different networks would become native features of the protocol. A client could roam between two different IP networks and TCP connections would "survive", for example. (I'm oversimplifying nearly to the point of hyperbole, to be sure...)

(Another fun future would have been one where SCTP got widespread adoption.)
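As a toy illustration of the nonce idea, here is a Python sketch of a connection table keyed by a (client nonce, server nonce) pair instead of by addresses. The class and field names are hypothetical, and it deliberately skips what a real protocol would need before trusting a packet from a new address, such as cryptographic authentication.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    client_nonce: int
    server_nonce: int
    peer_addr: tuple  # current (ip, port) to send replies to
    received: list = field(default_factory=list)


class ConnectionTable:
    """Looks connections up by nonce pair, never by source address."""

    def __init__(self):
        self._by_id = {}

    def open(self, client_nonce, server_nonce, peer_addr):
        conn = Connection(client_nonce, server_nonce, peer_addr)
        self._by_id[(client_nonce, server_nonce)] = conn
        return conn

    def deliver(self, client_nonce, server_nonce, src_addr, payload):
        conn = self._by_id.get((client_nonce, server_nonce))
        if conn is None:
            return None  # unknown connection
        # Roaming: the identity is unchanged, only the reply address moves.
        # A real protocol would authenticate the packet before doing this.
        conn.peer_addr = src_addr
        conn.received.append(payload)
        return conn


if __name__ == "__main__":
    table = ConnectionTable()
    table.open(0x1234ABCD, 0x9876FEDC, ("192.0.2.10", 50000))
    # The client moves from Wi-Fi to cellular and its address changes.
    conn = table.deliver(0x1234ABCD, 0x9876FEDC, ("198.51.100.7", 41000), b"hello")
    print(conn.peer_addr, conn.received)
```

The addresses still exist and still route packets; they just stop being part of the connection's identity.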

> a client nonce and server nonce) rather than using the source/destination IP addresses multi-homed hosts and seamless transition between different networks would become native features of the protocol. A client could roam between two different IP networks and TCP connections would "survive", for example.

This is mostly how Mosh [1] works and allows for IP roaming, changing IP's, etc... without losing ones SSH session. The connection can even be interrupted for a prolonged period of time and restore on its own on a new IP seamlessly.

[1] - https://mosh.org/

How would routing be done without source/destination? When the device changes networks, how does the origin and all routers along the way know that this device is on a new network?

> How would routing be done without source/destination?

There is still a source/destination address. Routing still works. But those addresses are allowed to change without disrupting the connection because the connection isn't based on the values of these addresses.

> When the device changes networks, how does the origin and all routers along the way know

The routers don't need to "know" these things.

MPQUIC does this. To the network it's just UDP packets moving around. Connection state is dealt with at higher levels and doesn't rely on IP addresses.

> how does the origin and all routers along the way

It's just the origin that needs to know what address(es) it should be using as the destination at layer 3.

The big problem with this is that it depends upon things that weren't really feasible in the early 80's -- bigger packet headers, a bit more state on each side of the connection, potential need for cryptographic authentication.

There's two separate ideas here:

* Where to send a frame to get to the other side of the connection

* Whose connection this is.

TCP combined the two, because we didn't have mobile clients or a lot of multihomed systems that would benefit from distinguishing them. Also, every octet in the header counted.

In practice, this means we have to keep building a lot of infrastructure on top of TCP (or parallel to it, in datagram protocols) to handle retries and splitting flows well. In turn, these things are completely opaque to the network and it's difficult to write rules about them.

Whereas if we had different packet fields for "where am I sending this packet right now" and "whose flow does this belong to", we could write better firewall rules, have less infrastructure built on top of TCP, and have better typical application performance.

But the stuff that carries TCP is IP. That's why TCP can work seamlessly, because it uses identification from a previous layer. Consider I bind a server to an ID, and not IP:port, the operating system running it must know how to communicate that via IP, so there will be a correlation map somewhere and that map needs to be synchronized between all peers that wish to host the roaming server.

Otherwise you're just switching port (16-bit) value to arbitrary 32-bit identifier.

If TCP didn't use L3 source and destination addresses to distinguish connections, it could be more easily taught to deal with:

* Clients roaming between L3 addresses

* Clients/servers with multiple L3 addresses

But... it doesn't? TCP has no notion of IP address in the protocol, only the port. TCP with changing IPs can work e.g. on top of an ip-ip tunnel with applications not being aware at all.

The protocol would have to handle binding the network to the transport. MPTCP and SCTP both handle that via registering and un-register network layer endpoints. This parallel universe TCP would be the same in that regard.

(I did say I was oversimplifying...)

The problem is that the TCP/IP model stops at level 4, and if we consider TCP a protocol of transport, it shouldn't do that.

In the OSI model what you talk about is level 5, that is session, but in TCP/IP there is no such level, thus it must be handled by the application (e.g. through a session cookie, in HTTP).

Slavish adherence to theoretical models is a recipe for failure. Even worse, the OSI model was developed in the 1970s before successful internetworks existed so it's not informed by experience; it's mostly made up.

I got fiber run to my neighborhood, and for a while, had a 1gb coax connection and a 1gb fiber connection. I used openmptcprouter to aggregate my connections through a droplet and I effectively had a 2 gigabit internet connection. I would have stuck with it, but having a datacenter IP for your home network really doesn’t work.

Except TCP is just a bad protocol to start with for tunnelling, because packetized data has to be delivered in-order, and head of line blocking messes up congestion control algorithms in the tunnelled data.

The only practical use of MPTCP for me is to use mobile and Wi-Fi network together to boost the speed. iOS and WeChat both support this. However, I always turn them off because my mobile network is metered. So in the end, MPTCP is useless for me *personally*.

I worked on this. We called it the parking lot bug. WiFi still shows signal but no proper connection. With MPTCP, it will failover to cell.

Why does this require explicit opt in by applications if there’s transparent fallback? Wouldn’t it make most sense for the kernel to do it transparently for every TCP connection so that it can make more global decisions about path aggregation / link preference?

My understanding is that it was basically a condition enforced by the maintainers of the Linux TCP / networking subsystems. If you look at the initial upstreaming discussions[1], this was set up as a ground rule.

If you look at the older multipath TCP implementation, prior to the upstreaming, it was intended to be fully transparent to the application, which I think makes more sense for the intent of the protocol. Sure, in many cases MPTCP may be better with application-guided logic, but having a standard system approach (e.g. establish sub-flows on an LTE connection for automatic failover, but don't send any data along those sub-flows) would have worked for 95% of cases.

[1] https://lore.kernel.org/all/alpine.OSX.2.21.1707181728570.11...
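For reference, the explicit opt-in on upstream Linux amounts to asking for a different protocol when creating the socket. Below is a minimal Python sketch, assuming a Linux kernel built with MPTCP support; the helper name is made up, and the fallback to plain TCP is my own choice for the example rather than anything the kernel does at socket creation.

```python
import socket

# Linux protocol number for MPTCP. Python 3.10+ exposes it as
# socket.IPPROTO_MPTCP; 262 is the value from the Linux headers.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, is_mptcp): MPTCP if the kernel allows it, else plain TCP."""
    try:
        sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # Kernel without MPTCP, MPTCP disabled by the admin, or a non-Linux OS.
        sock = socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False


if __name__ == "__main__":
    sock, is_mptcp = open_stream_socket()
    print("MPTCP socket" if is_mptcp else "plain TCP socket")
    sock.close()
```

After that the socket is used like any other stream socket. Even when creation succeeds, the connection can still end up as regular TCP if the peer or the path does not support MPTCP.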

Using this implies that there are multiple IPs per endpoint associated with a single TCP connection. That is going to need explicit support/awareness by the application in many cases.

I can imagine new security holes being opened up by allowing multiple IP's to talk over the same TCP connection...

Imagine you have an application which checks the clients IP (eg. against a whitelist) at the time of connection and then assumes it doesn't change...

I work supporting, debugging, fixing the Linux network stack and drivers. I am amazed how little adoption this has seen.

Like everything which came along and tried to supplant regular TCP, such as SCTP, it seems MPTCP has also been confined to a niche of application developers who will use it forever while the rest of the world forgets about it.

Similarly, I promise you everything you've ever done over a 4G or 5G phone has used SCTP inside the phone provider. That doesn't mean the general developer population know about it. I bet most developers have never even heard of it.

I worked on IMS before it was widely rolled out and at least back then it was just SIP over TCP and UDP (RTP/RTCP) for media. SCTP was widely used for eNodeB - MME comms tho iirc

I found [1] which describes the architectural difference between MPTCP and QUIC, and also introduces the authors' proposed MPQUIC protocol:

> QUIC multiplexes application streams on a single UDP flow, whereas MPTCP splits a single stream on multiple TCP subflows. MPQUIC combines both features by multiplexing application streams on multiple UDP subflows.

[1]: "Multipath QUIC: A Deployable Multipath Transport Protocol" https://www.researchgate.net/publication/327122884_Multipath...

Now I'm curious about how these protocols compare in production operation. Anybody have experience with both?

Note that MPQUIC is still being discussed at the IETF. At the last IETF meeting, more changes were discussed. Unfortunately, that slows down its adoption. https://lwn.net/Articles/964377/

But both try to achieve the same goal. Technically, you can get very similar behaviour. MPTCP is implemented in the Linux kernel, while QUIC lives in userspace.

> If any middlebox in between does not support it, the returned SYN+ACK packet will not contain MPTCP options in the TCP option field.

That sounds .. quite restrictive. Is the only requirement on a middlebox to just forward the MPTCP options as-is?

This can help in security/privacy settings.

For example, the Great Firewall of China: if you can split your traffic across multiple uplink channels, will the firewall have a hard time putting them back together for enforcement?

The examples given on the page seem to focus on multipath to get to a device over the internet, but I can see this being more likely to work properly without needing to fallback on home networks.

At home/LAN we use LACP, VRRP... I mean, link aggregation and HA needs were solved long ago.

With multiple ISPs, or on a complex enough LAN, we can use multiple routing tables + weights too.

Also, if the ISP at home can do 10Gbps, 1Gbps, 300 Mbps whatever... I want to be able to use them with a single path, so there is no gain using multiple paths. On the occasions when I have cable+wifi connected at the same time, I force one of the two; I cannot see a reason to prefer using both at the same time.

Maybe the latency thing? Never had that issue at home, but could understand that usage case "just use the network segment with less latency to reach $thing".

> Also, if the ISP at home can do 10Gbps, 1Gbps, 300 Mbps whatever... I want to be able to use them with a single path, so there is no gain using multiple paths. On the occasions when I have cable+wifi connected at the same time, I force one of the two; I cannot see a reason to prefer using both at the same time.

I don't understand why you would want to be able to use them with a single path. The gain would be being able to aggregate them and have individual tcp streams faster than any one IP connection could handle.

Though personally I think the resilience is more appealing. Not having to have a hard cutover when wifi degrades as I walk away would be nice

Some ISPs in Europe are using MPTCP for people being too far from the street cabinets. Typically, for people in the countryside, with < 50 Mbps. Thanks to a transparent proxy installed in the home gateway, and servers in the ISP's network, they can combine both the fixed and cellular networks, and use the fixed one in priority.

MPTCP can also be very interesting for mobility use-cases, even when one network is used at a time, e.g. switching from WiFi to cellular, or different cellular networks in the train, etc.

We found that most proxies/firewalls (90%+ ? I forget) didn't tamper with it. The largest hurdle was working with load balancer vendors to implement it.
