Bounded channels are not really the issue here. Even in omicron#9259, the capacity=1 channel behaved as documented and as one would expect: it woke up a sender when capacity became available, and the other senders stayed blocked to maintain the documented FIFO property. However, some of the patterns that we use with bounded channels are problematic on their own and, if changed, could prevent the channel from getting caught up in a futurelock.
In Omicron, we commonly use bounded channels with send(msg).await. The bound is intended to cap memory usage and provide backpressure, but a send that waits for capacity creates a second, unbounded queue: the wait queue for the channel. Instead, we could consider using a larger-capacity channel with try_send() and propagating any failure from try_send() to the caller.
As an example, when we use the actor pattern, we typically observe that there’s only one actor and potentially many clients, so there’s not much point in buffering messages in the channel. So we use capacity = 1 and let clients block in send().await. But we could instead have capacity = 16 and have clients use try_send() and propagate failure if they’re unable to send the message. The value 16 here is pretty arbitrary. You want it to be large enough to account for an expected amount of client concurrency, but not larger. If the value is too small, you’ll wind up with spurious failures when the client could have just waited a bit longer. If the value is too large, you can wind up queueing so much work that the actor is always behind (and clients are potentially even timing out at a higher level). One might observe:
Channel limits, channel limits: always wrong!
Some too short and some too long!
But as with timeouts, it’s often possible to find values that work in practice.
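The try_send() approach described above can be sketched roughly as follows. This is only an illustration, not code from Omicron: the names `Client`, `Request`, `ClientError`, `QUEUE_DEPTH`, and `new_actor_channel` are hypothetical, and the sketch uses the standard library's `std::sync::mpsc::sync_channel` as a dependency-free stand-in for an async bounded channel. With `tokio::sync::mpsc`, `Sender::try_send()` has the same shape, reporting `TrySendError::Closed` where the standard library reports `TrySendError::Disconnected`.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical capacity: sized for the expected client concurrency.
/// The value 16 is arbitrary, as discussed above.
const QUEUE_DEPTH: usize = 16;

/// Hypothetical message type sent from clients to the actor.
#[derive(Debug, PartialEq)]
pub enum Request {
    Increment(u64),
}

/// Hypothetical error that clients propagate to their own callers.
#[derive(Debug, PartialEq)]
pub enum ClientError {
    /// The channel is full: the actor is behind, so fail fast
    /// instead of joining a wait queue.
    Overloaded,
    /// The actor dropped its receiver.
    ActorGone,
}

/// Client-side handle. Cloning it gives another client of the same actor.
#[derive(Clone)]
pub struct Client {
    tx: SyncSender<Request>,
}

impl Client {
    /// Enqueue a request without ever waiting for capacity.
    pub fn request(&self, req: Request) -> Result<(), ClientError> {
        self.tx.try_send(req).map_err(|e| match e {
            TrySendError::Full(_) => ClientError::Overloaded,
            TrySendError::Disconnected(_) => ClientError::ActorGone,
        })
    }
}

/// Create the bounded channel shared by the clients and the single actor.
pub fn new_actor_channel() -> (Client, Receiver<Request>) {
    let (tx, rx) = sync_channel(QUEUE_DEPTH);
    (Client { tx }, rx)
}
```

Because `request()` never waits, a client cannot end up parked in the channel's wait queue. The cost is that callers must be prepared to handle `Overloaded`, which is where the choice of capacity matters.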
Using send_timeout() is not a mitigation because it still results in the sender blocking. The send_timeout() future needs to be polled after the timeout expires in order to give up, but with futurelock, it will never be polled.