Critical Bug in Linux Congestion Control Could Stall QUIC Connections, Cloudflare Finds

Major Bug Discovered in Linux CUBIC Algorithm Affects QUIC Performance

Cloudflare engineers have uncovered a critical bug in the default Linux congestion controller, CUBIC, that can permanently stall data transfers for both TCP and QUIC connections. The bug causes the congestion window (cwnd) to become stuck at its minimum value, so a connection cannot recover after a congestion collapse event. The potential reach is large: CUBIC is the standard congestion control algorithm in Linux, and it is also the algorithm used by quiche, Cloudflare's own QUIC implementation.

Source: blog.cloudflare.com

Widespread Impact on Internet Traffic

“CUBIC, defined in RFC 9438, governs how most TCP and QUIC connections probe for bandwidth, detect loss, and recover,” said a Cloudflare network engineer. “Our quiche library depends on it, so this bug is in the critical path for a significant portion of our traffic.” The bug was discovered during integration testing of Cloudflare’s ingress proxy, where tests failed 61% of the time under heavy initial loss conditions.

Background: The CUBIC Algorithm and the App-Limited Fix

CUBIC is a loss-based congestion control algorithm that adjusts the sending rate based on packet loss. It increases cwnd when no loss occurs and decreases it upon detecting loss, aiming to maximize bandwidth utilization. A recent Linux kernel change was intended to align CUBIC with RFC 9438 §4.2-12, which specifies an app-limited exclusion. This fix, meant for TCP, inadvertently introduced the bug when ported to Cloudflare's QUIC stack.
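To make the increase and decrease behavior concrete, here is a minimal sketch of the window function that RFC 9438 defines, written in Python. The constants C = 0.4 and beta = 0.7 come from the RFC; the function names, the two-segment floor, and the choice to measure windows in segments are assumptions made for illustration, and none of this is taken from quiche or the Linux kernel.

```python
# Illustrative sketch of the RFC 9438 window curve.
# Windows are in segments, time is in seconds.
C = 0.4      # scaling constant from RFC 9438
BETA = 0.7   # multiplicative decrease factor from RFC 9438


def on_loss(cwnd: float, min_cwnd: float = 2.0) -> tuple[float, float]:
    """Return (w_max, new_cwnd) after a congestion event.

    min_cwnd is an assumed floor for this sketch.
    """
    w_max = cwnd
    return w_max, max(cwnd * BETA, min_cwnd)


def cubic_k(w_max: float, cwnd_epoch: float) -> float:
    """Seconds the curve needs to climb from cwnd_epoch back to w_max."""
    # Clamp at zero so a window already above w_max does not yield
    # a negative base (and a complex result) under the cube root.
    return (max(0.0, w_max - cwnd_epoch) / C) ** (1.0 / 3.0)


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """Target window t seconds after the current epoch started."""
    k = cubic_k(w_max, cwnd_epoch)
    return C * (t - k) ** 3 + w_max


if __name__ == "__main__":
    w_max, cwnd = on_loss(100.0)
    for t in (0.0, 2.0, 4.0, 6.0):
        print(f"t={t:.0f}s target={w_cubic(t, w_max, cwnd):.1f}")
```

The curve starts at the reduced window, flattens as it approaches the window size at which loss last occurred, and only then probes beyond it. A connection pinned at minimum cwnd never gets to walk this curve at all.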

“The change was designed to solve a real TCP problem, but when we applied it to quiche, it exposed unexpected behavior,” explained the engineer. “We saw cwnd get pinned at its minimum and never recover, even after loss subsided.”

The Symptom: Connections Stuck at Minimum cwnd

The bug manifests in scenarios where heavy packet loss occurs early in a connection. Normally, a congestion controller should recover by gradually increasing cwnd after loss stops. However, under this bug, the cwnd remained locked at the minimum value indefinitely. “It’s a corner case that most tests don’t explore,” said a Cloudflare research scientist. “We focus on steady-state, but recovery from minimum cwnd is exactly what a congestion controller must handle.”

The team found that the bug originated from an interaction between CUBIC's algorithm and QUIC's acknowledgment handling. The kernel fix introduced a condition that, in QUIC, never cleared a certain flag, preventing cwnd growth.

The Fix: A One-Line Change

After an intensive investigation, the Cloudflare team arrived at a fix that amounts to nearly a single line of code. “We realized that a single line of code was causing the cycle,” the engineer noted. “By ensuring the flag is properly cleared, cwnd can resume normal growth after recovery.” The fix has been deployed in Cloudflare’s production systems and is being upstreamed to the Linux kernel.
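The article does not name the flag or show the patched code, so the following Python toy is only a hypothetical model of the general failure pattern: a flag that blocks window growth and, on one code path, is never cleared. Every name here (ToyController, growth_blocked, clear_flag_on_ack) is invented for illustration, and the one-packet-per-ACK growth is a deliberate simplification rather than CUBIC's real growth rule.

```python
from dataclasses import dataclass

MIN_CWND = 2  # packets; assumed floor for this toy model


@dataclass
class ToyController:
    """Hypothetical controller. This is NOT quiche or kernel code."""

    clear_flag_on_ack: bool       # False models the stuck behavior
    cwnd: int = MIN_CWND
    growth_blocked: bool = False  # stand-in for the unnamed flag

    def on_heavy_loss(self) -> None:
        # Collapse to the floor and set the flag that suppresses growth.
        self.cwnd = MIN_CWND
        self.growth_blocked = True

    def on_ack(self) -> None:
        if self.clear_flag_on_ack:
            # In this toy, this is the single line that restores growth.
            self.growth_blocked = False
        if not self.growth_blocked:
            self.cwnd += 1  # simplified growth: one packet per ACK


def run(clear_flag_on_ack: bool, acks: int = 10) -> int:
    """Simulate heavy early loss followed by a run of clean ACKs."""
    ctrl = ToyController(clear_flag_on_ack=clear_flag_on_ack)
    ctrl.on_heavy_loss()
    for _ in range(acks):
        ctrl.on_ack()
    return ctrl.cwnd


if __name__ == "__main__":
    print("flag never cleared:", run(False))
    print("flag cleared on ACK:", run(True))
```

With the flag left set, the window stays at the floor no matter how many clean ACKs arrive, which matches the symptom described above. Clearing it lets the window grow again.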

What This Means for Internet Users

For most users, this bug likely caused sporadic slowdowns or timeouts on connections that experienced early loss—common in mobile or congested networks. “In the worst case, a download could stall completely,” the scientist added. “The fix restores expected behavior and improves reliability for millions of QUIC-based services.”

Cloudflare urges other QUIC implementers to review their CUBIC code for similar issues. The company has published the details on its engineering blog.

Next Steps and Broader Implications

The bug highlights the challenges of porting kernel algorithms to user-space QUIC stacks. “This was a classic case of an optimization assuming a TCP context,” said an independent networking expert. “QUIC’s different delivery semantics require careful adaptation.” The fix reaffirms CUBIC’s robustness once adapted correctly, and Cloudflare expects similar corrections in other implementations.

Cloudflare will present its findings at upcoming networking conferences. Meanwhile, it recommends that all operators using CUBIC with QUIC apply the patch immediately.
