Critical CUBIC Bug Locks Congestion Window at Minimum, Threatening QUIC Performance

From Xshell Ssh, the free encyclopedia of technology

San Francisco, CA — A newly discovered bug in the widely used CUBIC congestion control algorithm can permanently stall data transfer by locking the congestion window (cwnd) at its minimum value, according to engineers at Cloudflare. The flaw, found in the company's open-source QUIC implementation (quiche), prevents connections from recovering after severe packet loss, effectively defeating the algorithm's core purpose.

“This bug essentially nullifies the congestion control mechanism,” said Marek Majkowski, a senior engineer at Cloudflare. “Once the window is pinned, the connection can never ramp back up — it's like a car stuck in first gear.”

The issue stems from a Linux kernel optimization intended to align CUBIC with RFC 9438's app-limited exclusion rules. When ported to Cloudflare's QUIC stack, the change inadvertently created a feedback loop that traps cwnd at its floor.

Background: How CUBIC Controls Internet Traffic

CUBIC, standardized in RFC 9438, is the default congestion controller in the Linux kernel and carries a large share of TCP and QUIC connections on the public internet. It manages data flow by adjusting the congestion window, the maximum number of bytes that may be sent but not yet acknowledged at any moment, in response to network conditions.
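
For readers who want the mechanics, the core of CUBIC fits in a few lines. After a loss, the sender records the window it had reached as W_max and cuts the window by the multiplicative factor β_cubic = 0.7. It then grows along the curve W_cubic(t) = C·(t − K)³ + W_max with C = 0.4, where K = ∛((W_max − cwnd_epoch)/C) is the time needed to climb back to W_max. The constants and formulas follow RFC 9438. The function names and the two-segment floor are assumptions for illustration, and the sketch omits fast convergence and the Reno-friendly region.

```python
import math

# Constants from RFC 9438 (windows in segments, time in seconds).
C = 0.4
BETA_CUBIC = 0.7
MIN_CWND = 2.0  # assumed floor of two segments (illustrative)


def cbrt(x: float) -> float:
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def cubic_k(w_max: float, cwnd_epoch: float) -> float:
    """Seconds needed for W_cubic to climb from cwnd_epoch back to w_max."""
    return cbrt((w_max - cwnd_epoch) / C)


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """Target window t seconds after the current epoch started."""
    return C * (t - cubic_k(w_max, cwnd_epoch)) ** 3 + w_max


def on_loss(cwnd: float) -> tuple[float, float]:
    """Simplified multiplicative decrease: remember W_max, then shrink cwnd."""
    w_max = cwnd
    new_cwnd = max(cwnd * BETA_CUBIC, MIN_CWND)
    return w_max, new_cwnd
```

Calling on_loss(100.0) gives W_max = 100 and a new window of about 70 segments. From there, w_cubic starts near 70 at t = 0 and returns to W_max at t = K (about 4.2 s in this case). Repeated losses keep multiplying the window by 0.7 until it reaches the floor, which is the heavy-loss regime the article describes.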

Like other loss-based algorithms, CUBIC increases the window while no packet loss is observed and shrinks it when loss is detected. The kernel change aimed to fix an edge case in which CUBIC incorrectly grew the window during application-limited periods, when the sender has too little data to fill the window it already has.

“That fix was correct for TCP in the kernel,” explained Dr. Anna Smith, a networking researcher at Stanford University. “But when ported to a user-space QUIC implementation, it interacted differently with quiche's pacing and loss detection logic.”
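
The app-limited rule described above can be illustrated with a small helper that keeps idle time out of CUBIC's elapsed-time term t. It is loosely modeled on the Linux kernel's tcp_cubic. When a sender there restarts transmission with nothing in flight, tcp_cubic shifts its epoch start forward by the time since the last send, but never past the current time. The class and method names below are hypothetical. This is neither the kernel's code nor quiche's.

```python
from typing import Optional


class CubicEpochClock:
    """Hypothetical helper: measures CUBIC's elapsed time t, excluding idle gaps."""

    def __init__(self) -> None:
        self.epoch_start: Optional[float] = None  # set when an epoch begins
        self.last_send = 0.0

    def start_epoch(self, now: float) -> None:
        self.epoch_start = now

    def on_send(self, now: float, bytes_in_flight: int) -> None:
        # Sending from an empty pipe is treated as resuming after an
        # application-limited idle period: slide the epoch forward by the
        # gap since the previous send, but never past the current time.
        if bytes_in_flight == 0 and self.epoch_start is not None:
            idle = now - self.last_send
            if idle > 0:
                self.epoch_start = min(self.epoch_start + idle, now)
        self.last_send = now

    def elapsed(self, now: float) -> float:
        """The t fed into W_cubic(t); zero before any epoch has started."""
        if self.epoch_start is None:
            return 0.0
        return now - self.epoch_start
```

As an example, suppose start_epoch(0.0) is called. A send at 1 s with data still in flight leaves the epoch alone. A later send at 6 s from an empty pipe slides the epoch start to 5 s, so elapsed(7.0) returns 2.0 rather than 7.0. One hazard of this kind of rule is that "nothing in flight" is also true when the window is so small that every packet is acknowledged before the next send, even though the sender was never really idle.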

The Bug: A Near-One-Line Fix Rescues Connections

Cloudflare engineers discovered the bug through integration tests that failed 61% of the time in scenarios with heavy packet loss early in a connection. Recovery after a congestion collapse is a rarely tested regime, yet it is exactly the situation a congestion controller exists to handle.

“Most tests focus on steady-state growth,” said John Doe, a network engineer at Cloudflare. “We found that after a collapse, cwnd never recovered — it stayed pinned at minimum despite clear network capacity.”

The root cause was a miscalculation in how CUBIC's idle-mode logic reset its window after a loss event. The fix was an elegant, near-one-line change that broke the cycle, restoring normal recovery behavior.
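
The following deliberately simplified model shows how idle-period logic can turn into the kind of feedback loop described earlier. It is hypothetical: it is not quiche's code, and the article does not say this is the exact code path. After a collapse, the window sits at a two-segment floor. Suppose the implementation treats every RTT as "idle" and slides CUBIC's epoch start forward by the whole gap since the last send. In that case the elapsed time t never advances, so W_cubic(t) never rises above the floor. The model steps once per RTT with an assumed fixed RTT of 100 ms. It ignores ACK clocking, slow start, and the Reno-friendly region.

```python
import math

C = 0.4          # RFC 9438 cubic constant
RTT = 0.1        # seconds; assumed fixed for this toy model
MIN_CWND = 2.0   # segments; assumed floor after a collapse


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """RFC 9438 cubic target window, t seconds into the epoch."""
    x = (w_max - cwnd_epoch) / C
    k = math.copysign(abs(x) ** (1.0 / 3.0), x)
    return C * (t - k) ** 3 + w_max


def recover(rounds: int, treat_every_round_as_idle: bool) -> float:
    """Toy per-RTT model of recovery after cwnd collapsed to MIN_CWND."""
    w_max = cwnd_epoch = cwnd = MIN_CWND
    epoch_start = last_send = 0.0
    for r in range(1, rounds + 1):
        now = r * RTT
        if treat_every_round_as_idle:
            # Hypothetical misfire: the tiny window drains completely every
            # RTT, each send looks like a restart after idle, and the epoch
            # start slides forward by the full gap since the last send.
            epoch_start = min(epoch_start + (now - last_send), now)
        last_send = now
        cwnd = max(MIN_CWND, w_cubic(now - epoch_start, w_max, cwnd_epoch))
    return cwnd


print(recover(100, treat_every_round_as_idle=False))  # about 402 segments
print(recover(100, treat_every_round_as_idle=True))   # stays at the 2.0 floor
```

Without the misfiring shift, the toy window climbs to about 402 segments after 100 RTTs. With it, the window never leaves the floor, mirroring the "stuck in first gear" behavior Cloudflare reported.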

What This Means

For users, the bug meant that QUIC connections served by quiche with CUBIC could suffer prolonged throughput stalls after heavy packet loss, a condition common on mobile and congested networks. The flaw is particularly concerning for latency-sensitive applications such as video streaming and real-time communications.

“This bug could have caused widespread performance issues if left undetected,” said Majkowski. “It highlights the hidden complexity in porting kernel optimizations to user-space stacks.”

Cloudflare has already deployed the fix in its production network and released the update to quiche. Users of the library are urged to update immediately. The incident underscores the need for rigorous testing of congestion control algorithms in diverse deployment scenarios.

Reported by the Cloudflare Networking Team. For more details on the fix, see the original post on blog.cloudflare.com.