The Fundamental Problem
In a Galera cluster, all nodes must apply the same writesets in the same order to maintain consistency. But not all nodes are equal: some are faster (recent hardware, light load), others slower (older hardware, heavy queries).
What happens when a slow node cannot keep up with the write pace? Writesets accumulate in its receive queue (recv queue). If nothing is done, this queue grows indefinitely, consuming all memory, and the node eventually crashes or diverges from the cluster.
Flow Control is the mechanism that prevents this situation. It is the cluster's handbrake: when a node is overwhelmed, it asks the others to slow down.
How Flow Control Works
Flow Control relies on a simple threshold: gcs.fc_limit.
Each Galera node maintains a receive queue (recv queue) storing writesets waiting to be applied. When this queue size exceeds gcs.fc_limit, the node sends an FC_PAUSE message to all other cluster nodes.
Upon receiving FC_PAUSE, other nodes stop sending new writesets to the slow node. Writes across the entire cluster are blocked — that is the price of synchronous consistency.
When the slow node's recv queue drops below gcs.fc_limit * gcs.fc_factor, the node sends an FC_CONTINUE message and the cluster resumes normal operation.
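As a minimal configuration sketch (the values 80 and 0.8 are illustrative, not defaults), both thresholds are set through the wsrep_provider_options string:

```ini
# my.cnf -- illustrative values, not defaults
[mysqld]
wsrep_provider_options = "gcs.fc_limit=80;gcs.fc_factor=0.8"
# FC_PAUSE is sent when the recv queue exceeds 80 writesets;
# FC_CONTINUE when it drops back below 80 * 0.8 = 64
```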
The 5 Critical wsrep Variables
To monitor Flow Control, five status variables are essential:
SHOW GLOBAL STATUS WHERE Variable_name IN (
    'wsrep_local_recv_queue',
    'wsrep_local_recv_queue_avg',
    'wsrep_flow_control_paused',
    'wsrep_flow_control_paused_ns',
    'wsrep_flow_control_sent'
);
wsrep_local_recv_queue
Current recv queue size. In normal operation, this should be close to 0. If it regularly rises above gcs.fc_limit, the node is struggling.
wsrep_local_recv_queue_avg
Moving average of the recv queue. The most reliable indicator for detecting trends. An average above 0.5 deserves investigation.
wsrep_flow_control_paused
Fraction of time spent in Flow Control (between 0 and 1). If this exceeds 0.1 (10% of time), the cluster has a serious performance problem.
wsrep_flow_control_paused_ns
Total time in Flow Control in nanoseconds. Useful for calculating absolute pause time over a period.
wsrep_flow_control_sent
Number of FC_PAUSE messages sent by this node. If a single node sends the majority of FC_PAUSE messages, it is the bottleneck to address.
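For scripting, the key indicators can be pulled in a single row. This is a sketch against information_schema.GLOBAL_STATUS as used later in this article (MariaDB; on MySQL 5.7+ the table moved to performance_schema):

```sql
-- Quick Flow Control health check in one row
SELECT
  MAX(CASE WHEN VARIABLE_NAME = 'wsrep_local_recv_queue_avg'
           THEN VARIABLE_VALUE END) AS recv_queue_avg,  -- investigate if > 0.5
  MAX(CASE WHEN VARIABLE_NAME = 'wsrep_flow_control_paused'
           THEN VARIABLE_VALUE END) AS fc_paused,       -- serious if > 0.1
  MAX(CASE WHEN VARIABLE_NAME = 'wsrep_flow_control_sent'
           THEN VARIABLE_VALUE END) AS fc_sent
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('wsrep_local_recv_queue_avg',
                        'wsrep_flow_control_paused',
                        'wsrep_flow_control_sent');
```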
The 6 Tuning Parameters
gcs.fc_limit
The Flow Control trigger threshold. Default: 16. Increasing this value tolerates more lag before triggering the brake, but increases memory consumption.
gcs.fc_factor
The resume coefficient. Default: 0.5. When the recv queue drops to fc_limit * fc_factor, Flow Control is released. With fc_limit=100 and fc_factor=0.8, FC releases at 80 writesets.
wsrep_slave_threads
Number of writeset application threads. More threads = faster writeset application = recv queue drains faster. Recommendation: 2 x CPU core count.
wsrep_cert_deps_distance
A status variable rather than a setting: it reports the average certification distance between transactions, which indicates the potential for parallel application. If it is high, increasing wsrep_slave_threads will have a positive impact.
gcs.recv_q_hard_limit
Absolute recv queue limit in bytes. If exceeded, the node is aborted. Last resort to avoid OOM. Recommendation: (RAM + swap) / 2.
gcs.max_throttle
Throttle ratio for replication while a node is catching up, typically during state transfer (between 0 and 1). The default of 0.25 throttles replication to roughly 25% of the normal rate rather than stopping it entirely. Set it to 0.0 if completely stopping replication to let the transfer finish is acceptable.
Expert Recommendations
After years of managing Galera clusters in production, here are consolidated recommendations:
Slave Thread Sizing
wsrep_slave_threads = 2 * CPU_CORES
If your server has 8 cores, start with wsrep_slave_threads = 16. Monitor wsrep_cert_deps_distance: if it stays below the slave thread count, reduce the thread count, since the extra threads cannot be used in parallel.
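A quick way to compare the two values on a running node:

```sql
SHOW GLOBAL STATUS    LIKE 'wsrep_cert_deps_distance';
SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';
```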
fc_limit Based on Slave Threads
gcs.fc_limit = 5 * wsrep_slave_threads
With 16 slave threads, gcs.fc_limit = 80. This gives threads enough room to work in parallel without triggering FC too early.
fc_factor for Progressive Resume
gcs.fc_factor = 0.8
An fc_factor of 0.8 (instead of the default 0.5) allows more progressive traffic resumption, avoiding FC_PAUSE / FC_CONTINUE oscillations.
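Both gcs.fc_limit and gcs.fc_factor can be changed at runtime (a sketch; note that SET GLOBAL is not persisted across restarts, so mirror the change in my.cnf):

```sql
-- Applies immediately on this node only
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=80; gcs.fc_factor=0.8';
```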
Hard Limit for Safety
gcs.recv_q_hard_limit = (RAM + SWAP) / 2
On a server with 32 GB RAM and 16 GB swap, set gcs.recv_q_hard_limit = 24G. This is the safety net against OOM.
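Putting the recommendations together for the 8-core, 32 GB RAM, 16 GB swap server used in the examples (a sketch to adapt to your workload, not a universal template):

```ini
# my.cnf -- consolidated golden rules (8 cores, 32 GB RAM, 16 GB swap)
[mysqld]
wsrep_slave_threads = 16    # 2 x 8 cores
wsrep_provider_options = "gcs.fc_limit=80;gcs.fc_factor=0.8;gcs.recv_q_hard_limit=24G"
# fc_limit = 5 x 16 threads; hard limit = (32G RAM + 16G swap) / 2
```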
Identifying the Problematic Node
When Flow Control triggers frequently, you need to identify the slow node:
-- On each node
SELECT @@hostname,
       VARIABLE_VALUE AS fc_sent
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_flow_control_sent';
The node sending the most FC_PAUSE is the bottleneck. Common causes:
- Inferior hardware: slower disks, less RAM
- Heavy queries: an ALTER TABLE or massive SELECT monopolizing resources
- Backup in progress: mariabackup/xtrabackup consuming heavy I/O
- Unbalanced application load: too many reads on a node that must also apply writesets
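Once the slow node is identified, a look at long-running statements on it can confirm or rule out the heavy-query hypothesis (a minimal sketch; the 60-second threshold is arbitrary):

```sql
-- Statements running for more than 60 s on the suspect node
SELECT ID, USER, TIME, STATE, LEFT(INFO, 80) AS query
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep' AND TIME > 60
ORDER BY TIME DESC;
```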
Conclusion
Flow Control is the guardian of consistency in Galera. Understanding how it works and its parameters is essential for maintaining a performant cluster.
Golden rules: slave_threads = 2 * CPU, fc_limit = 5 * threads, fc_factor = 0.8, hard limit = (RAM + swap) / 2. Monitor wsrep_flow_control_paused and react as soon as it exceeds 0.1 (10% of time).
This article was originally published on Medium.