MyRocks Under Load: When ALTER TABLE Triggers Corruption

Published on March 6, 2026 By Aurélien LEQUOY
mariadb rocksdb corruption ddl incident-response mdev-39044

Context

On March 6, 2026, a production MariaDB 10.11.15 server monitored by PmaControl suffered a major incident. Unlike typical crashes (OOM kill, segfault), this one left an unusual set of symptoms in the error log:

RocksDB: Error opening instance, Status Code: 2,
  Status: Corruption: truncated record body
Incorrect information in file: './pmacontrol/ts_value_general_int.frm'
Can't init tc log
Aborting

The server restarted in a loop several times before stabilising, with .frm mismatch errors on multiple time-series tables.

The MDEV-39044 ticket

After investigation, we correlated this incident with MariaDB ticket MDEV-39044:

MyRocks corruption after restart during/after ALTER workload: Corruption: truncated record body, .frm mismatch, no crash log, no OOM killer

What the ticket describes

The ticket documents a reproducible corruption scenario:

  1. Large partitioned RocksDB tables — exactly what PmaControl uses for metrics (ts_value_* tables partitioned by day)
  2. ALTER TABLE under write load — adding partitions while the application writes continuously
  3. Simultaneous InnoDB memory pressure — InnoDB and RocksDB tables coexist on the same server
  4. No kernel trace — no OOM killer, no segfault, no crash log

Why it's insidious

The most dangerous aspect of this scenario is that the absence of a crash log is the expected behaviour. The server restarts and performs InnoDB crash recovery as usual, but the RocksDB side comes back corrupted (truncated record body, .frm mismatch).

A DBA who only checks journalctl or dmesg will find nothing. They'll classify the incident as an "unexplained restart" and move on.

Our concrete case

Affected tables

All partitioned RocksDB tables with heavy daily writes:

  • ts_value_general_int — integer metrics (status variables, counters)
  • ts_value_general_json — complex JSON metrics
  • ts_mysql_digest_stat — query statistics (digests)
  • ts_value_general_text — text metrics
  • ts_value_slave_int — replication metrics
  • ts_value_slave_text — detailed replication states

The likely trigger

PmaControl automatically maintains partitions on these tables: it adds the next day's partition and drops expired ones. These are ALTER TABLE ... ADD PARTITION / DROP PARTITION operations on tables weighing tens of GB, run while collection workers write continuously (every 10 seconds per monitored server).
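
To make this maintenance pattern concrete, here is a minimal Python sketch of the kind of statements such a job might generate. It is an illustration, not PmaControl's actual code: the helper name, the pYYYYMMDD partition naming, the RANGE partitioning on TO_DAYS() of a date column, and the 7-day retention are all assumptions.

```python
from datetime import date, timedelta


def partition_maintenance_sql(table: str, today: date, retention_days: int = 7) -> list[str]:
    """Build the daily ADD/DROP PARTITION statements for one table.

    Assumes RANGE partitioning on TO_DAYS(<date column>) and partitions
    named pYYYYMMDD. Both are illustrative, not PmaControl's real schema.
    """
    tomorrow = today + timedelta(days=1)
    upper_bound = tomorrow + timedelta(days=1)  # VALUES LESS THAN is exclusive
    expired = today - timedelta(days=retention_days)
    return [
        f"ALTER TABLE `{table}` ADD PARTITION ("
        f"PARTITION p{tomorrow:%Y%m%d} "
        f"VALUES LESS THAN (TO_DAYS('{upper_bound.isoformat()}')))",
        f"ALTER TABLE `{table}` DROP PARTITION p{expired:%Y%m%d}",
    ]


# Statements a maintenance run on the day of the incident would have produced
# under the assumptions above.
for stmt in partition_maintenance_sql("ts_value_general_int", date(2026, 3, 6)):
    print(stmt)
```

Each of these statements is a DDL operation on a table that is being written to at the same moment, which is the situation the ticket describes.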

Memory pressure signals

Before the crash, the MariaDB log shows:

InnoDB: Memory pressure event disregarded

The MDEV-39044 ticket explicitly cites this pattern as an aggravating factor. InnoDB memory pressure doesn't directly cause the corruption, but it creates the context in which the RocksDB DDL becomes non-atomic.

How PmaControl detected the incident

  1. Uptime reset detected within 10 seconds via the ts_variable.uptime time series
  2. Telegram alert sent immediately
  3. Automatic correlation with the error log: detection of crash recovery + truncated record body signatures
  4. Retrospective analysis: metrics from the preceding hour (threads, memory, CPU) were normal — confirming this is not a typical load issue
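
The first step relies on a simple property: on a running server, uptime only grows. The sketch below shows one way to spot a reset in a series of uptime samples. It is a simplified illustration with hypothetical data and a hypothetical function name, not the actual PmaControl implementation.

```python
def find_restarts(samples: list[tuple[int, int]]) -> list[int]:
    """Return the timestamps at which the server appears to have restarted.

    `samples` is a list of (unix_timestamp, uptime_seconds) pairs sorted by
    timestamp. A sample whose uptime is lower than the previous one means
    the server restarted somewhere between the two measurements.
    """
    restarts = []
    for (_, prev_uptime), (ts, uptime) in zip(samples, samples[1:]):
        if uptime < prev_uptime:
            restarts.append(ts)
    return restarts


# Hypothetical samples taken every 10 seconds: two resets in a row,
# as in a restart loop.
samples = [(1000, 86400), (1010, 86410), (1020, 4), (1030, 14), (1040, 2), (1050, 12)]
print(find_restarts(samples))
```

With a 10-second collection interval, a reset shows up at the first sample taken after the restart, which is consistent with the detection delay described above.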

Recommendations

Immediate actions

  1. Do not run DDL on RocksDB tables under write load. Schedule ALTER TABLE ... ADD/DROP PARTITION during low-activity windows.

  2. Monitor .frm errors in the error log. This is the first indicator of post-DDL corruption.

  3. Follow ticket MDEV-39044 for an official fix.
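
As a starting point for the second recommendation, here is a small Python sketch that counts the signatures quoted in this article in a set of error log lines. The patterns come from the excerpts above; the function name and the dictionary keys are our own choices for illustration, and the list is not meant to be exhaustive.

```python
import re

# Signatures taken from the error log excerpts quoted in this article.
SIGNATURES = {
    "rocksdb_corruption": re.compile(r"Corruption: truncated record body"),
    "frm_mismatch": re.compile(r"Incorrect information in file: '.*\.frm'"),
    "tc_log_failure": re.compile(r"Can't init tc log"),
    "innodb_memory_pressure": re.compile(r"InnoDB: Memory pressure event"),
}


def scan_error_log(lines):
    """Count how many log lines match each known signature."""
    hits = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


# The excerpt from the Context section, used here as sample input.
sample = [
    "RocksDB: Error opening instance, Status Code: 2,",
    "  Status: Corruption: truncated record body",
    "Incorrect information in file: './pmacontrol/ts_value_general_int.frm'",
    "Can't init tc log",
    "Aborting",
]
print(scan_error_log(sample))
```

In practice you would pass an open file handle on the server's error log instead of a list; its location depends on your log_error setting.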

Structural actions

  1. Separate engines: if possible, do not mix InnoDB and RocksDB on the same server for critical tables.

  2. Consider migrating hot tables to InnoDB. RocksDB excels at sequential writes, but its DDL operations are not atomic under load.

  3. Size memory properly to avoid the InnoDB pressure that aggravates the problem. See our article on the OOM killer for worst-case calculations.
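
For the memory sizing point, a rough worst-case estimate can be sketched as follows. This is a common rule-of-thumb calculation, not necessarily the one used in our OOM killer article: it adds the global buffers of both engines to the per-session buffers multiplied by max_connections. The variable names are real server settings, but every value below is hypothetical, and the list of buffers is deliberately incomplete.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def worst_case_memory(global_buffers: dict, per_connection_buffers: dict, max_connections: int) -> int:
    """Rough upper bound in bytes: global buffers plus every connection
    allocating all of its per-session buffers at the same time."""
    return sum(global_buffers.values()) + max_connections * sum(per_connection_buffers.values())


# Hypothetical settings for a server hosting both engines.
global_buffers = {
    "innodb_buffer_pool_size": 16 * GiB,
    "rocksdb_block_cache_size": 8 * GiB,
    "innodb_log_buffer_size": 64 * MiB,
}
per_connection_buffers = {
    "sort_buffer_size": 2 * MiB,
    "join_buffer_size": 2 * MiB,
    "read_buffer_size": 1 * MiB,
    "read_rnd_buffer_size": 1 * MiB,
}

total = worst_case_memory(global_buffers, per_connection_buffers, max_connections=500)
print(f"Worst case: {total / GiB:.2f} GiB")
```

If the result exceeds the RAM available to the server, InnoDB and RocksDB are competing for the same memory, which is the kind of pressure the ticket lists as an aggravating factor.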

What it is not

  • It is not a hardware problem (disk, RAM)
  • It is not a MariaDB configuration problem (parameters are correct)
  • It is not deterministic: the same ALTER does not fail on every run (the behaviour points to a race condition in the RocksDB/DDL engine)

It is an engine bug tracked in MariaDB's own issue tracker as MDEV-39044.

Conclusion

MDEV-39044 is a reminder that using alternative storage engines (RocksDB, TokuDB) on production workloads requires particular vigilance around DDL. The absence of a crash log does not mean the absence of corruption.

PmaControl detects these incidents through uptime monitoring + error log correlation, where standard tools see nothing.
