
Control the uncontrollable: anatomy of a DC crash

Published on March 15, 2026 By Aurélien LEQUOY
incident disaster-recovery galera

Context

Friday evening, 10:47 PM. The Telegram alert fires: the primary datacenter is down. Not a server, not a rack, but the entire DC. A cascading power failure, and the UPS systems did not hold.

28 MariaDB / MySQL instances, 3 Galera clusters and 2 ProxySQL instances: everything is offline.

Timeline

Time    Action
22:47   PmaControl alert: DC unreachable
22:49   OVH confirmation: electrical incident in the DC
22:51   DNS failover to the secondary DC
22:54   Galera bootstrap on the surviving node
22:58   ProxySQL automatic reconfiguration
23:01   First successful SELECT on the secondary cluster
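The 22:54 step, bootstrapping Galera on the surviving node, follows from Galera's weighted-quorum rule: a component stays Primary (and keeps serving queries) only if `sum(w, previous primary) - sum(w, nodes that left gracefully) < 2 * sum(w, current members)`, where `w` is each node's `pc.weight` (default 1). Nodes killed by a power failure do not leave gracefully, so a lone survivor of a 3-node cluster drops to non-Primary and refuses queries until an operator bootstraps it. The sketch below encodes that rule; the `Node` class, the `keeps_primary` helper and the 2 + 1 node layout across the two DCs are hypothetical, since the article does not describe the actual topology.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int = 1                # pc.weight, Galera's default is 1
    alive: bool = True             # still reachable from the surviving side
    left_gracefully: bool = False  # shut down cleanly and announced its departure


def keeps_primary(previous_primary: list[Node]) -> bool:
    """Apply Galera's weighted-quorum rule to the component formed by alive nodes.

    The component stays Primary if
        sum(w, previous) - sum(w, left gracefully) < 2 * sum(w, current members)
    """
    total = sum(n.weight for n in previous_primary)
    left = sum(n.weight for n in previous_primary if n.left_gracefully)
    current = sum(n.weight for n in previous_primary if n.alive)
    return total - left < 2 * current


if __name__ == "__main__":
    # Hypothetical layout: two nodes in the failed DC, one in the secondary DC
    cluster = [
        Node("dc1-a", alive=False),
        Node("dc1-b", alive=False),
        Node("dc2-a"),
    ]
    # 3 - 0 < 2 * 1 is false, so the survivor is non-Primary: manual bootstrap needed
    print("survivor keeps Primary:", keeps_primary(cluster))
```

Once an operator has confirmed that the failed DC is really gone and cannot come back as a second Primary, the survivor can be promoted with `SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';`. That confirmation matters: bootstrapping both halves of a partitioned cluster produces a split brain, which is exactly what the quorum rule is there to prevent.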

Lessons learned

  1. Backups are not enough: without a tested recovery plan, they are useless.
  2. Galera IST vs SST: whether a rejoining node can use an Incremental State Transfer or needs a full State Snapshot Transfer is the difference between 2 minutes and 2 hours of recovery.
  3. PmaControl detected the incident in 12 seconds, before OVH's own alert.
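Lesson 2 comes down to how a node catches up when it rejoins. With IST (Incremental State Transfer), a donor replays only the missing write-sets from its gcache, a ring buffer sized by `gcache.size`; with SST (State Snapshot Transfer), the donor ships a full copy of the dataset (for example with mariabackup or rsync). IST is only possible when the joiner shares the cluster's state UUID and every write-set after its last committed `seqno` is still in the donor's gcache. The function below is a simplified model of that decision, not Galera's actual code; `DonorState`, `transfer_method` and the sequence numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DonorState:
    cluster_uuid: str        # state UUID of the running cluster
    gcache_first_seqno: int  # oldest write-set still held in the donor's gcache
    last_committed: int      # newest committed seqno on the donor


def transfer_method(joiner_uuid: str, joiner_seqno: int, donor: DonorState) -> str:
    """Return "none", "IST" or "SST" for a node rejoining at joiner_seqno (simplified)."""
    if joiner_uuid != donor.cluster_uuid or not 0 <= joiner_seqno <= donor.last_committed:
        # Different history, unknown position (-1) or a state the cluster never
        # reached: only a full snapshot can make the joiner consistent.
        return "SST"
    if joiner_seqno == donor.last_committed:
        return "none"  # already in sync, nothing to transfer
    # IST replays write-sets joiner_seqno + 1 .. last_committed; all must be cached.
    if joiner_seqno + 1 >= donor.gcache_first_seqno:
        return "IST"
    return "SST"


if __name__ == "__main__":
    donor = DonorState("uuid-hypothetical", gcache_first_seqno=1_000, last_committed=5_000)
    print(transfer_method("uuid-hypothetical", 4_200, donor))  # gap still in gcache
    print(transfer_method("uuid-hypothetical", 600, donor))    # gap already evicted
```

This is why `gcache.size` is usually sized to cover the longest expected node outage: once the write-sets a joiner needs have been overwritten, the only way back is a full SST.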

Conclusion

14 minutes between the alert and the first SELECT. This is the result of preparation, not luck.
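The 14-minute figure can be checked against the timeline. A few lines of Python break it down per step, using the times from the timeline above (all on the same evening, so no midnight rollover is handled):

```python
from datetime import datetime

# Timeline of the incident (HH:MM, same evening)
TIMELINE = [
    ("22:47", "PmaControl alert: DC unreachable"),
    ("22:49", "OVH confirmation: electrical incident in the DC"),
    ("22:51", "DNS failover to the secondary DC"),
    ("22:54", "Galera bootstrap on the surviving node"),
    ("22:58", "ProxySQL automatic reconfiguration"),
    ("23:01", "First successful SELECT on the secondary cluster"),
]


def minutes_since_alert(timeline):
    """Return (whole minutes elapsed since the first event, label) for each event."""
    start = datetime.strptime(timeline[0][0], "%H:%M")
    return [
        (int((datetime.strptime(t, "%H:%M") - start).total_seconds() // 60), label)
        for t, label in timeline
    ]


if __name__ == "__main__":
    for minutes, label in minutes_since_alert(TIMELINE):
        print(f"+{minutes:2d} min  {label}")
```

The breakdown runs from +0 to +14 minutes: the DNS failover is in place 4 minutes after the alert, and the Galera bootstrap plus the ProxySQL reconfiguration take the next 7.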

"A backup does not replace a recovery strategy." — PmaControl

