Over the last ten weeks, we quietly swapped every SATA SSD in our shared-hosting fleet for enterprise NVMe drives. Forty-two racks, eleven hundred and twelve physical servers, zero customer-visible downtime. Here's what we learned — and the actual numbers from before and after.
Why now?
SATA SSDs served us well for seven years. They're reliable, cheap, and more than fast enough for static content. But shared hosting workloads aren't static content — they're small, random database reads from a hundred concurrent WordPress installs, all fighting for the same IOPS budget.
Two things pushed us to act: PHP 8.3 put more pressure on opcache invalidation, and the share of WooCommerce installs on our shared tier crossed 40% last quarter. Both mean more small, random reads. SATA tops out around 90k random IOPS; our NVMe drives are rated for 650k.
We're not making this change because NVMe is trendy. We're making it because WooCommerce checkouts were 300 ms slower than they should be.
The numbers
Here's the part everyone wants: what actually got faster, by how much, across the entire fleet.
| Metric | SATA SSD | NVMe | Change |
|---|---|---|---|
| Median TTFB (WP + WooCommerce, cold cache) | 412 ms | 141 ms | −66% |
| MySQL p95 query latency (100 conn) | 88 ms | 17 ms | −81% |
| MySQL p99 query latency (100 conn) | 210 ms | 34 ms | −84% |
The TTFB drop is the headline, but the MySQL p95 is where things get interesting. On SATA, we'd routinely see spikes past 200 ms when the buffer pool thrashed. On NVMe those spikes are gone — 95th percentile is flat at 17 ms.
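If you want to sanity-check tail latency on your own server, MySQL 8.0's performance_schema exposes per-statement-digest quantiles directly. This is a generic sketch, not the tooling we used for the table above; it assumes MySQL 8.0+ with performance_schema enabled, and note the timer columns are in picoseconds, so dividing by 1e9 gives milliseconds:

```shell
# Top 10 statement digests by p95 latency, in milliseconds.
# Assumes MySQL 8.0+ with performance_schema enabled.
mysql -e "
  SELECT LEFT(DIGEST_TEXT, 60)       AS query,
         ROUND(QUANTILE_95 / 1e9, 1) AS p95_ms,
         ROUND(QUANTILE_99 / 1e9, 1) AS p99_ms
  FROM performance_schema.events_statements_summary_by_digest
  ORDER BY QUANTILE_95 DESC
  LIMIT 10;"
```

Watching this view before and after a storage change tells you far more than a single average ever will.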
Not just speed: predictability
The single most important change isn't the raw IOPS number, it's consistency under load. SATA SSDs under heavy random read load develop long-tail latency spikes that are death for shared hosting. NVMe replaces AHCI's single 32-command queue with up to 64K queues of up to 64K commands each, so one tenant's burst of reads no longer starves everything else on the server.
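Queue behavior under random reads is easy to measure yourself. A hedged fio sketch approximating our shared-tier read mix follows; the file path, job count, and queue depth are illustrative, so point it at a scratch file, not live customer data:

```shell
# 4 KiB random reads, queue depth 32 across 4 jobs, for 60 s.
# Reports IOPS plus p95/p99 completion latency. Paths are examples.
fio --name=shared-tier-randread --filename=/mnt/scratch/fio.dat \
    --size=4G --direct=1 --rw=randread --bs=4k \
    --iodepth=32 --numjobs=4 --time_based --runtime=60 \
    --group_reporting --percentile_list=95:99
```

Compare the `clat` percentiles, not the averages; the long tail is where SATA and NVMe diverge.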
Migration without downtime
The trick was a rolling live migration using DRBD + Pacemaker. Every customer's data was synced to a new host, the DNS TTL was dropped to 60 seconds 24 hours in advance, and the cutover happened with a single corosync-cfgtool command per server. Average customer-visible downtime per migrated account: 380 milliseconds. Max observed: 2.1 seconds.
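For anyone attempting something similar, one migration step looked roughly like this. The resource and node names below are hypothetical, the exact cutover command depends on your cluster stack, and this is a sketch, not our runbook:

```shell
# Hypothetical sketch of one per-host migration step.
# Resource/node names are made up; adapt to your own cluster.

# 1. Add the NVMe host as a DRBD peer and wait for the initial sync.
drbdadm adjust customer_r0
drbdadm status customer_r0   # proceed once peer-disk reports UpToDate

# 2. (24 h earlier) drop the DNS TTL for affected zones to 60 s.

# 3. Move the Pacemaker resource group onto the NVMe node.
crm_resource --move --resource customer_fs --node nvme-host-07

# 4. Once verified, clear the move constraint so the cluster
#    is free to place the resource normally again.
crm_resource --clear --resource customer_fs
```

The DRBD sync runs in the background for as long as it needs; only the resource move in step 3 is customer-visible, which is why the observed downtime stayed in the sub-second range.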
What we didn't change
Pricing. This migration doubled our storage cost per rack and we're absorbing it. Keeping pricing flat on renewal is the core of what we promise — and we'd rather cut our own margin than break that promise.
What's next
VPS is already on NVMe, and has been for three years. Next up is the move from PCIe Gen3 to Gen4 NVMe drives in our dedicated server line — that work kicks off in Q3. Expect a writeup when it's done.
— Mikko