The case for a second storage engine
Why etcd is getting a Pebble (LSM-tree) backend alongside bbolt — what hurts at Kubernetes scale, and what we explicitly are not changing.
Posts
Field notes from porting etcd's storage engine from bbolt (B+ tree) to Pebble (LSM-tree). Read in order.
Why etcd is getting a Pebble (LSM-tree) backend alongside bbolt — what hurts at Kubernetes scale, and what we explicitly are not changing.
Before any Pebble code can land, four bbolt-typed APIs have to leave the public Backend interface. Boring work, load-bearing outcome.
A short primer on B+ trees and LSM-trees, and why the difference between them shows up at exactly the operational seams etcd operators care about.
etcd already has a Raft write-ahead log. Running Pebble's WAL on top of it means wasted fsyncs. The trade-off is that we have to be exactly right about a single integer.
We ran the kill -9 gate. It surfaced three bugs in code we hadn't touched. Two were structural; one cascaded from another.
Big migrations succeed by what they refuse to take on. A list of things this milestone deliberately punts — and why each refusal earns its keep.