This page is a structural overview of how Pebble plugs into etcd alongside bbolt. It is the reference companion to the posts.
1. The Backend interface
The integration point is the existing backend.Backend interface in server/storage/backend/. Everything above it — MVCC, watch, lease, auth, apply, gRPC — talks to a Backend, not a backend implementation. The --backend flag, together with an engine-marker file inside the data directory, decides which implementation backend.New returns.
The Backend interface is the only place engine knowledge lives. Everything above it is engine-neutral.
There is no architectural ambition above the interface. MVCC, the apply loop, watch, lease, and auth all stay byte-for-byte the same. Only test code and the factory itself need to know that more than one engine exists.
2. The snapshot wire format
When a follower falls behind, the leader sends its database image over the wire. Under bbolt that was a single file, streamed directly. Under Pebble it is a tar of a checkpoint directory. Mixed clusters during a rolling upgrade must be able to distinguish the two formats unambiguously.
A 17-byte header plus a SHA-256 trailer turn the snapshot stream into something a receiver can route. The legacy bbolt format is preserved by absence.
The WireSize() of a Pebble snapshot is header + body + trailer, not just body. Getting that right is the regression test that closed Bug B from post 5.
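To make the framing concrete, here is a small sketch of a 17-byte header, a body, and a SHA-256 trailer. The source only gives the sizes, so the header layout used here (8-byte magic, 1-byte version, 8-byte big-endian body length), the magic value itself, and the choice to hash header plus body are all assumptions for illustration. A stream that does not start with the magic is treated as the legacy bbolt format.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerSize  = 17          // 8 magic + 1 version + 8 length (assumed layout)
	trailerSize = sha256.Size // 32-byte SHA-256 digest
)

// magic is a hypothetical marker; the real value is not given in the text.
var magic = [8]byte{'P', 'B', 'L', 'S', 'N', 'A', 'P', '1'}

// WireSize is what the receiver should expect on the wire, not just the body.
func WireSize(bodyLen int) int {
	return headerSize + bodyLen + trailerSize
}

// Encode frames body as header + body + SHA-256(header + body).
func Encode(body []byte) []byte {
	var hdr [headerSize]byte
	copy(hdr[:8], magic[:])
	hdr[8] = 1 // format version
	binary.BigEndian.PutUint64(hdr[9:], uint64(len(body)))

	out := make([]byte, 0, WireSize(len(body)))
	out = append(out, hdr[:]...)
	out = append(out, body...)
	sum := sha256.Sum256(out)
	return append(out, sum[:]...)
}

// Decode routes a stream: no magic means legacy bbolt, returned untouched.
func Decode(stream []byte) (body []byte, legacy bool, err error) {
	if len(stream) < len(magic) || !bytes.Equal(stream[:len(magic)], magic[:]) {
		return stream, true, nil
	}
	if len(stream) < headerSize+trailerSize {
		return nil, false, errors.New("truncated snapshot")
	}
	split := len(stream) - trailerSize
	if binary.BigEndian.Uint64(stream[9:headerSize]) != uint64(split-headerSize) {
		return nil, false, errors.New("body length does not match header")
	}
	sum := sha256.Sum256(stream[:split])
	if !bytes.Equal(sum[:], stream[split:]) {
		return nil, false, errors.New("checksum mismatch")
	}
	return stream[headerSize:split], false, nil
}

func main() {
	body := []byte("tar bytes of a checkpoint directory")
	wire := Encode(body)
	fmt.Println("body:", len(body), "wire:", len(wire), "WireSize:", WireSize(len(body)))

	got, legacy, err := Decode(wire)
	fmt.Println(string(got), legacy, err)
}
```

Reporting len(body) instead of WireSize(len(body)) would leave a receiver 49 bytes short under this layout, which is the kind of mismatch the regression test guards against.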
3. The WAL-disabled durability protocol
This is the trickiest decision in the entire milestone. Pebble’s internal WAL is disabled because etcd’s Raft WAL is already a durable log. Recovery hinges on a single integer: the last Raft index that has been durably flushed into a Pebble SSTable.
Capture the index before the flush; persist it after the flush returns; never store it inside Pebble itself. The order matters.
The gate that closes Phase 4, described in post 4 and verified in post 5, requires this protocol to survive 1,000 random kill -9s with zero state divergences across the cluster. The implementation is roughly forty lines of Go; the ordering around those lines is what the chaos run validates.
Where to read more
- Post 1 — The case for a second storage engine — motivation and non-goals.
- Post 2 — Sealing the bbolt leaks — the four leaks in the interface and how Phase 1 closed them.
- Post 3 — How bbolt and Pebble differ — B+ tree vs LSM-tree at a conceptual level.
- Post 4 — Disabling Pebble’s WAL — the durability protocol above.
- Post 5 — What the chaos gate surfaced — bugs the kill -9 gate found.
- Post 6 — What we’re explicitly not doing — the scope discipline behind all of the above.