This page is a structural overview of how Pebble plugs into etcd alongside bbolt. It is the reference companion to the posts.

1. The Backend interface

The integration point is the existing backend.Backend interface in server/storage/backend/. Everything above it — MVCC, watch, lease, auth, apply, gRPC — talks to a Backend, not a backend implementation. The --backend flag, together with an engine-marker file inside the data directory, decides which implementation backend.New returns.

Figure: the engine seam. Above the seam (no engine knowledge): gRPC / Maintenance (snapshot send/receive), MVCC store (revisions, watches), Lessor / Auth (leases, RBAC), apply loop (raft → backend writes), Schema / etcdutl (offline tooling). The seam is the backend.Backend interface: ReadTx(), BatchTx(), ConcurrentReadTx(), Snapshot(), Hash(), Size(), Defrag(), Close(); 26 lines, engine-neutral, cleared of bbolt leaks in Phase 1. Below it, backend.New() dispatches on the engine. Engine = bbolt (default) yields *backend, the bbolt implementation: single-file B+ tree, mmap reads, unchanged in this milestone, stored as the file ${data-dir}/member/snap/db. Engine = pebble (opt-in via --backend=pebble) yields the new *pebbleBackend: LSM, DisableWAL=true, checkpoint-based snapshots, indexed batches, explicit block cache, stored as the directory ${data-dir}/member/snap/db.

The Backend interface is the only place engine knowledge lives. Everything above it is engine-neutral.

The milestone makes no architectural changes above the interface: MVCC, the apply loop, watch, lease, and auth all stay byte-for-byte the same. Only test code and the factory itself need to know that more than one engine exists.
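To make the seam concrete, here is a deliberately trimmed, hypothetical `Backend` with three of the listed methods and two toy engines. It is not etcd's real interface, whose methods take and return richer types; it only shows that code above the seam compiles against the interface alone, while the list of engines lives in test code.

```go
package main

// Backend is a hypothetical, trimmed stand-in for etcd's backend.Backend.
type Backend interface {
	Size() int64
	Defrag() error
	Close() error
}

// Two toy engines. Only the factory and tests would name these types.
type bboltBackend struct{ size int64 }
type pebbleBackend struct{ size int64 }

func (b *bboltBackend) Size() int64   { return b.size }
func (b *bboltBackend) Defrag() error { return nil }
func (b *bboltBackend) Close() error  { return nil }

func (p *pebbleBackend) Size() int64   { return p.size }
func (p *pebbleBackend) Defrag() error { return nil }
func (p *pebbleBackend) Close() error  { return nil }

// Compile-time checks that both engines satisfy the seam.
var (
	_ Backend = (*bboltBackend)(nil)
	_ Backend = (*pebbleBackend)(nil)
)

// defragIfLarge is engine-neutral code "above the seam": it sees only Backend.
func defragIfLarge(b Backend, limit int64) (bool, error) {
	if b.Size() <= limit {
		return false, nil
	}
	return true, b.Defrag()
}
```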

2. The snapshot wire format

When a follower falls behind, the leader sends its database image over the wire. Under bbolt that was a single file, streamed directly. Under Pebble it is a tar of a checkpoint directory. Mixed clusters during a rolling upgrade must be able to distinguish the two formats unambiguously.

Figure: snapshot stream layouts. Pebble stream (engine_id = 0x01): Magic (16 bytes, "ETCDSNAP\x00…") | engine (1 byte, 0x01) | format ver (4 bytes, reserved) | Body (N bytes, tarred Pebble checkpoint of hardlinked SSTables) | SHA-256 (32 bytes, trailer). Legacy bbolt stream: bare bbolt file bytes, no header. The receiver sniffs the first 16 bytes; absence of the magic ⇒ treat as engine 0x00 (legacy bbolt).

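A fixed header (16-byte magic, 1-byte engine ID, 4-byte format version) plus a 32-byte SHA-256 trailer turns the snapshot stream into something a receiver can route. The legacy bbolt format is preserved by absence: a stream that does not start with the magic is treated as bbolt.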

The WireSize() of a Pebble snapshot is header + body + trailer, not just the body. A regression test for exactly that calculation is what closed Bug B from post 5.

3. The WAL-disabled durability protocol

This is the trickiest decision in the entire milestone. Pebble’s internal WAL is disabled because etcd’s Raft WAL is already a durable log. Recovery hinges on a single integer: the last Raft index that has been durably flushed into a Pebble SSTable.
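Recovery needs only that integer, N, to decide what to replay. A toy version follows, with a hypothetical `Entry` type in place of a Raft entry and plain key/value puts in place of etcd's apply logic:

```go
package main

// Entry is a hypothetical, minimal Raft log entry carrying one put.
type Entry struct {
	Index uint64
	Key   string
	Value string
}

// replayFrom returns the WAL entries strictly above lastFlushed (N), in log
// order. Entries at or below N are already in Pebble SSTables.
func replayFrom(wal []Entry, lastFlushed uint64) []Entry {
	var out []Entry
	for _, e := range wal {
		if e.Index > lastFlushed {
			out = append(out, e)
		}
	}
	return out
}

// recoverState copies the durable state and re-applies the replay set on top.
func recoverState(durable map[string]string, wal []Entry, lastFlushed uint64) map[string]string {
	state := make(map[string]string, len(durable))
	for k, v := range durable {
		state[k] = v
	}
	for _, e := range replayFrom(wal, lastFlushed) {
		state[e.Key] = e.Value
	}
	return state
}
```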

Figure: the WAL-disabled durability protocol. Normal operation: (1) the apply loop writes with batch.Commit(NoSync), and entries land in the memtable; (2) FlushBegin fires in a Pebble event listener, which captures appliedIndex = N; (3) the flush completes, FlushEnd fires, and the SSTables are fsync'd in L0; (4) N is persisted to ${data-dir}/member/snap/last-flushed-index, an integer file that is separately fsync'd, whose parent directory is fsync'd, and which is never stored inside Pebble itself. Crash recovery: (A) read the last-flushed-index file to get N; (B) open Pebble and check the invariant that the SSTables prove everything up to N was applied, and if not, panic rather than start; (C) replay the Raft WAL from N+1, re-applying entries above the last-flushed index, which converges to the clean-shutdown state.
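Steps 2 to 4 can be sketched as a small tracker. In a real build the two flush methods would be driven by the FlushBegin and FlushEnd hooks of Pebble's event listener; here they are plain methods on a hypothetical `flushTracker`, and one flush at a time is assumed. The point is the ordering: N is read when the flush begins and handed to `persist` only when it ends.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// flushTracker is hypothetical; it is not part of Pebble or etcd.
type flushTracker struct {
	applied atomic.Uint64        // last Raft index committed with NoSync (step 1)
	mu      sync.Mutex           // serializes begin/end bookkeeping
	pending uint64               // N captured at FlushBegin (step 2)
	persist func(n uint64) error // durable write outside Pebble (step 4)
	err     error                // last persist error, for the caller to surface
}

// setApplied is called by the apply loop after each commit.
func (t *flushTracker) setApplied(n uint64) { t.applied.Store(n) }

// onFlushBegin captures N before the flush starts.
func (t *flushTracker) onFlushBegin() {
	t.mu.Lock()
	t.pending = t.applied.Load()
	t.mu.Unlock()
}

// onFlushEnd persists N only after the flush has made its SSTables durable.
func (t *flushTracker) onFlushEnd() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.err = t.persist(t.pending)
}
```

Writes that arrive while the flush is running advance `applied` but not `pending`, so the persisted N never claims more than the flush started with.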

Capture the index before the flush; persist it after the flush returns; never store it inside Pebble itself. The order matters.

The gate that closes Phase 4 — described in post 4 and verified in post 5 — requires this protocol to survive 1,000 random kill -9s with zero state divergences across the cluster. The implementation is roughly forty lines of Go; the ordering around those lines is what the chaos run validates.

Where to read more