The single best decision in etcd’s storage subsystem is one most users will never read about: there is a Backend interface, and most of the codebase — MVCC, watch, lease, auth, apply, gRPC — talks to it instead of to bbolt directly.

type Backend interface {
    ReadTx() ReadTx
    BatchTx() BatchTx
    ConcurrentReadTx() ReadTx

    Snapshot() Snapshot
    Hash(ignores func(bucketName, keyName []byte) bool) (uint32, error)
    Size() int64
    SizeInUse() int64
    OpenReadTxN() int64
    Defrag() error
    ForceCommit()
    Close() error

    SetTxPostLockInsideApplyHook(func())
}

Twenty-six lines. No *bolt.DB, no *bolt.Tx, no *bolt.Bucket. That is what makes a second backend even thinkable.

But interfaces are never as tight as they look. Reading the code with adversarial eyes — what would break if I tried to introduce a second implementation? — turned up four places where the bbolt type system had quietly leaked out of the box.

The four leaks

  1. BackendConfig.BackendFreelistType bolt.FreelistType. A bbolt enum, sitting on the struct that constructs every backend. Anything that takes a BackendConfig (embed.Config, etcdserver.ServerConfig) transitively imported go.etcd.io/bbolt.

  2. ReadStorageVersionFromSnapshot(*bbolt.Tx). A schema helper that takes a bbolt transaction directly. Bypasses the abstraction entirely.

  3. --backend-bbolt-freelist-type. A CLI flag with bbolt baked into the name and bbolt-valued semantics. Tolerable for a single-engine world, awkward for a two-engine one.

  4. A test helper at mvcc/testutil/hash.go. Reaches into a bbolt cursor to compute a hash for assertions. Used by eight call sites across the test suite.

Each leak in isolation is small. Together they are a wall: every PR that adds Pebble code would have merge conflicts in the same files, and every reviewer would have to keep them in their head.

So Phase 1 is the unglamorous one: seal the leaks first. No Pebble code lands until they are gone.

What “sealed” looks like

The change is mechanical, but worth describing for the shape:

  • Introduce an engine-neutral backend.FreelistType (a string), and switch every public type from bolt.FreelistType to it. Parse it into a bolt.FreelistType only inside the bbolt implementation file.
  • Move ReadStorageVersionFromSnapshot out of the engine-neutral package. It still exists; it lives in version_bbolt.go now, gated behind the bbolt build path. The signature still takes a bbolt type — that is correct, because it inspects on-disk bbolt format. It just isn’t on the public surface any more.
  • Rename the flag to make the engine-conditional nature visible, and add validation so a user can’t pass --backend-bbolt-freelist-type while running --backend=pebble and have it silently ignored.
  • Split the test helper. hash.go takes a generic backend.Backend. hash_bbolt.go keeps the cursor-level path for the seven call sites that genuinely need it. The eighth becomes engine-neutral.
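The first and third bullets can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the code that shipped: the constant names, the toBolt and validate helpers, and the "bbolt"/"pebble" engine strings are hypothetical, and boltFreelistType is a local stand-in for bolt.FreelistType so the example builds without the bbolt module. Treating an empty value as the map freelist is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// FreelistType is the engine-neutral value carried by public config structs.
// It mirrors the article's backend.FreelistType; the constants are assumptions.
type FreelistType string

const (
	FreelistArray FreelistType = "array"
	FreelistMap   FreelistType = "map"
)

// boltFreelistType is a stand-in for bolt.FreelistType (a string type in
// bbolt) so that this sketch compiles without importing go.etcd.io/bbolt.
type boltFreelistType string

const (
	boltFreelistArray boltFreelistType = "array"
	boltFreelistMap   boltFreelistType = "hashmap"
)

// toBolt is the only place the neutral value becomes a bbolt value.
// In the real tree it would live inside the bbolt implementation file.
func toBolt(t FreelistType) (boltFreelistType, error) {
	switch t {
	case FreelistArray:
		return boltFreelistArray, nil
	case FreelistMap, "": // assumption: unset falls back to the map freelist
		return boltFreelistMap, nil
	default:
		return "", fmt.Errorf("unknown freelist type %q", t)
	}
}

var errFreelistWithoutBbolt = errors.New("bbolt freelist type set, but the selected backend is not bbolt")

// validate rejects a bbolt-only option when another engine is selected,
// instead of silently ignoring it.
func validate(engine string, freelist FreelistType) error {
	if engine != "bbolt" && freelist != "" {
		return errFreelistWithoutBbolt
	}
	return nil
}

func main() {
	bt, err := toBolt(FreelistArray)
	fmt.Println(bt, err)
	fmt.Println(validate("pebble", FreelistMap))
}
```

The point of the shape is that callers only ever see a string-backed type; the conversion, and any knowledge of bbolt's own constants, stays behind the engine boundary.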

A small grep becomes the regression test: any caller-facing reference to the bbolt package (a "bolt." or "bbolt." qualifier) outside the bbolt implementation files fails the lint. (Phase 5 adds forbidigo/depguard rules so the boundary is enforced automatically.)
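For readers who want to see what such a check might look like, here is a minimal sketch. It swaps the grep for an import scan using the standard go/parser package, which is a different technique from the one described above but guards the same boundary. The leakingFiles function and the rule that bbolt implementation files end in _bbolt.go are assumptions, the latter inferred from the file names version_bbolt.go and hash_bbolt.go.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
	"strings"
)

const bboltImportPath = "go.etcd.io/bbolt"

// isBboltFile reports whether a file is allowed to import bbolt.
// Assumption: implementation files carry a _bbolt.go suffix.
func isBboltFile(name string) bool {
	return strings.HasSuffix(name, "_bbolt.go")
}

// leakingFiles returns, in sorted order, the names of files that import
// bbolt without being bbolt implementation files. sources maps a file
// name to its Go source text.
func leakingFiles(sources map[string]string) ([]string, error) {
	fset := token.NewFileSet()
	var leaks []string
	for name, src := range sources {
		if isBboltFile(name) {
			continue
		}
		// ImportsOnly stops parsing after the import declarations.
		f, err := parser.ParseFile(fset, name, src, parser.ImportsOnly)
		if err != nil {
			return nil, err
		}
		for _, imp := range f.Imports {
			path, err := strconv.Unquote(imp.Path.Value)
			if err != nil {
				return nil, err
			}
			if path == bboltImportPath {
				leaks = append(leaks, name)
				break
			}
		}
	}
	sort.Strings(leaks)
	return leaks, nil
}

func main() {
	sources := map[string]string{
		"config.go":        "package backend\n\nimport bolt \"go.etcd.io/bbolt\"\n\nvar _ bolt.FreelistType\n",
		"version_bbolt.go": "package backend\n\nimport bolt \"go.etcd.io/bbolt\"\n\nvar _ *bolt.Tx\n",
		"hash.go":          "package backend\n\nimport \"hash/crc32\"\n\nvar _ = crc32.IEEE\n",
	}
	leaks, err := leakingFiles(sources)
	fmt.Println(leaks, err)
}
```

Checking imports rather than text avoids false positives from comments and strings, at the cost of needing the sources to parse.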

Why this matters

Three reasons.

It de-conflicts every later PR. With the leaks sealed, the Pebble implementation is additive — new files, new types, no edits to the bbolt path. The bbolt code path doesn’t change behaviour at all; we run the full unit and integration suite against an unchanged default backend to prove it.

It forces the abstraction to be honest. Every leak is evidence that the interface, as written, was almost good enough but not quite. Closing the leaks is also a small re-design: we say out loud what is engine-neutral and what is engine-specific, and the codebase reflects it.

It makes the second backend tractable. With a 26-line interface and no leaks, the question “can Pebble implement this?” becomes answerable method by method — which is exactly how the next phase is structured.
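One way to make "method by method" concrete is a compile-time assertion that a candidate type satisfies the interface. The sketch below uses a trimmed stand-in for Backend (only the methods that need no other etcd types) and a hypothetical pebbleBackend whose method bodies are placeholders, not a claim about how a Pebble engine would behave.

```go
package main

import "fmt"

// Backend is a trimmed stand-in for the interface shown earlier; the
// transaction, snapshot, and hash methods are omitted so the sketch
// compiles on its own.
type Backend interface {
	Size() int64
	SizeInUse() int64
	Defrag() error
	ForceCommit()
	Close() error
}

// pebbleBackend is hypothetical. Every body below is a placeholder.
type pebbleBackend struct {
	closed bool
}

// The build fails on this line until every Backend method exists,
// which turns "can Pebble implement this?" into a compiler checklist.
var _ Backend = (*pebbleBackend)(nil)

func (b *pebbleBackend) Size() int64      { return 0 }
func (b *pebbleBackend) SizeInUse() int64 { return 0 }
func (b *pebbleBackend) Defrag() error    { return nil }
func (b *pebbleBackend) ForceCommit()     {}

func (b *pebbleBackend) Close() error {
	b.closed = true
	return nil
}

func main() {
	var b Backend = &pebbleBackend{}
	fmt.Println(b.Size(), b.Close())
}
```

Deleting any one method from pebbleBackend breaks the build at the assertion line, so the list of missing methods is always visible.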

What this is not

Phase 1 is not the place to refactor the Backend interface itself. Tempting, but wrong. The interface as it stands has carried etcd for years; changing it is a separate decision with its own risk surface. We seal the leaks under the existing interface and leave the bigger redesign for a future milestone, ideally one informed by what we actually learn implementing the second backend.

It is also not a place to be clever. The PRs in this phase are deliberately mechanical. Every bolt.FreelistType becomes a backend.FreelistType. Every test that touches the helper gets parameterized. No new features, no opportunistic cleanups, no “while I’m here” changes. The goal is to ship a phase that an existing operator could deploy and notice nothing.

Next: a primer on the two trees themselves — what bbolt and Pebble actually do under the hood, and why the difference shows up at the operational seams.