Phase 4’s re-run is mid-flight as I type this; the next gate-result post is queued behind it. Meanwhile, this is the post about the other half of the work — the proposals we said no to, and the reasons we said no.

These are not soft “out of scope” items. They are the cases where someone proposed something reasonable and the answer was: not in this milestone, and here is why.

We are not making Pebble the default

The strongest reason is risk asymmetry. The cost of a regression in a default-on backend is borne by every etcd deployment on earth. The cost of a regression in an opt-in backend is borne by the operator who flipped the flag. Those are not comparable failure surfaces.

The second reason is honesty: we don’t know yet. The benchmark harness from Phase 7 will tell us how Pebble performs across the workloads operators actually run. Until those numbers are public, on real hardware, with real Kubernetes-shaped payloads, we have no credible basis for telling someone “switch and be happier.” A default is a commitment to every operator who never reads release notes, and we are not ready to make that commitment.

The third is community. etcd’s CHANGELOG has earned a particular kind of trust over a long time, and the way you keep that trust is by not pulling the rug. bbolt is the default. It will stay the default for at least this milestone. The Pebble path is for operators who have a specific reason to take it.

We are not removing bbolt

Adjacent to the above and worth saying separately. bbolt is not deprecated. It is not on a sunset timeline. The two backends are peers. If the Pebble experiment turns out to be a bad fit for some class of deployments — small clusters, particular filesystems, particular memory profiles — bbolt is the answer for those, indefinitely.

This is the part that buys us the freedom to experiment. If Pebble were going to replace bbolt, the bar for shipping it would be “better in every dimension, on every workload, on every platform.” That bar is unreasonable. The bar we are actually shipping against is “better in the dimensions some operators care about, opt-in, with a clean rollback path.” Much more achievable.

We are not building bidirectional migration

The migration tool moves data one direction: bbolt → Pebble. There is no --migrate-database-to-bbolt.

The reason is that the migration tool’s chaos test matrix is the actual product. The Go is small — bucket-by-bucket copy, integrity check, atomic rename. The hard work is proving that any sequence of kill -9s during any checkpoint leaves the operator with either a usable bbolt directory or a usable Pebble directory, never both broken. That matrix is large. Doubling it for the rare case — an operator who migrated to Pebble and then regretted it — is not a good use of test budget.

Operators get rollback through a different mechanism: the migration tool optionally moves the bbolt files to a backup directory instead of deleting them. If a Pebble cluster needs to roll back, the runbook is “stop, restore the bbolt directory from backup, restart.” That is more steps than a --migrate-database-to-bbolt flag would have been. It is also a path we can confidently certify.
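The shape described above (copy, integrity check, atomic rename, then move the original to a backup directory) can be sketched with nothing but the Go standard library. This is an illustration of the ordering, not the migration tool's code: a single file stands in for the bbolt database, and migrateFile, writeSynced and demo are hypothetical names. No real bbolt or Pebble calls appear.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// migrateFile is a hypothetical stand-in for the migration tool. One file
// plays the role of the bbolt database; the real tool copies bucket by bucket.
// An empty backupDir means "delete the source after a successful migration".
func migrateFile(src, dst, backupDir string) error {
	data, err := os.ReadFile(src)
	if err != nil {
		return fmt.Errorf("read source: %w", err)
	}
	want := sha256.Sum256(data)

	// Stage under a temporary name so a crash never leaves a half-written dst.
	tmp := dst + ".tmp"
	if err := writeSynced(tmp, data); err != nil {
		return fmt.Errorf("stage copy: %w", err)
	}

	// Integrity check: re-read the staged copy and compare digests.
	staged, err := os.ReadFile(tmp)
	if err != nil {
		return fmt.Errorf("re-read staged copy: %w", err)
	}
	if sha256.Sum256(staged) != want {
		return errors.New("integrity check failed; source left untouched")
	}

	// Publish. os.Rename is atomic on POSIX within a single filesystem.
	if err := os.Rename(tmp, dst); err != nil {
		return fmt.Errorf("publish: %w", err)
	}

	// Only now touch the source: back it up or delete it.
	if backupDir == "" {
		return os.Remove(src)
	}
	if err := os.MkdirAll(backupDir, 0o700); err != nil {
		return fmt.Errorf("create backup dir: %w", err)
	}
	return os.Rename(src, filepath.Join(backupDir, filepath.Base(src)))
}

// writeSynced writes data to path and fsyncs the file before returning.
func writeSynced(path string, data []byte) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo migrates a throwaway file in a temp directory and checks where the
// data ended up afterwards.
func demo() error {
	dir, err := os.MkdirTemp("", "migrate-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "db")
	if err := os.WriteFile(src, []byte("bucket data"), 0o600); err != nil {
		return err
	}
	dst := filepath.Join(dir, "pebble-db")
	backup := filepath.Join(dir, "backup")
	if err := migrateFile(src, dst, backup); err != nil {
		return err
	}
	for _, p := range []string{dst, filepath.Join(backup, "db")} {
		if _, err := os.Stat(p); err != nil {
			return fmt.Errorf("expected %s to exist: %w", p, err)
		}
	}
	if _, err := os.Stat(src); !errors.Is(err, os.ErrNotExist) {
		return errors.New("source should have moved to the backup directory")
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("migration sketch failed:", err)
		return
	}
	fmt.Println("migrated; original preserved in backup directory")
}
```

The ordering is the point: the source is not touched until the destination has been verified and published, so a kill at any step leaves at least one usable copy. A production tool would also need to fsync the parent directories around the renames, which this sketch omits.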

We are not building dual-write or shadow-read modes

A common pattern in storage migrations is to run both backends in parallel: writes go to both; reads come from one but are compared against the other; once they agree for long enough, you cut over. It is intellectually attractive and operationally expensive — double the disk, double the CPU, half the throughput, and a comparison harness that must itself be correct.

Our position is that the existing test suite, parameterised across both engines, plus the cross-backend canonical hash, plus operator-driven canary deploys, gives us most of the assurance of dual-write without any of its runtime cost. Canary first on a non-production cluster, then on a small production cluster, then on the rest of the fleet. The chaos suite from Phase 4 stays in CI as a regression guard.
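For readers who have not met the idea, here is one way a cross-backend canonical hash could work. It is a sketch under stated assumptions, not the encoding the project uses: canonicalHash is a hypothetical name, buckets are modelled as in-memory maps, and the length-prefixed layout is my own choice. The property that matters is that the digest depends only on logical content, never on how an engine lays it out on disk or in what order it iterates.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash"
	"sort"
)

// writeUint64 feeds a fixed-width big-endian integer into the hash.
func writeUint64(h hash.Hash, n uint64) {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], n)
	h.Write(buf[:])
}

// writeField feeds a length-prefixed string into the hash so that adjacent
// fields cannot run together ("ab"+"c" must differ from "a"+"bc").
func writeField(h hash.Hash, s string) {
	writeUint64(h, uint64(len(s)))
	h.Write([]byte(s))
}

// canonicalHash digests bucket -> key -> value content in sorted order.
// The encoding is an assumption made for this sketch.
func canonicalHash(buckets map[string]map[string]string) string {
	h := sha256.New()

	names := make([]string, 0, len(buckets))
	for name := range buckets {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		keys := make([]string, 0, len(buckets[name]))
		for k := range buckets[name] {
			keys = append(keys, k)
		}
		sort.Strings(keys)

		writeField(h, name)
		// The entry count keeps an empty bucket distinct from a key.
		writeUint64(h, uint64(len(keys)))
		for _, k := range keys {
			writeField(h, k)
			writeField(h, buckets[name][k])
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fromBbolt := map[string]map[string]string{
		"key":   {"foo": "1", "bar": "2"},
		"lease": {},
	}
	fromPebble := map[string]map[string]string{
		"lease": {},
		"key":   {"bar": "2", "foo": "1"},
	}
	fmt.Println("hashes match:", canonicalHash(fromBbolt) == canonicalHash(fromPebble))
}
```

Running the same test against both engines and comparing this kind of digest afterwards gives a cheap equivalence check without running both engines side by side in production.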

We are not exposing arbitrary Pebble knobs

There will be no --pebble-options=key1=val1,key2=val2. The tuning surface is exactly five flags:

  • --backend-pebble-cache-size
  • --backend-pebble-memtable-size
  • --backend-pebble-compression
  • --backend-pebble-max-concurrent-compactions
  • --backend-pebble-disable-wal (default true)

The reason is the support surface. Every Pebble option exposed externally is one we may have to debug a production cluster against. Five is a number we can write runbooks for. Forty is not.
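To make the size of that surface concrete, here is a sketch of the five flags registered on a standard-library flag.FlagSet. The flag names and the disable-wal default come from the list above; everything else is an assumption made for illustration: the pebbleTuning struct, the parsePebbleFlags helper, the field types, and the zero-value defaults are hypothetical and are not etcd's configuration code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// pebbleTuning holds the five curated knobs. Field types and every default
// except DisableWAL are assumptions made for this sketch.
type pebbleTuning struct {
	CacheSize                int64
	MemtableSize             int64
	Compression              string
	MaxConcurrentCompactions int
	DisableWAL               bool
}

// parsePebbleFlags registers exactly five flags and parses args against them.
// Any other flag, including a catch-all options string, is rejected.
func parsePebbleFlags(args []string) (*pebbleTuning, error) {
	fs := flag.NewFlagSet("etcd-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output

	cfg := &pebbleTuning{}
	fs.Int64Var(&cfg.CacheSize, "backend-pebble-cache-size", 0,
		"cache size in bytes (0: engine default, placeholder)")
	fs.Int64Var(&cfg.MemtableSize, "backend-pebble-memtable-size", 0,
		"memtable size in bytes (0: engine default, placeholder)")
	fs.StringVar(&cfg.Compression, "backend-pebble-compression", "",
		"compression algorithm (empty: engine default, placeholder)")
	fs.IntVar(&cfg.MaxConcurrentCompactions, "backend-pebble-max-concurrent-compactions", 0,
		"compaction concurrency (0: engine default, placeholder)")
	fs.BoolVar(&cfg.DisableWAL, "backend-pebble-disable-wal", true,
		"disable Pebble's own write-ahead log")

	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parsePebbleFlags([]string{"--backend-pebble-cache-size=268435456"})
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Printf("cache=%d disableWAL=%v\n", cfg.CacheSize, cfg.DisableWAL)

	// A generic options string is not part of the surface, so parsing fails.
	_, err = parsePebbleFlags([]string{"--pebble-options=key1=val1"})
	fmt.Println("catch-all flag rejected:", err != nil)
}
```

A closed struct like this is what keeps the runbook count bounded: every field is something a support engineer can enumerate, and anything outside it fails at startup instead of reaching the engine.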

Internal users with strong opinions can fork the build and override the options directly — that is a deliberate path. The default operator experience is curated.

We are not changing the wire protocol

This sounds obvious and is worth saying anyway. The gRPC API does not change. etcdctl does not change. Client libraries do not change. A Kubernetes apiserver pointed at a Pebble-backed etcd member should not be able to tell the difference at the wire level.

The exception that proves the rule is the snapshot wire format: the bytes that flow when one member sends its database image to another. That format does change, because it has to carry an engine tag for cross-version compatibility. The change is backward compatible: a legacy bare-bbolt stream is recognised by the absence of a magic header and handled as before. Snapshot transfers between members running different engines are rejected with a clear error message. Mixed-engine clusters are explicitly unsupported during normal operation, even if they are tolerable during a rolling upgrade window.
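The sniffing logic can be sketched as follows. The header layout here is invented for illustration: the eight-byte magic "ETCDSNAP", the one-byte engine tag, and the names sniffEngine and acceptSnapshot are all hypothetical, and the real format may look nothing like this. What the sketch does show is the decision order: no magic means legacy bbolt, a recognised tag names the engine, and a mismatch with the local engine is an error.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"strings"
)

// snapshotMagic and the tag bytes are hypothetical values for this sketch.
var snapshotMagic = []byte("ETCDSNAP")

const (
	engineBbolt  = "bbolt"
	enginePebble = "pebble"
)

// sniffEngine inspects the first bytes of a snapshot stream.
// Assumed layout: magic (8 bytes) followed by a one-byte engine tag.
func sniffEngine(head []byte) (string, error) {
	if !bytes.HasPrefix(head, snapshotMagic) {
		// No magic header: treat as a legacy bare-bbolt stream.
		return engineBbolt, nil
	}
	if len(head) <= len(snapshotMagic) {
		return "", errors.New("snapshot header truncated: missing engine tag")
	}
	switch tag := head[len(snapshotMagic)]; tag {
	case 'b':
		return engineBbolt, nil
	case 'p':
		return enginePebble, nil
	default:
		return "", fmt.Errorf("unknown snapshot engine tag %q", tag)
	}
}

// acceptSnapshot rejects a stream whose engine differs from the local one.
func acceptSnapshot(head []byte, localEngine string) error {
	remote, err := sniffEngine(head)
	if err != nil {
		return err
	}
	if remote != localEngine {
		return fmt.Errorf("snapshot is from a %s member but this member runs %s: "+
			"mixed-engine transfers are unsupported", remote, localEngine)
	}
	return nil
}

func main() {
	stream := bufio.NewReader(strings.NewReader("ETCDSNAPp<payload bytes>"))
	// Peek does not consume input, so a legacy stream can still be handed to
	// the bbolt path from byte zero. A short stream yields fewer bytes plus
	// an error; this sketch only looks at the bytes it got.
	head, _ := stream.Peek(len(snapshotMagic) + 1)
	fmt.Println(acceptSnapshot(head, engineBbolt))
}
```

Peeking instead of reading is what keeps the legacy path unchanged: the receiver decides which decoder to use before a single byte has been consumed.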

What we are doing

Reading the list back, the through-line is consistent: do the smallest version of this change that delivers operator value, and decline the optional bigger versions. The milestone scope was set by what we chose to leave out, and most of the items above were proposed in good faith by someone who could have built them.