# RFC: Composite Primary Keys
This document outlines a practical, incremental plan to add composite (multi-column) primary key support to Tonbo while maintaining backward compatibility. It explains design goals, changes required across the codebase, and a step-by-step implementation and validation plan.
## Current Status (2025-08-11)
- Phase 1: Completed. Plural `Schema` APIs, fixed projections, and Parquet writer configuration are implemented for single-PK schemas.
  - `Schema` now exposes `primary_key_indices()` and `primary_key_paths_and_sorting()` (`src/record/mod.rs`). Macro-generated single-PK schemas return one-element slices.
  - Read paths build fixed projections as `[0, 1] ∪ PKs` using `primary_key_indices()` (`src/lib.rs`, `src/transaction.rs`).
  - `DbOption::new` configures sorting columns (`_ts` then PKs) and enables stats + bloom filters for each PK column path (`src/option.rs`).
- Phase 2: Not implemented. Composite key types under `src/record/key/composite/` are placeholders; the derive macro still accepts only a single `#[record(primary_key)]` field and generates a single-field key. No multi-PK trybuild/integration tests exist yet.
- Phase 3: Not implemented. `DynSchema` remains single-PK (stores one `primary_index_arrow`, one `pk_path`, and sorting with a single PK column).
## Goals
- Support multi-column primary keys with lexicographic ordering of PK components.
- Preserve existing single-column PK behavior and public APIs (backward compatible).
- Keep zero-copy reads and projection pushdown guarantees for PK columns.
- Ensure the on-disk layout (Parquet) remains sorted by `_ts` then PK(s), with statistics/bloom filters enabled for PK columns.
- Make it easy to use via the `#[derive(Record)]` macro by allowing multiple `#[record(primary_key)]` fields.
## Non-Goals (for this RFC)
- Foreign keys, cascades, or relational constraints.
- Secondary indexes.
- Schema migrations for existing data files.
- Composite keys in dynamic records in the first phase (can be added subsequently).
## High-Level Design
- Schema trait changes (completed)
  - Now: `Schema` exposes `primary_key_indices()` and `primary_key_path()`; `primary_key_index()` was removed in favor of the slice-based `primary_key_indices()`.
  - Additive helper: `primary_key_paths_and_sorting()` returns all PK column paths plus sorting columns.
  - For single-column PKs, implementations return a one-element slice from `primary_key_indices()`. A sketch of the resulting API shape follows this list.
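For orientation, a minimal sketch of the plural API shape. The method names come from this RFC; the exact signatures and the use of parquet's `ColumnPath`/`SortingColumn` types here are assumptions, not a verbatim excerpt of the trait:

```rust
use parquet::{format::SortingColumn, schema::types::ColumnPath};

// Simplified excerpt; tonbo's real `Schema` trait has more items
// (Arrow schema, array builders, etc.), elided here.
pub trait Schema {
    /// Arrow column indices of all PK components, in comparison order.
    /// Single-PK implementations return a one-element collection.
    fn primary_key_indices(&self) -> Vec<usize>;

    /// Parquet path of the first (or only) PK component, kept for
    /// existing single-PK call sites.
    fn primary_key_path(&self) -> ColumnPath;

    /// All PK column paths plus the sorting columns: `_ts` first,
    /// then each PK component.
    fn primary_key_paths_and_sorting(&self) -> (Vec<ColumnPath>, Vec<SortingColumn>);
}
```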
- Composite key type(s)
  - Introduce a composite key in `src/record/key/composite/` with lexicographic `Ord`:
    - Option A (preferred): the macro generates a record-specific key struct, e.g., `UserKey { k1: u64, k2: String }` and `UserKeyRef<'r> { ... }`.
    - Option B (interim): provide generic tuple implementations for `(K1, K2)`, `(K1, K2, K3)`, … up to a small N. Each implements `Key` and `KeyRef` with lexicographic `Ord`, plus `Encode`/`Decode`, `Hash`, `Clone`.
  - For string/bytes components, `KeyRef` holds borrowed forms, mirroring current single-PK behavior (see the Option B sketch below).
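A minimal Option B sketch under simplified trait shapes. Tonbo's real `Key`/`KeyRef` traits carry additional obligations (`Encode`/`Decode`, `Hash`, Arrow datum conversion) that are omitted here as assumptions; the sketch only captures the ordering and borrowing story:

```rust
// Simplified stand-ins for tonbo's Key/KeyRef traits (assumption: the real
// traits also require Encode/Decode, Hash, and Arrow conversions).
pub trait Key: Ord + Clone {
    type Ref<'r>: KeyRef<'r, Key = Self>
    where
        Self: 'r;
    fn as_key_ref(&self) -> Self::Ref<'_>;
}

pub trait KeyRef<'r>: Ord + Clone {
    type Key: Key;
    fn to_key(self) -> Self::Key;
}

// Rust tuples already compare lexicographically, so the pair impl only
// has to wire up the borrowed form.
impl<K1: Key, K2: Key> Key for (K1, K2) {
    type Ref<'r> = (K1::Ref<'r>, K2::Ref<'r>) where Self: 'r;

    fn as_key_ref(&self) -> Self::Ref<'_> {
        (self.0.as_key_ref(), self.1.as_key_ref())
    }
}

impl<'r, R1: KeyRef<'r>, R2: KeyRef<'r>> KeyRef<'r> for (R1, R2) {
    type Key = (R1::Key, R2::Key);
    fn to_key(self) -> Self::Key {
        (self.0.to_key(), self.1.to_key())
    }
}

// Example component impls: u64 is Copy; owned String borrows as &str.
impl Key for u64 {
    type Ref<'r> = u64 where Self: 'r;
    fn as_key_ref(&self) -> u64 {
        *self
    }
}

impl<'r> KeyRef<'r> for u64 {
    type Key = u64;
    fn to_key(self) -> u64 {
        self
    }
}

impl Key for String {
    type Ref<'r> = &'r str where Self: 'r;
    fn as_key_ref(&self) -> &str {
        self
    }
}

impl<'r> KeyRef<'r> for &'r str {
    type Key = String;
    fn to_key(self) -> String {
        self.to_owned()
    }
}

fn main() {
    let a = (1u64, "zoe".to_string());
    let b = (2u64, "amy".to_string());
    // The first component dominates; ties fall through to the second.
    assert!(a < b);
    assert!(a.as_key_ref() < b.as_key_ref());
    assert_eq!(b.as_key_ref().to_key(), b);
}
```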
- Macro updates (tonbo_macros)
  - Allow multiple `#[record(primary_key)]` fields. Order of appearance in the struct determines comparison order (later we can add `order = i` if needed).
  - Generate:
    - a record-specific key struct and ref struct (Option A), or a mapping to a tuple (Option B);
    - `type Key = <GeneratedKey>` in the `Schema` impl;
    - `fn key(&self) -> <GeneratedKeyRef>` in the `Record` impl;
    - `fn primary_key_indices(&self) -> Vec<usize>` in the `Schema` impl (indices are offset by 2 for `_null`, `_ts`).
  - Ensure `RecordRef::from_record_batch` and projection logic always keep all PK columns, even if they are not listed in the projection.
  - Keep encoding/array-builder signatures unchanged; they already append values per field.
  - A conceptual Option A expansion follows this list.
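A conceptual sketch of what the derive might emit for a two-field PK (Option A). The trait shapes are simplified stand-ins for tonbo's `Record`/`Schema`, and all generated names are hypothetical:

```rust
// Simplified stand-ins; the real traits carry more items.
pub trait Record {
    type KeyRef<'r>
    where
        Self: 'r;
    fn key(&self) -> Self::KeyRef<'_>;
}

pub trait Schema {
    type Key;
    fn primary_key_indices(&self) -> Vec<usize>;
}

// User-written record (attributes shown as comments):
pub struct User {
    pub tenant_id: u64, // #[record(primary_key)]
    pub name: String,   // #[record(primary_key)]
    pub email: Option<String>,
    pub age: u8,
}

// Generated key and key-ref types; derived Ord follows field order,
// which yields exactly the lexicographic comparison the RFC requires.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct UserKey {
    pub tenant_id: u64,
    pub name: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct UserKeyRef<'r> {
    pub tenant_id: u64,
    pub name: &'r str,
}

pub struct UserSchema;

impl Record for User {
    type KeyRef<'r> = UserKeyRef<'r> where Self: 'r;

    fn key(&self) -> UserKeyRef<'_> {
        UserKeyRef { tenant_id: self.tenant_id, name: &self.name }
    }
}

impl Schema for UserSchema {
    type Key = UserKey;

    fn primary_key_indices(&self) -> Vec<usize> {
        // Struct fields 0 and 1, shifted by 2 for the `_null`, `_ts` columns.
        vec![2, 3]
    }
}
```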
- Projections and read paths
  - Replace single-index assumptions with multi-index collections:
    - Use `[0, 1] ∪ primary_key_indices()` to build fixed projections in `src/lib.rs` and `src/transaction.rs` (see the sketch below).
    - In all `RecordRef::projection` usages, ensure all PK columns are always retained (already implied by the fixed mask).
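A minimal sketch of assembling the fixed mask; the helper name and call-site shape are hypothetical, not tonbo's actual code:

```rust
/// Build the fixed projection: `_null` (0), `_ts` (1), plus every PK column.
/// `primary_key_indices` already carries the +2 offset; user-requested
/// columns are appended afterwards, with duplicates removed.
fn fixed_projection(primary_key_indices: &[usize], requested: &[usize]) -> Vec<usize> {
    let mut mask: Vec<usize> = vec![0, 1];
    mask.extend_from_slice(primary_key_indices);
    mask.extend_from_slice(requested);
    mask.sort_unstable();
    mask.dedup();
    mask
}

fn main() {
    // PKs at Arrow columns 2 and 3; the user asks only for column 5.
    assert_eq!(fixed_projection(&[2, 3], &[5]), vec![0, 1, 2, 3, 5]);
}
```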
- Parquet writer configuration
  - In `DbOption::new`, use `primary_key_paths_and_sorting()` to:
    - enable stats and bloom filters for each PK column path via `.set_column_statistics_enabled()` and `.set_column_bloom_filter_enabled()` (invoked once per path);
    - set sorting columns as `[SortingColumn(_ts, …), SortingColumn(pk1, …), SortingColumn(pk2, …), …]`.
  - A sketch of the builder calls follows.
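A sketch of how these settings could be applied per PK path. The parquet builder methods are the real crate API; the helper's shape and the exact descending/nulls flags for `_ts` are assumptions:

```rust
use parquet::{
    file::properties::{EnabledStatistics, WriterProperties},
    format::SortingColumn,
    schema::types::ColumnPath,
};

/// Sketch: writer properties sorted by `_ts` then each PK component, with
/// stats and bloom filters enabled on every PK column.
fn pk_writer_properties(
    pk_paths: Vec<ColumnPath>,
    pk_arrow_indices: &[usize],
) -> WriterProperties {
    // `_ts` (Arrow column 1) sorts first, then each PK component.
    let mut sorting = vec![SortingColumn::new(1, true, true)];
    sorting.extend(
        pk_arrow_indices
            .iter()
            .map(|&i| SortingColumn::new(i as i32, false, true)),
    );

    let mut builder = WriterProperties::builder().set_sorting_columns(Some(sorting));

    // Stats and bloom filters are per-column options: one call per PK path.
    for path in pk_paths {
        builder = builder
            .set_column_statistics_enabled(path.clone(), EnabledStatistics::Page)
            .set_column_bloom_filter_enabled(path, true);
    }
    builder.build()
}
```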
- Dynamic records (phase 3)
  - Extend `DynSchema` to track `primary_indices: Vec<usize>` in metadata (replacing the single `primary_key_index`); a rough sketch follows this list.
  - Update `DynRecordRef::new` and readers to honor multiple PK indices.
  - Ensure tombstone writes (row == None) still populate all PK columns from `Ts.key` so ordering/lookups remain correct.
  - Define a composite key wrapper for `Value`/`ValueRef` (or generate a per-dyn-schema composite type if feasible). Initially out of scope for phase 1.
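A rough sketch of the metadata change; apart from the `primary_indices` name, the surrounding types and constructor are assumptions about the dynamic-record module, not its current code:

```rust
// Assumed, simplified per-column descriptor.
pub struct ValueDesc {
    pub name: String,
    pub nullable: bool,
}

pub struct DynSchema {
    fields: Vec<ValueDesc>,
    primary_indices: Vec<usize>, // was: a single primary_key_index
}

impl DynSchema {
    /// PK components must stay non-nullable, mirroring the static-record rule.
    pub fn new(fields: Vec<ValueDesc>, primary_indices: Vec<usize>) -> Self {
        for &i in &primary_indices {
            assert!(
                !fields[i].nullable,
                "primary key component `{}` must not be nullable",
                fields[i].name
            );
        }
        Self { fields, primary_indices }
    }

    /// All PK indices, consumed by projection building and readers.
    pub fn primary_key_indices(&self) -> &[usize] {
        &self.primary_indices
    }
}
```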
## Step-by-Step Plan
### Phase 1: Core plumbing (single-PK stays working)
- Extend `Schema` trait
  - Add `primary_key_indices()` and `primary_key_paths_and_sorting()` with default impls wrapping existing methods (see the shim sketch below).
  - Update call sites in `DbOption::new`, `src/lib.rs`, and `src/transaction.rs` to use the plural forms.
  - Acceptance: all tests pass; no behavior change for single-PK users.
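A sketch of the migration shim this step describes; per the status section above, the legacy single-index method was later removed once call sites migrated:

```rust
// Migration-shim sketch: the plural method defaults to wrapping the legacy
// single-PK accessor, so existing Schema implementations compile unchanged.
pub trait Schema {
    fn primary_key_index(&self) -> usize;

    /// Default impl keeps single-PK schemas working during the refactor.
    fn primary_key_indices(&self) -> Vec<usize> {
        vec![self.primary_key_index()]
    }
}
```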
- Fixed projection refactor
  - Replace single `primary_key_index` usage with iteration over `primary_key_indices()` to construct `fixed_projection = [0, 1] ∪ PKs`.
  - Acceptance: existing tests and scan/get projections behave identically for single-PK.
- Parquet writer properties
  - Replace the single `primary_key_path()` usage with the plural variant to configure stats, bloom filters, and sorting columns for `_ts` plus all PK components.
  - Acceptance: files write successfully; read paths unchanged.

### Phase 2: Macro + key types
- Composite key data structure
  - Implement composite key(s) in `src/record/key/composite/` with `Encode`/`Decode`, `Ord`, `Hash`, `Key`/`KeyRef`.
  - Start with tuples `(K1, K2)`, `(K1, K2, K3)`, etc. (Option B) for faster delivery; later switch the macro default to a per-record key type (Option A).
  - Acceptance: unit tests confirm lexicographic ordering and encode/decode round-trip for composite keys (ordering half sketched below).
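A sketch of the ordering half of the acceptance tests; the encode/decode round-trip is elided because it depends on tonbo's fusio-based `Encode`/`Decode` traits:

```rust
#[cfg(test)]
mod composite_key_tests {
    #[test]
    fn tuples_order_lexicographically() {
        let mut keys = vec![
            (2u64, "amy".to_string()),
            (1u64, "zoe".to_string()),
            (1u64, "amy".to_string()),
        ];
        keys.sort();
        // First component dominates; ties break on the second.
        assert_eq!(
            keys,
            vec![
                (1, "amy".to_string()),
                (1, "zoe".to_string()),
                (2, "amy".to_string()),
            ]
        );
    }
}
```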
- Update `#[derive(Record)]`
  - Allow multiple `#[record(primary_key)]` fields and generate:
    - `type Key = (<K1>, <K2>, …)` (Option B) or `<RecordName>Key` (Option A);
    - `fn key(&self) -> (<K1Ref>, <K2Ref>, …)`;
    - `fn primary_key_indices(&self) -> Vec<usize>` with the +2 offset.
  - Ensure `from_record_batch` and projection retain all PK columns.
  - Acceptance: trybuild tests covering multi-PK compile and run; single-PK tests unchanged.
- Integration tests
  - Add end-to-end tests: insert/get/remove, range scans, projection, and ordering on 2+ PK fields (e.g., `tenant_id: u64, name: String`); a sketch follows.
  - Acceptance: all new tests pass.
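A sketch of the shape such a test might take. It assumes the multi-PK derive from this RFC lands; the tonbo calls mirror today's single-PK examples, and the composite key-ref shape passed to `get` (a tuple here) is an assumption about the generated API:

```rust
// Sketch only; constructor signatures and the key-ref shape may differ.
use fusio::path::Path;
use tonbo::{executor::tokio::TokioExecutor, DbOption, Projection, Record, DB};

#[derive(Record, Debug)]
pub struct User {
    #[record(primary_key)]
    pub tenant_id: u64,
    #[record(primary_key)]
    pub name: String,
    pub age: u8,
}

#[tokio::test]
async fn composite_pk_insert_get() {
    let options = DbOption::new(
        Path::from_filesystem_path("/tmp/tonbo_composite").unwrap(),
        &UserSchema,
    );
    let db = DB::new(options, TokioExecutor::current(), UserSchema)
        .await
        .unwrap();

    db.insert(User { tenant_id: 1, name: "alice".into(), age: 30 })
        .await
        .unwrap();

    // Point get by the full composite key; key-ref shape is hypothetical.
    let txn = db.transaction().await;
    let user = txn.get(&(1u64, "alice"), Projection::All).await.unwrap();
    assert!(user.is_some());
}
```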
### Phase 3: Dynamic records (optional)
- `DynSchema` multi-PK
  - Store `primary_indices` metadata; update dynamic arrays/refs to keep all PK columns in projections.
  - Provide a composite `ValueRef` key wrapper for in-memory operations.
  - Ensure tombstones populate PK components from `Ts.key` in builders (e.g., `DynRecordBuilder::push`).
  - Acceptance: dynamic tests mirroring the integration scenarios pass, including tombstone rows retaining all PK components.
## Code Touchpoints
- Traits/APIs: `src/record/mod.rs` (`Schema`), `src/option.rs` (`DbOption::new`)
- Read paths: `src/lib.rs` (get/scan/package), `src/transaction.rs` (get/scan)
- Macro codegen: `tonbo_macros/src/record.rs`, `tonbo_macros/src/keys.rs`, `tonbo_macros/src/data_type.rs`
- Key types: `src/record/key/composite/`
- Dynamic (phase 3): `src/record/dynamic/*` (incl. tombstone handling in `DynRecordBuilder::push`)
## Testing Strategy
- Unit tests:
  - Composite key `Ord`, `Eq`, `Hash`, `Encode`/`Decode` round-trip.
  - `Schema` default-impl compatibility.
- trybuild tests:
  - Multiple `#[record(primary_key)]` fields in a struct compile and generate the expected APIs.
  - Reject nullable PK components.
- Integration tests:
  - Insert/get/remove by composite key; range scans across composite key ranges; projection keeps PK columns.
  - WAL/compaction unaffected (basic smoke tests).
- (Optional) Property tests: ordering equivalence vs. native tuple lexicographic ordering when Option B is used (sketched below).
- Tombstones:
  - For both single- and multi-column PKs, verify that tombstone rows (row == None) keep all PK column values populated from `Ts.key`, both in the Arrow arrays and through `RecordRef::from_record_batch`.
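A sketch of the optional property test using proptest (an assumed dev-dependency, not currently in the tree); it pins the generated key's derived `Ord` (Option A shape shown, name hypothetical) to the native tuple ordering:

```rust
use proptest::prelude::*;

// Hypothetical generated key; derived Ord follows field order.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct UserKey {
    tenant_id: u64,
    name: String,
}

proptest! {
    #[test]
    fn key_ord_matches_tuple_ord(
        a1 in any::<u64>(), a2 in ".*",
        b1 in any::<u64>(), b2 in ".*",
    ) {
        let lhs = UserKey { tenant_id: a1, name: a2.clone() };
        let rhs = UserKey { tenant_id: b1, name: b2.clone() };
        // The struct's Ord must agree with the native tuple's
        // lexicographic comparison on every input.
        prop_assert_eq!(lhs.cmp(&rhs), (a1, a2).cmp(&(b1, b2)));
    }
}
```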
## Backward Compatibility & Migration
- All existing single-PK code continues to work without changes due to default-impl fallbacks.
- Users opting into composite PKs need only annotate multiple fields with `#[record(primary_key)]`.
- No on-disk migration is required for existing tables; new tables with composite PKs will write Parquet sorting columns for all PK components.
## Risks and Mitigations
- API surface increase: keep new APIs additive with conservative defaults.
- Projection bugs: comprehensive tests to ensure PK columns are always included.
- Performance: lexicographic compare is standard; Arrow array lengths are uniform, so no extra bounds checks needed.
- Dynamic records complexity: staged to a later phase to avoid blocking initial delivery.
## Example (target macro UX)

```rust
#[derive(Record, Debug)]
pub struct User {
    #[record(primary_key)]
    pub tenant_id: u64,
    #[record(primary_key)]
    pub name: String,
    pub email: Option<String>,
    pub age: u8,
}

// Generated (conceptually):
// type Key = (u64, String);
// fn key(&self) -> (u64, &str);
// fn primary_key_indices(&self) -> Vec<usize> { vec![2, 3] }
```
## Delivery Checklist
- Add `Schema` plural APIs and refactor call sites.
- Implement composite key types (tuples first).
- Enable multiple PK fields in macro; generate composite key/ref and PK indices.
- Update projection logic to retain all PK columns.
- Configure Parquet sorting/statistics for all PK components.
- Add unit/trybuild/integration tests.
- Update user guide (mention composite PK support and examples).
- Ensure tombstone rows keep PK component values (builders/readers), validated by tests.