What is Tonbo?

Tonbo is an in-process KV database that can be embedded in data-intensive applications written in Rust, Python, or JavaScript (WebAssembly / Deno). It is designed for analytical processing. Tonbo can efficiently write data in real time in edge environments such as browsers and AWS Lambda, with the data stored in memory, on local disks, or in S3 using Apache Parquet format.

Build with schema

Building data-intensive applications in Rust using Tonbo is convenient. You just need to declare the dependency in your Cargo.toml file and then create the embedded database. For example:

#[derive(tonbo::Record)]
pub struct User {
    #[record(primary_key)]
    name: String,
    email: Option<String>,
    age: u8,
}

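#[tokio::main]
async fn main() {
    let path = tonbo::Path::from_filesystem_path("./db_path/users").unwrap();
    let options = tonbo::DbOption::new(path, &UserSchema);
    let executor = tonbo::executor::tokio::TokioExecutor::default();
    let db = tonbo::DB::<User, _>::new(options, executor, UserSchema)
        .await
        .unwrap();
}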

All in Parquet

Tonbo organizes all stored data as Apache Parquet files. At each level, these files can reside in memory, on disk, or in S3. This design lets users process their data without any vendor lock-in, including with Tonbo.

			╔═tonbo═════════════════════════════════════════════════════╗
			║                                                           ║
			║    ┌──────╂─client storage─┐  ┌──────╂─client storage─┐   ║
			║    │ ┏━━━━▼━━━━┓           │  │ ┏━━━━▼━━━━┓           │   ║
			║    │ ┃ parquet ┃           │  │ ┃ parquet ┃           │   ║
			║    │ ┗━━━━┳━━━━┛           │  │ ┗━━━━┳━━━━┛           │   ║
			║    └──────╂────────────────┘  └──────╂────────────────┘   ║
			║           ┣━━━━━━━━━━━━━━━━━━━━━━━━━━┛                    ║
			║    ┌──────╂────────────────────────────────server ssd─┐   ║
			║    │      ┣━━━━━━━━━━━┓                               │   ║
			║    │ ┏━━━━▼━━━━┓ ┏━━━━▼━━━━┓                          │   ║
			║    │ ┃ parquet ┃ ┃ parquet ┃                          │   ║
			║    │ ┗━━━━┳━━━━┛ ┗━━━━┳━━━━┛                          │   ║
			║    └──────╂───────────╂───────────────────────────────┘   ║
			║    ┌──────╂───────────╂────────object storage service─┐   ║
			║    │      ┣━━━━━━━━━━━╋━━━━━━━━━━━┳━━━━━━━━━━━┓       │   ║
			║    │ ┏━━━━▼━━━━┓ ┏━━━━▼━━━━┓ ┏━━━━▼━━━━┓ ┏━━━━▼━━━━┓  │   ║
			║    │ ┃ parquet ┃ ┃ parquet ┃ ┃ parquet ┃ ┃ parquet ┃  │   ║
			║    │ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛  │   ║
			║    └──────────────────────────────────────────────────┘   ║
			║                                                           ║
			╚═══════════════════════════════════════════════════════════╝

Easy to integrate

Compared to other analytical databases, Tonbo is extremely lightweight—only 1.3MB when compressed. In addition to being embedded directly as a KV database within applications, Tonbo can also serve as an analytical enhancement for existing OLTP databases.

For example, TonboLite is a SQLite plugin built on Tonbo that provides SQLite with highly compressed, analytics-ready tables using Arrow/Parquet to boost query efficiency. Moreover, it can run alongside SQLite in various environments such as browsers and Linux:

sqlite> .load target/release/libsqlite_tonbo

sqlite> CREATE VIRTUAL TABLE temp.tonbo USING tonbo(
    create_sql = 'create table tonbo(id bigint primary key, name varchar, like int)',
    path = 'db_path/tonbo'
);

sqlite> insert into tonbo (id, name, like) values (0, 'tonbo', 100);

sqlite> select * from tonbo;
0|tonbo|100

We are committed to providing the most convenient and efficient real-time analytical database for edge-first scenarios. In addition to Tonbolite, we will offer the following based on Tonbo:

  1. Time-series data writing and querying for observability and other scenarios.
  2. Real-time index building and search based on BM25 or vectors.

We are passionate about establishing Tonbo as an open-source, community-contributed project and are dedicated to building a community around it to develop features for all use cases.

Getting started

Installation

Prerequisite

To get started with Tonbo, ensure that Rust is installed on your system. If you haven't installed it yet, please follow the installation instructions.

Installation

Tonbo supports various target platforms (native, AWS Lambda, browsers, etc.) and storage backends (memory, local disk, S3, etc.). Tonbo is built on asynchronous Rust for efficient database operations, so you must configure an async runtime for your target platform.

For native platforms, Tokio is the most popular async runtime in Rust. To use Tonbo with Tokio, ensure the tokio feature is enabled in your Cargo.toml file (enabled by default):

tokio = { version = "1", features = ["full"] }
tonbo = { git = "https://github.com/tonbo-io/tonbo" }

For browser targets using OPFS as the storage backend, disable the tokio feature and enable the wasm feature because Tokio is incompatible with OPFS. Since tokio is enabled by default, you must disable default features. If you plan to use S3 as the backend, also enable the wasm-http feature:

tonbo = { git = "https://github.com/tonbo-io/tonbo", default-features = false, features = [
    "wasm",
    "wasm-http",
] }

Using Tonbo

Defining Schema

Tonbo offers an ORM-like macro that simplifies working with column families. Use the Record macro to define your column family's schema, and Tonbo will automatically generate all necessary code at compile time:

use tonbo::Record;

#[derive(Record, Debug)]
pub struct User {
    #[record(primary_key)]
    name: String,
    email: Option<String>,
    age: u8,
}

Further explanation of this example:

  • Record: This attribute marks the struct as a Tonbo schema definition, meaning it represents the structure of a column family.
  • #[record(primary_key)]: This attribute designates the corresponding field as the primary key. Note that Tonbo currently does not support compound primary keys, so exactly one field must carry this attribute.
  • Option: When a field is wrapped in Option, it indicates that the field is nullable.

Tonbo supports the following data types:

  • Number types: i8, i16, i32, i64, u8, u16, u32, u64
  • Boolean type: bool
  • String type: String
  • Bytes type: bytes::Bytes

Creating database

After defining your schema, you can create a DB instance using a customized DbOption.

use std::fs;
use tonbo::Path;
use tonbo::{executor::tokio::TokioExecutor, DbOption, DB};

#[tokio::main]
async fn main() {
    // make sure the path exists
    fs::create_dir_all("./db_path/users").unwrap();

    let options = DbOption::new(
        Path::from_filesystem_path("./db_path/users").unwrap(),
        &UserSchema,
    );
    let db = DB::<User, TokioExecutor>::new(options, TokioExecutor::default(), UserSchema)
        .await
        .unwrap();
}

Tonbo automatically generates the UserSchema struct at compile time, so you don't need to handle it manually. However, ensure that the specified path exists before creating your DbOption.

When using Tonbo in a WASM environment, use Path::from_opfs_path instead of Path::from_filesystem_path.

Operations on Database

After creating the DB, you can perform operations like insert, remove, and get. However, when you retrieve a record from Tonbo, you'll receive a UserRef instance—not a direct User instance. The UserRef struct, which implements the RecordRef trait, is automatically generated by Tonbo at compile time. It might look something like this:

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct UserRef<'r> {
    pub name: &'r str,
    pub email: Option<&'r str>,
    pub age: Option<u8>,
}
impl RecordRef for UserRef<'_> {
    // ......
}
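
As a rough illustration of why reads return a borrowed view instead of an owned User, the sketch below hand-writes simplified versions of both types. It is not the code that the macro generates: as_record_ref is a hypothetical helper, and the real RecordRef trait is left out so that the example compiles with the standard library alone.

```rust
// Simplified, hand-written stand-ins for the macro-generated types.
#[derive(Debug)]
pub struct User {
    pub name: String,
    pub email: Option<String>,
    pub age: u8,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct UserRef<'r> {
    pub name: &'r str,
    pub email: Option<&'r str>,
    pub age: Option<u8>,
}

impl User {
    // Hypothetical helper: borrow the owned record as a reference view
    // without copying any string data.
    pub fn as_record_ref(&self) -> UserRef<'_> {
        UserRef {
            name: &self.name,
            email: self.email.as_deref(),
            age: Some(self.age),
        }
    }
}

fn main() {
    let user = User {
        name: "Alice".to_string(),
        email: None,
        age: 22,
    };
    let view = user.as_record_ref();
    println!("{:?}", view);
}
```

Because the view only borrows from the stored data, it cannot outlive it, which is consistent with get and scan handing the reference to a closure.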

Insert

DB::insert takes a Record instance—specifically, an instance of the struct you've defined with #[derive(Record)]:

db.insert(User { /* ... */ }).await.unwrap();

Remove

DB::remove accepts a Key, where the type of the key is defined by the field annotated with #[record(primary_key)]. This method removes the record associated with the provided key:

db.remove("Alice".into()).await.unwrap();

Get

DB::get accepts a Key and processes the corresponding record using a closure that receives a TransactionEntry. Within the closure, you can call TransactionEntry::get to retrieve the record as a RecordRef instance:

let age = db.get(&"Alice".into(),
    |entry| {
        // entry.get() will get a `UserRef`
        let user = entry.get();
        println!("{:#?}", user);
        user.age
    })
    .await
    .unwrap();

Scan

Similar to DB::get, DB::scan accepts a closure that processes a TransactionEntry. However, instead of a single key, DB::scan operates over a range of keys, applying the closure to every record that falls within that range:

let lower = "Alice".into();
let upper = "Bob".into();
let stream = db
    .scan(
        (Bound::Included(&lower), Bound::Excluded(&upper)),
        |entry| {
            let record_ref = entry.get();

            record_ref.age
        },
    )
    .await;
let mut stream = std::pin::pin!(stream);
while let Some(data) = stream.next().await.transpose().unwrap() {
    // ...
}
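
The bounds passed to scan follow the standard std::ops::Bound semantics. As a minimal sketch, assuming an in-memory BTreeMap as a stand-in for the database (ages_in_range is a hypothetical helper, not a Tonbo API), the same half-open range can be expressed like this:

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Collect the ages of all users whose name lies in [lower, upper):
// the lower bound is included, the upper bound is excluded.
fn ages_in_range(users: &BTreeMap<String, u8>, lower: &str, upper: &str) -> Vec<u8> {
    users
        .range::<str, _>((Bound::Included(lower), Bound::Excluded(upper)))
        .map(|(_, age)| *age)
        .collect()
}

fn main() {
    let mut users = BTreeMap::new();
    users.insert("Alice".to_string(), 22u8);
    users.insert("Ann".to_string(), 30);
    users.insert("Bob".to_string(), 41);
    users.insert("Carol".to_string(), 35);

    // "Alice" is included, "Bob" is excluded.
    println!("{:?}", ages_in_range(&users, "Alice", "Bob")); // prints [22, 30]
}
```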

Using transaction

Tonbo supports transactions. Within a transaction, you can also push down filter, limit, and projection operators into a query:

// create transaction
let mut txn = db.transaction().await;

let name = "Alice".into();

txn.insert(User { /* ... */ });
let user = txn.get(&name, Projection::All).await.unwrap();

let upper = "Blob".into();
// range scan of user
let mut scan = txn
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    .take()
    .await
    .unwrap();

while let Some(entry) = scan.next().await.transpose().unwrap() {
    let data = entry.value(); // type of UserRef
    // ......
}

// reverse scan of user (descending order) 
let mut reverse_scan = txn
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    .reverse() // scan in descending order
    .limit(10) // optionally limit results
    .take()
    .await
    .unwrap();

while let Some(entry) = reverse_scan.next().await.transpose().unwrap() {
    let data = entry.value(); // records in reverse order
    // ......
}

Persistence

Tonbo employs a Log-Structured Merge Tree (LSM) as its underlying data structure, meaning that some data may reside in memory. To persist this in-memory data, use the flush method.

When Write-Ahead Logging (WAL) is enabled, data is automatically written to disk. However, since Tonbo buffers WAL data by default, you should call the flush_wal method to ensure all data is recoverable. If you prefer not to use WAL buffering, you can disable it by setting wal_buffer_size to 0:

let options = DbOption::new(
    Path::from_filesystem_path("./db_path/users").unwrap(),
    &UserSchema,
).wal_buffer_size(0);

If you don't want to use WAL, you can disable it by calling DbOption::disable_wal:

let options = DbOption::new(
    Path::from_filesystem_path("./db_path/users").unwrap(),
    &UserSchema,
).disable_wal();

Note: If you disable WAL, flush_wal has no effect. Call the flush method to persist the in-memory data.

Conversely, if WAL is enabled and wal_buffer_size is set to 0, WAL data is flushed to disk immediately, so calling flush_wal is unnecessary.
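
The interaction between wal_buffer_size and flush_wal can be pictured with a toy model. The sketch below is an assumption-laden illustration, not Tonbo's implementation: ToyWal, its fields, and the rule that a full buffer flushes itself are all hypothetical, and a Vec stands in for the on-disk log.

```rust
// Toy model of a buffered write-ahead log. `durable` stands in for bytes
// that have reached disk; none of this is Tonbo's actual implementation.
struct ToyWal {
    buffer: Vec<u8>,
    buffer_size: usize,
    durable: Vec<u8>,
}

impl ToyWal {
    fn new(buffer_size: usize) -> Self {
        ToyWal { buffer: Vec::new(), buffer_size, durable: Vec::new() }
    }

    // Append an entry. With a buffer size of 0 the entry bypasses the buffer.
    fn append(&mut self, entry: &[u8]) {
        if self.buffer_size == 0 {
            self.durable.extend_from_slice(entry);
            return;
        }
        self.buffer.extend_from_slice(entry);
        // Assumption: a full buffer is flushed automatically.
        if self.buffer.len() >= self.buffer_size {
            self.flush_wal();
        }
    }

    // Move everything that is still buffered into the durable log.
    fn flush_wal(&mut self) {
        self.durable.append(&mut self.buffer);
    }
}

fn main() {
    let mut buffered = ToyWal::new(4096);
    buffered.append(b"insert Alice");
    println!("durable before flush: {}", buffered.durable.len());
    buffered.flush_wal();
    println!("durable after flush: {}", buffered.durable.len());

    let mut unbuffered = ToyWal::new(0);
    unbuffered.append(b"insert Alice");
    println!("durable without buffering: {}", unbuffered.durable.len());
}
```

In this model, an entry that is still in the buffer would be lost on a crash, which is why an explicit flush_wal matters only when buffering is enabled.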

Using with S3

If you want to use Tonbo with S3, you can configure DbOption to determine which portions of your data are stored in S3 and which remain on the local disk. The example below demonstrates how to set up this configuration:

let s3_option = FsOptions::S3 {
    bucket: "bucket".to_string(),
    credential: Some(AwsCredential {
        key_id: "key_id".to_string(),
        secret_key: "secret_key".to_string(),
        token: None,
    }),
    endpoint: None,
    sign_payload: None,
    checksum: None,
    region: Some("region".to_string()),
};
let options = DbOption::new(
    Path::from_filesystem_path("./db_path/users").unwrap(),
    &UserSchema,
)
.level_path(2, "l2", s3_option.clone())
.level_path(3, "l3", s3_option);

In this example, data for level 2 and level 3 will be stored in S3, while all other levels remain on the local disk. If there is data in level 2 and level 3, you can verify and access it in S3:

s3://bucket/l2/
├── xxx.parquet
├── ......
s3://bucket/l3/
├── xxx.parquet
├── ......

For more configuration options, please refer to the Configuration section.

What next?

Tonbo API

DbOption

DbOption is a struct that contains configuration options for the database. Here are some configuration options you can set:

// `DB::new` consumes a `DbOption` built with `DbOption::new(path, &schema)`.
// The path is the default path that the database will use.
async fn new(option: DbOption, executor: E, schema: R::Schema) -> Result<Self, DbError<R>>;

// Sets the path of the database.
fn path(self, path: impl Into<Path>) -> Self;

/// Disables the write-ahead log. This risks data loss during downtime.
pub fn disable_wal(self) -> Self;

/// Maximum size of WAL buffer, default value is 4KB
/// If set to 0, the WAL buffer will be disabled.
pub fn wal_buffer_size(self, wal_buffer_size: usize) -> Self;

If you want to learn more about DbOption, you can refer to the Configuration section.

Note: You should make sure the path exists before creating DbOption.

Executor

Tonbo provides an Executor trait that you can implement to execute asynchronous tasks. Tonbo ships with TokioExecutor (for local disk) and OpfsExecutor (for WASM). You can also write your own Executor. Here is an example implementation of the Executor trait:

pub struct TokioExecutor {
    handle: Handle,
}

impl TokioExecutor {
    pub fn current() -> Self {
        Self {
            handle: Handle::current(),
        }
    }
}

impl Executor for TokioExecutor {
    fn spawn<F>(&self, future: F)
    where
        F: Future<Output = ()> + MaybeSend + 'static,
    {
        self.handle.spawn(future);
    }
}

Query

You can use the get method to get a record by key. Pass it a closure that takes a TransactionEntry instance and returns an Option. Inside the closure, call TransactionEntry::get to obtain a UserRef instance, a struct that Tonbo generates for you. All fields except the primary key are Option types, because they may not have been set when the record was created.

You can use the scan method to scan all records in the specified range. scan returns a Stream instance that you can use to iterate over the records.

// get the record with `key` as the primary key and process it using closure `f`
let age = db.get(&"Alice".into(),
    |entry| {
        // entry.get() will get a `UserRef`
        let user = entry.get();
        println!("{:#?}", user);
        user.age
    })
    .await
    .unwrap();

let mut scan = db
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    .await
    .unwrap();
while let Some(entry) = scan.next().await.transpose().unwrap() {
    let data = entry.value(); // type of UserRef
    // ......
}

// Reverse scan (descending order)
let mut reverse_scan = db
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    .reverse() // scan in descending order
    .limit(100) // limit results
    .await
    .unwrap();
while let Some(entry) = reverse_scan.next().await.transpose().unwrap() {
    let data = entry.value(); // newest records first
    // ......
}

Insert/Remove

You can use db.insert(record) or db.insert_batch(records) to insert new records into the database and use db.remove(key) to remove a record from the database. Here is an example of updating the state of database:

let alice = User {
    name: "Alice".into(),
    email: Some("alice@gmail.com".into()),
    age: 22,
    bytes: Bytes::from(vec![0, 1, 2]),
};
let bob = User {
    name: "Bob".into(),
    email: None,
    age: 30,
    bytes: Bytes::from(vec![3, 4, 5]),
};

// insert a single tonbo record
db.insert(alice).await.unwrap();

// insert a sequence of records as a single batch
db.insert_batch(vec![bob].into_iter()).await.unwrap();

// remove the specified record from the database
db.remove("Alice".into()).await.unwrap();

Transaction

Tonbo supports transactions through the Transaction type. Use db.transaction() to create a transaction and txn.commit() to commit it.

Tonbo uses optimistic concurrency control to ensure data consistency. If a transaction conflicts with another transaction at commit time, the commit fails with a CommitError.

Here is an example of how to use transactions:

// create transaction
let mut txn = db.transaction().await;

let name = "Alice".into();

txn.insert(User { /* ... */ });
let _user = txn.get(&name, Projection::Parts(vec!["email", "bytes"])).await.unwrap();

let upper = "Blob".into();
// range scan of user
let mut scan = txn
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    // tonbo supports pushing down projection
    .projection(&["email", "bytes"])
    // push down limitation
    .limit(1)
    .take()
    .await
    .unwrap();

while let Some(entry) = scan.next().await.transpose().unwrap() {
    let data = entry.value(); // type of UserRef
    // ......
}
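
To make the optimistic concurrency rule more concrete, here is a deliberately small model of commit-time validation. It is a sketch under stated assumptions, not Tonbo's algorithm: ToyStore, ToyTxn, and CommitConflict are hypothetical names (CommitConflict only stands in for the role of CommitError), and conflicts are detected by comparing a per-key version against the version at which the transaction started.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a commit error.
#[derive(Debug, PartialEq)]
struct CommitConflict;

struct ToyStore {
    // key -> (value, version of the commit that last wrote it)
    data: HashMap<String, (u8, u64)>,
    clock: u64,
}

struct ToyTxn {
    start: u64,
    writes: HashMap<String, u8>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { data: HashMap::new(), clock: 0 }
    }

    // A transaction remembers the store version at which it started.
    fn begin(&self) -> ToyTxn {
        ToyTxn { start: self.clock, writes: HashMap::new() }
    }

    // Validate first, then apply: the commit fails if any key written by
    // this transaction was committed by someone else after it started.
    fn commit(&mut self, txn: ToyTxn) -> Result<(), CommitConflict> {
        for key in txn.writes.keys() {
            if let Some((_, version)) = self.data.get(key) {
                if *version > txn.start {
                    return Err(CommitConflict);
                }
            }
        }
        self.clock += 1;
        for (key, value) in txn.writes {
            self.data.insert(key, (value, self.clock));
        }
        Ok(())
    }
}

fn main() {
    let mut store = ToyStore::new();
    let mut first = store.begin();
    let mut second = store.begin();
    first.writes.insert("Alice".to_string(), 22);
    second.writes.insert("Alice".to_string(), 23);

    println!("{:?}", store.commit(first)); // prints Ok(())
    println!("{:?}", store.commit(second)); // prints Err(CommitConflict)
}
```

No locks are taken while the transactions run; the losing transaction only learns about the conflict when it tries to commit, and the caller can then retry it.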

Query

Transactions support easily reading the state of keys that are currently batched in a given transaction but not yet committed.

You can use the get method to get a record by key; it returns a UserRef instance. UserRef is a struct that Tonbo generates for you at compile time. All fields except the primary key are Option types, because they may not have been set when the record was created. You can also pass a Projection to specify which fields you want to get. Projection::All gets all fields, while Projection::Parts(vec!["email", "bytes"]) gets only the primary key and the listed fields, here email and bytes (all other fields will be None).

You can use the scan method to scan all records in the specified range. scan returns a Scan instance; call its take method to obtain a Stream and iterate over all matching records. Tonbo also supports pushing down projections and limits: use Scan::projection(&["email", "bytes"]) to specify which fields you want and Scan::limit(10) to limit the number of records returned.

let txn = db.transaction().await;

let _user = txn.get(&name, Projection::Parts(vec!["email"])).await.unwrap();

let mut scan_stream = txn
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    // tonbo supports pushing down projection
    .projection(&["email", "bytes"])
    // push down limitation
    .limit(10)
    .take()
    .await
    .unwrap();
while let Some(entry) = scan_stream.next().await.transpose().unwrap() {
    let data = entry.value(); // type of UserRef
    // ......
}
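
The effect of a projection on the returned record can be sketched as follows. This is an illustration only: Row and project are hypothetical, and the real projection is pushed down into the storage layer rather than applied to an already materialized record.

```rust
// Hypothetical record view: every non-key field is optional.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    name: String, // primary key
    email: Option<String>,
    age: Option<u8>,
}

// Keep the primary key plus the listed columns; everything else becomes None.
fn project(row: &Row, columns: &[&str]) -> Row {
    Row {
        name: row.name.clone(),
        email: if columns.contains(&"email") { row.email.clone() } else { None },
        age: if columns.contains(&"age") { row.age } else { None },
    }
}

fn main() {
    let row = Row {
        name: "Alice".to_string(),
        email: Some("alice@gmail.com".to_string()),
        age: Some(22),
    };
    // Only `email` is requested, so `age` comes back as None.
    println!("{:?}", project(&row, &["email"]));
}
```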

Insert/Remove

You can use txn.insert(record) to insert a new record into the database and txn.remove(key) to remove a record from the database. Tonbo uses a B-Tree to store all the data that you modified (inserted or removed). Your modifications are committed to the database only when txn.commit() succeeds. If a conflict occurs, Tonbo returns an error and all your modifications are rolled back.

Here is an example of how to use transaction to update the state of database:


let mut txn = db.transaction().await;
txn.insert(User {
    id: 10,
    name: "John".to_string(),
    email: Some("john@example.com".to_string()),
});
txn.remove("Alice".into());
txn.commit().await.unwrap();
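
The idea of buffering modifications in an ordered map until commit can be sketched with the standard library. This is a minimal model under assumptions, not Tonbo's Transaction: ToyTxn is hypothetical, a BTreeMap plays the role of both the database and the local buffer, and conflict checking is omitted.

```rust
use std::collections::BTreeMap;

// Toy transaction: writes are staged locally and applied only on commit.
struct ToyTxn<'db> {
    base: &'db mut BTreeMap<String, u8>,
    // `None` marks a pending removal.
    local: BTreeMap<String, Option<u8>>,
}

impl<'db> ToyTxn<'db> {
    fn new(base: &'db mut BTreeMap<String, u8>) -> Self {
        ToyTxn { base, local: BTreeMap::new() }
    }

    fn insert(&mut self, key: &str, value: u8) {
        self.local.insert(key.to_string(), Some(value));
    }

    fn remove(&mut self, key: &str) {
        self.local.insert(key.to_string(), None);
    }

    // Reads see the transaction's own pending writes first.
    fn get(&self, key: &str) -> Option<u8> {
        match self.local.get(key) {
            Some(pending) => *pending,
            None => self.base.get(key).copied(),
        }
    }

    // Apply every staged modification to the underlying map.
    fn commit(self) {
        let ToyTxn { base, local } = self;
        for (key, pending) in local {
            match pending {
                Some(value) => {
                    base.insert(key, value);
                }
                None => {
                    base.remove(&key);
                }
            }
        }
    }
}

fn main() {
    let mut db = BTreeMap::new();
    db.insert("Alice".to_string(), 22u8);

    let mut txn = ToyTxn::new(&mut db);
    txn.insert("John", 30);
    txn.remove("Alice");
    println!("{:?} {:?}", txn.get("John"), txn.get("Alice")); // prints Some(30) None
    txn.commit();
    println!("{:?}", db); // prints {"John": 30}
}
```

Dropping the transaction without calling commit simply discards the local buffer, which corresponds to a rollback in this model.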

Using S3 backends

Tonbo supports various storage backends, such as OPFS and S3, with more possibly to come. Tonbo uses local storage by default. If you want to use S3 storage for a specific level, call DbOption::level_path with FsOptions::S3 so that all files in that level are pushed to S3.

use tonbo::option::{ AwsCredential, FsOptions, Path };
use tonbo::{executor::tokio::TokioExecutor, DbOption, DB};

#[tokio::main]
async fn main() {
    let fs_option = FsOptions::S3 {
        bucket: "wasm-data".to_string(),
        credential: Some(AwsCredential {
            key_id: "key_id".to_string(),
            secret_key: "secret_key".to_string(),
            token: None,
        }),
        endpoint: None,
        sign_payload: None,
        checksum: None,
        region: Some("region".to_string()),
    };

    let options = DbOption::new(Path::from_filesystem_path("s3_path").unwrap(), &UserSchema)
        .level_path(2, "l2", fs_option);

    let db = DB::<User, TokioExecutor>::new(options, TokioExecutor::default(), UserSchema)
        .await
        .unwrap();
}

If you want to persist metadata files to S3, you can configure DbOption::base_fs with FsOptions::S3{...}. This will enable Tonbo to upload metadata files and WAL files to the specified S3 bucket.

Note: This will not guarantee the latest metadata will be uploaded to S3. If you want to ensure the latest WAL is uploaded, you can use DB::flush_wal. If you want to ensure the latest metadata is uploaded, you can use DB::flush to trigger upload manually. If you want tonbo to trigger upload more frequently, you can adjust DbOption::version_log_snapshot_threshold to a smaller value. The default value is 200.

See more details in Configuration.

Note: If you want to use S3 in WASM, please configure CORS rules for the bucket before using. Here is an example of CORS configuration:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "PUT",
            "DELETE",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": []
    }
]

For more details, please refer to AWS documentation.

Tonbo Python Binding

@Record

Tonbo provides an ORM-like decorator for ease of use. Use @Record to define the schema of a column family:

@Record
class User:
   id = Column(DataType.Int64, name="id", primary_key=True)
   age = Column(DataType.Int16, name="age", nullable=True)
   name = Column(DataType.String, name="name", nullable=False)

Configuration

Example

from tonbo import DbOption, Column, DataType, Record, TonboDB, Bound
from tonbo.fs import from_filesystem_path
import asyncio

@Record
class User:
   id = Column(DataType.Int64, name="id", primary_key=True)
   age = Column(DataType.Int16, name="age", nullable=True)
   name = Column(DataType.String, name="name", nullable=False)

async def main():
    db = TonboDB(DbOption(from_filesystem_path("db_path/user")), User())
    await db.insert(User(id=18, age=175, name="Alice"))
    record = await db.get(18)
    print(record)

    # use a transaction
    txn = await db.transaction()
    result = await txn.get(18)
    scan = await txn.scan(Bound.Included(18), None, limit=10, projection=["id", "name"])

    async for record in scan:
        print(record)

asyncio.run(main())

Configuration

Tonbo provides a configuration struct DbOption for setting up the database. This section will introduce the configuration options available in Tonbo.

Path Configuration

Tonbo uses the local disk as the default storage option (the Tokio file system on native targets, OPFS on WASM). To change the default storage backend, use DbOption::base_fs:

pub fn base_fs(mut self, base_fs: FsOptions) -> DbOption;

FsOptions holds the configuration options for the file system. Tonbo provides two kinds of file system options: FsOptions::Local and FsOptions::S3.

  • FsOptions::Local: Requires the tokio or wasm feature to be enabled.
  • FsOptions::S3{...}: Requires the aws feature and either tokio-http or wasm-http to be enabled. Use this variant to configure S3 storage.

pub enum FsOptions {
    #[cfg(any(feature = "tokio", feature = "wasm"))]
    Local,
    #[cfg(feature = "aws")]
    S3 {
        bucket: String,
        credential: Option<AwsCredential>,
        endpoint: Option<String>,
        region: Option<String>,
        sign_payload: Option<bool>,
        checksum: Option<bool>,
    },
}

#[derive(Debug, Clone)]
pub struct AwsCredential {
    /// AWS_ACCESS_KEY_ID
    pub key_id: String,
    /// AWS_SECRET_ACCESS_KEY
    pub secret_key: String,
    /// AWS_SESSION_TOKEN
    pub token: Option<String>,
}

  • bucket: The S3 bucket
  • credential: The credential configuration for S3
    • key_id: The S3 access key
    • secret_key: The S3 secret access key
    • token: The security token for AWS S3
  • endpoint: The S3 endpoint
  • region: The S3 region
  • sign_payload: Whether to sign the payload for AWS S3
  • checksum: Whether to enable checksums for AWS S3

If you want to set specific storage options for SSTables, you can use DbOption::level_path. This method allows you to specify the storage options for each level of SSTables. If you don't specify the storage options for a level, Tonbo uses the default storage options (that is, the base file system).

pub fn level_path(
    mut self,
    level: usize,
    path: Path,
    fs_options: FsOptions,
) -> Result<DbOption, ExceedsMaxLevel>;
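
The fallback from a per-level setting to the base file system can be pictured as a simple lookup. The sketch below uses hypothetical types (ToyFs, ToyOptions, fs_for_level) and only models the selection rule described above, not Tonbo's internals.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the file system options.
#[derive(Debug, Clone, PartialEq)]
enum ToyFs {
    Local,
    S3 { bucket: String },
}

struct ToyOptions {
    base_fs: ToyFs,
    // Only levels with an explicit override appear here.
    level_fs: HashMap<usize, ToyFs>,
}

impl ToyOptions {
    // Use the per-level override if present, otherwise fall back to the base.
    fn fs_for_level(&self, level: usize) -> &ToyFs {
        self.level_fs.get(&level).unwrap_or(&self.base_fs)
    }
}

fn main() {
    let mut level_fs = HashMap::new();
    level_fs.insert(2, ToyFs::S3 { bucket: "bucket".to_string() });
    let options = ToyOptions { base_fs: ToyFs::Local, level_fs };

    println!("{:?}", options.fs_for_level(0)); // prints Local
    println!("{:?}", options.fs_for_level(2)); // prints S3 { bucket: "bucket" }
}
```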

Manifest Configuration

The manifest stores the metadata of the database. Whenever compaction is triggered, the manifest is updated accordingly. Over time the manifest file grows, which increases recovery time. Tonbo rewrites the manifest file once it holds too much metadata; you can use DbOption::version_log_snapshot_threshold to configure when this happens:

pub fn version_log_snapshot_threshold(self, version_log_snapshot_threshold: u32) -> DbOption;
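
One way to picture the threshold is as a log of edits that is folded into a snapshot once it grows too long. The following sketch is an assumption about the general technique, not a description of Tonbo's manifest format: ToyManifest and Edit are hypothetical, and the rewrite is triggered as soon as the log length reaches the threshold.

```rust
// Hypothetical version edits recorded in the manifest log.
enum Edit {
    Add(String),
    Remove(String),
}

struct ToyManifest {
    // Live files as of the last rewrite.
    snapshot: Vec<String>,
    // Edits recorded since the last rewrite; recovery must replay these.
    log: Vec<Edit>,
    threshold: usize,
}

impl ToyManifest {
    fn new(threshold: usize) -> Self {
        ToyManifest { snapshot: Vec::new(), log: Vec::new(), threshold }
    }

    // Record an edit and rewrite the manifest once the log is long enough.
    fn apply(&mut self, edit: Edit) {
        self.log.push(edit);
        if self.log.len() >= self.threshold {
            self.rewrite();
        }
    }

    // Fold all logged edits into the snapshot and start with an empty log.
    fn rewrite(&mut self) {
        for edit in self.log.drain(..) {
            match edit {
                Edit::Add(file) => self.snapshot.push(file),
                Edit::Remove(file) => self.snapshot.retain(|f| f != &file),
            }
        }
    }
}

fn main() {
    let mut manifest = ToyManifest::new(3);
    manifest.apply(Edit::Add("a.parquet".to_string()));
    manifest.apply(Edit::Add("b.parquet".to_string()));
    manifest.apply(Edit::Remove("a.parquet".to_string()));
    // The third edit reached the threshold, so the log was folded away.
    println!("{:?} (log length {})", manifest.snapshot, manifest.log.len());
}
```

A smaller threshold keeps the log short and recovery fast at the cost of more frequent rewrites.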

If you want to persist metadata files to S3, you can configure DbOption::base_fs with FsOptions::S3{...}. This will enable Tonbo to upload metadata files and WAL files to the specified S3 bucket.

Note: This will not guarantee the latest metadata will be uploaded to S3. If you want to ensure the latest metadata is uploaded, you can use DB::flush to trigger upload manually. If you want tonbo to trigger upload more frequently, you can adjust DbOption::version_log_snapshot_threshold to a smaller value. The default value is 200.

WAL Configuration

Tonbo uses a WAL (write-ahead log) to ensure data durability and consistency. It is a mechanism that ensures that data is written to the log before being written to the database. This helps to prevent data loss in case of a system failure.

Tonbo also buffers WAL writes to improve performance. If you want to flush the WAL buffer, you can call DB::flush_wal. The default buffer size is 4KB. If you don't want to use the WAL buffer, you can set the buffer size to 0.

pub fn wal_buffer_size(self, wal_buffer_size: usize) -> DbOption;

If you don't want to use WAL, you can disable it by setting the DbOption::disable_wal. But please ensure that losing data is acceptable for you.

pub fn disable_wal(self) -> DbOption;

Compaction Configuration

When a memtable reaches its maximum size, Tonbo turns it into an immutable (read-only) memtable. When the number of immutable memtables reaches its limit, Tonbo compacts them into SSTables. You can set DbOption::immutable_chunk_num to control how many immutable memtables accumulate before this compaction is triggered.

/// len threshold of `immutables` when minor compaction is triggered
pub fn immutable_chunk_num(self, immutable_chunk_num: usize) -> DbOption;
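
The freeze-then-compact flow can be modeled in a few lines. This is a toy sketch with hypothetical names (ToyLsm, max_mem_len) and simplifying assumptions: memtables are plain vectors of keys, the memtable limit counts entries rather than bytes, and a minor compaction merges all immutable memtables into one sorted run.

```rust
struct ToyLsm {
    mutable: Vec<u32>,
    immutables: Vec<Vec<u32>>,
    sstables: Vec<Vec<u32>>,
    max_mem_len: usize,
    immutable_chunk_num: usize,
}

impl ToyLsm {
    fn new(max_mem_len: usize, immutable_chunk_num: usize) -> Self {
        ToyLsm {
            mutable: Vec::new(),
            immutables: Vec::new(),
            sstables: Vec::new(),
            max_mem_len,
            immutable_chunk_num,
        }
    }

    fn insert(&mut self, key: u32) {
        self.mutable.push(key);
        // A full memtable is frozen into a read-only one.
        if self.mutable.len() >= self.max_mem_len {
            let frozen = std::mem::take(&mut self.mutable);
            self.immutables.push(frozen);
        }
        // Enough frozen memtables trigger a minor compaction.
        if self.immutables.len() >= self.immutable_chunk_num {
            let mut run: Vec<u32> = self.immutables.drain(..).flatten().collect();
            run.sort_unstable();
            self.sstables.push(run);
        }
    }
}

fn main() {
    let mut lsm = ToyLsm::new(2, 3);
    for key in [5, 4, 3, 2, 1, 0] {
        lsm.insert(key);
    }
    // Three frozen memtables of two keys each were merged into one sorted run.
    println!("{:?}", lsm.sstables); // prints [[0, 1, 2, 3, 4, 5]]
}
```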

When the number of files in level L exceeds its limit, we also compact them in a background thread. Tonbo uses major_threshold_with_sst_size and level_sst_magnification to determine when to trigger major compaction. The threshold for a given level is calculated as follows:

\[ \text{major\_threshold\_with\_sst\_size} \times \text{level\_sst\_magnification}^{\text{level}} \]

major_threshold_with_sst_size defaults to 4 and level_sst_magnification defaults to 10, which means that the default trigger threshold is 40 files for level 1 and 400 files for level 2.
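
The arithmetic can be checked with a short function. major_compaction_threshold is a hypothetical helper written for this example; only the formula and the default values come from the text above.

```rust
// Number of files in `level` at which major compaction is triggered,
// following the formula above.
fn major_compaction_threshold(
    major_threshold_with_sst_size: usize,
    level_sst_magnification: usize,
    level: u32,
) -> usize {
    major_threshold_with_sst_size * level_sst_magnification.pow(level)
}

fn main() {
    // Defaults: major_threshold_with_sst_size = 4, level_sst_magnification = 10.
    for level in 1..=2 {
        println!("level {}: {}", level, major_compaction_threshold(4, 10, level));
    }
    // prints:
    // level 1: 40
    // level 2: 400
}
```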

You can adjust the major_threshold_with_sst_size and level_sst_magnification to control the compaction behavior.

/// threshold for the number of `parquet` when major compaction is triggered
pub fn major_threshold_with_sst_size(self, major_threshold_with_sst_size: usize) -> DbOption;

/// magnification that triggers major compaction between different levels
pub fn level_sst_magnification(self, level_sst_magnification: usize) -> DbOption;

You can also change the default SSTable size by setting the DbOption::max_sst_file_size, but we found that the default size is good enough for most use cases.

/// Maximum size of each parquet
pub fn max_sst_file_size(self, max_sst_file_size: usize) -> DbOption;

SSTable Configuration

Tonbo uses Parquet to store data, which means you can set WriterProperties for the Parquet files. Use DbOption::write_parquet_option to apply specific Parquet settings.

/// specific settings for Parquet
pub fn write_parquet_option(self, write_parquet_properties: WriterProperties) -> DbOption;

Here is an example of how to use DbOption::write_parquet_option:

let db_option = DbOption::default().write_parquet_option(
    WriterProperties::builder()
        .set_compression(Compression::LZ4)
        .set_statistics_enabled(EnabledStatistics::Chunk)
        .set_bloom_filter_enabled(true)
        .build(),
);

Explore Tonbo

Tonbo provides DynRecord to support dynamic schemas. We have been using it to build the Python and WASM bindings for Tonbo. You can find the source code here.

Besides the Python and WASM bindings, we have also used it to build a SQLite extension, TonboLite. This means that you can do more interesting things with Tonbo, such as building a PostgreSQL extension or integrating with DataFusion.

DynRecord

DynRecord is just like the schema you defined by #[derive(Record)], but the fields are not known at compile time. Therefore, before using it, you need to pass the schema and value by yourself. Here is the constructor of the DynSchema, the schema of DynRecord:

// constructor of DynSchema
pub fn new(schema: Vec<ValueDesc>, primary_index: usize) -> DynSchema;

// constructor of ValueDesc
pub fn new(name: String, datatype: DataType, is_nullable: bool) -> ValueDesc;

  • ValueDesc: represents a field of the schema, which contains the field name and field type.
    • name: represents the name of the field.
    • datatype: represents the data type of the field.
    • is_nullable: represents whether the field can be nullable.
  • primary_index: represents the index of the primary key field in the schema.

pub fn new(values: Vec<Value>, primary_index: usize) -> DynRecord;

pub fn new(
    datatype: DataType,
    name: String,
    value: Arc<dyn Any + Send + Sync>,
    is_nullable: bool,
) -> Value;

  • Value: represents a field of the schema and its value, which contains a field description and the value.
    • datatype: represents the data type of the field.
    • name: represents the name of the field.
    • is_nullable: represents whether the field is nullable.
    • value: represents the value of the field.
  • primary_index: represents the index of the primary key field in the schema.

Tonbo currently supports these types for dynamic schemas:

Tonbo type                    Rust type
UInt8/UInt16/UInt32/UInt64    u8/u16/u32/u64
Int8/Int16/Int32/Int64        i8/i16/i32/i64
Boolean                       bool
String                        String
Bytes                         Vec<u8>

DynRecord allows you to define a schema at runtime and use it to create records. This is useful when the schema is not known at compile time.

Operations

After creating DynSchema, you can use Tonbo just like before. The only difference is that you insert DynRecord instances and get back DynRecordRef instances.

If you compare the usage with compile-time schema version, you will find that the usage is almost the same. The difference can be summarized into the following 5 points.

  • Use DynSchema to replace xxxSchema(e.g. UserSchema)
  • Use DynRecord instance to replace the instance you defined with #[derive(Record)]
  • All you get from database is DynRecordRef rather than xxxRef(e.g. UserRef)
  • Use Value as the Key of DynRecord. For example, you should pass a Value instance to the DB::get method.
  • The value of Value should be the type of Arc<Option<T>> if the column can be nullable.

But if you look at the code, you will find that both DynSchema and xxxSchema implement the Schema trait, both DynRecord and xxxRecord implement the Record trait, and both DynRecordRef and xxxRecordRef implement the RecordRef trait. So the differences are limited to the concrete types you use.

Create Database

#[tokio::main]
async fn main() {
    // make sure the path exists
    fs::create_dir_all("./db_path/users").unwrap();

    // build DynSchema
    let descs = vec![
        ValueDesc::new("name".to_string(), DataType::String, false),
        ValueDesc::new("email".to_string(), DataType::String, false),
        ValueDesc::new("age".to_string(), DataType::Int8, true),
    ];
    let schema = DynSchema::new(descs, 0);

    let options = DbOption::new(
        Path::from_filesystem_path("./db_path/users").unwrap(),
        &schema,
    );

    // pass the schema instance built above, not the `DynSchema` type name
    let db = DB::<DynRecord, TokioExecutor>::new(options, TokioExecutor::default(), schema)
        .await
        .unwrap();
}

If you want to learn more about DbOption, you can refer to the Configuration section.

Note: You should make sure the path exists before creating DbOption.

Insert

You can use db.insert(record) or db.insert_batch(records) to insert new records into the database just like before. The difference is that you should build and insert a DynRecord instance.

Here is an example of how to build a DynRecord instance:

let columns = vec![
    Value::new(
        DataType::String,
        "name".to_string(),
        Arc::new("Alice".to_string()),
        false,
    ),
    Value::new(
        DataType::String,
        "email".to_string(),
        Arc::new("abc@tonbo.io".to_string()),
        false,
    ),
    Value::new(
        DataType::Int8,
        "age".to_string(),
        // nullable column: wrap the value in `Option`
        Arc::new(Some(30_i8)),
        true,
    ),
];
let record = DynRecord::new(columns, 0);
  • Value::new will create a new Value instance, which represents the value of the column in the schema. This method receives four parameters:
    • datatype: the data type of the field in the schema
    • name: the name of the field in the schema
    • value: the value of the column. This is the type of Arc<dyn Any + Send + Sync>. But please be careful that the value should be the type of Arc<Option<T>> if the column can be nullable.
    • nullable: whether the value is nullable

// insert a single tonbo record
db.insert(record).await.unwrap();
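
The rule about nullable columns can be illustrated with the standard library alone. The sketch below is not Tonbo code: read_i8 is a hypothetical helper that shows why a nullable column must store Option<T> behind the Arc<dyn Any + Send + Sync>, since a downcast only succeeds when the stored type matches exactly.

```rust
use std::any::Any;
use std::sync::Arc;

/// Reads an `i8` column value stored behind `Arc<dyn Any + Send + Sync>`.
/// Hypothetical helper for illustration; it is not part of Tonbo's API.
/// Outer `None`: the stored type did not match. Inner `None`: a null value.
fn read_i8(value: &Arc<dyn Any + Send + Sync>, is_nullable: bool) -> Option<Option<i8>> {
    if is_nullable {
        // Nullable columns are expected to hold `Option<T>`.
        value.downcast_ref::<Option<i8>>().copied()
    } else {
        // Non-nullable columns are expected to hold the bare `T`.
        value.downcast_ref::<i8>().map(|v| Some(*v))
    }
}
```

Storing a bare i8 in a column declared nullable makes the downcast to Option<i8> fail, which is the mistake the note above warns about.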

Remove

You can use db.remove(key) to remove a record from the database. This method receives a Key, which is the primary key of the record. But every column in a DynRecord is a Value, so you cannot call it like db.remove("Alice".into()).await.unwrap();. Instead, you should pass a Value to db.remove.

let key = Value::new(
    DataType::String,
    "name".to_string(),
    Arc::new("Alice".to_string()),
    false,
);

db.remove(key).await.unwrap();

Query

You can use the get method to get a record by key. You should pass a closure that takes a TransactionEntry instance and returns an Option. Inside the closure, you can use TransactionEntry::get to get a DynRecordRef instance.

You can use the scan method to scan all records in the specified range. The scan method returns a Stream instance, and you can iterate over all records using this stream.

/// get the record with `key` as the primary key and process it using closure `f`
let age = db.get(key,
    |entry| {
        // entry.get() will get a `DynRecordRef`
        let record_ref = entry.get();
        println!("{:#?}", record_ref);
        record_ref.age
    })
    .await
    .unwrap();

let mut scan = db
    .scan((Bound::Included(&lower_key), Bound::Excluded(&upper_key)))
    .await
    .unwrap();
while let Some(entry) = scan.next().await.transpose().unwrap() {
    let data = entry.value(); // type of DynRecordRef
    // ......
}
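
The range passed to scan is a pair of std::ops::Bound values. As an analogy only, the same pair of bounds can be tried against a BTreeMap from the standard library; scan_keys is a hypothetical helper, and the map merely stands in for the database.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Returns the keys in `[lower, upper)`, in ascending key order.
/// `BTreeMap` stands in for the database here; this is an analogy only.
fn scan_keys(map: &BTreeMap<String, u8>, lower: &str, upper: &str) -> Vec<String> {
    // Included lower bound, excluded upper bound: the same shape as the
    // tuple passed to `scan` above.
    map.range::<str, _>((Bound::Included(lower), Bound::Excluded(upper)))
        .map(|(name, _age)| name.clone())
        .collect()
}
```

With keys "Alice", "Amy" and "Bob", scanning from "Alice" (included) to "Bob" (excluded) yields "Alice" and "Amy". Note that BTreeMap::range panics when the lower bound is greater than the upper bound.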

Transaction

Tonbo supports transactions through the Transaction type. You can use db.transaction() to create a transaction, and use txn.commit() to commit the transaction.

Note that Tonbo provides optimistic concurrency control to ensure data consistency. This means that if a transaction conflicts with another transaction when committing, the commit fails with a CommitError.

Here is an example of how to use transactions:

// create transaction
let txn = db.transaction().await;

let name = Value::new(
    DataType::String,
    "name".to_string(),
    Arc::new("Alice".to_string()),
    false,
);
let upper = Value::new(
    DataType::String,
    "name".to_string(),
    Arc::new("Bob".to_string()),
    false,
);

txn.insert(DynRecord::new(/* */));
let _record_ref = txn.get(&name, Projection::Parts(vec!["email", "bytes"])).await.unwrap();

// range scan of user
let mut scan = txn
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    // tonbo supports pushing down projection
    .projection(&["email", "bytes"])
    // push down limitation
    .limit(1)
    .take()
    .await
    .unwrap();

while let Some(entry) = scan.next().await.transpose().unwrap() {
    let data = entry.value(); // type of DynRecordRef
    // ......
}

For more detail about transactions, please refer to the Transactions section.
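
To make the idea of optimistic concurrency control more concrete, here is a minimal, self-contained sketch. It is not Tonbo's implementation: Store, Txn and Conflict are hypothetical names, and the conflict rule (a written key was changed by a commit newer than the transaction's snapshot) is an assumption chosen for illustration.

```rust
use std::collections::HashMap;

/// A toy versioned store; `Store`, `Txn`, and `Conflict` are hypothetical names.
#[derive(Default)]
struct Store {
    // key -> (value, version of the commit that last wrote it)
    data: HashMap<String, (String, u64)>,
    version: u64,
}

#[derive(Debug, PartialEq)]
struct Conflict(String);

/// Writes are buffered locally and only applied on commit.
struct Txn {
    snapshot: u64,
    writes: HashMap<String, String>,
}

impl Store {
    fn begin(&self) -> Txn {
        Txn { snapshot: self.version, writes: HashMap::new() }
    }

    /// Commit fails if any written key was changed after the snapshot was taken.
    fn commit(&mut self, txn: Txn) -> Result<(), Conflict> {
        for key in txn.writes.keys() {
            if let Some((_, written_at)) = self.data.get(key) {
                if *written_at > txn.snapshot {
                    return Err(Conflict(key.clone()));
                }
            }
        }
        // No conflict: apply all buffered writes under a new version.
        self.version += 1;
        for (key, value) in txn.writes {
            self.data.insert(key, (value, self.version));
        }
        Ok(())
    }
}
```

Two transactions that start from the same snapshot and write the same key illustrate the behavior: the first commit succeeds and the second is rejected, so the caller has to retry it on a fresh snapshot.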

Using S3 backends

Using S3 as the backend storage works much the same way as in the compile-time schema version.

use tonbo::option::{ AwsCredential, FsOptions, Path };
use tonbo::{executor::tokio::TokioExecutor, DbOption, DB};

#[tokio::main]
async fn main() {
    let fs_option = FsOptions::S3 {
        bucket: "wasm-data".to_string(),
        credential: Some(AwsCredential {
            key_id: "key_id".to_string(),
            secret_key: "secret_key".to_string(),
            token: None,
        }),
        endpoint: None,
        sign_payload: None,
        checksum: None,
        region: Some("region".to_string()),
    };

    let descs = vec![
        ValueDesc::new("name".to_string(), DataType::String, false),
        ValueDesc::new("email".to_string(), DataType::String, false),
        ValueDesc::new("age".to_string(), DataType::Int8, true),
    ];
    let schema = DynSchema::new(descs, 0);
    let options = DbOption::new(Path::from_filesystem_path("s3_path").unwrap(), &schema)
        .level_path(2, "l2", fs_option);


    let db = DB::<DynRecord, TokioExecutor>::new(options, TokioExecutor::default(), schema)
        .await
        .unwrap();
}

FAQ

Failed to run custom build command for ring in macOS

Apple Clang is a fork of Clang that is specialized to Apple's wishes. It doesn't support wasm32-unknown-unknown. You need to download and use llvm.org Clang instead. You can refer to this issue for more information.

brew install llvm
echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc

Why is my data not recovered, and why is the size of the log file and WAL file 0?

Tonbo buffers WAL writes, so the WAL may not be persisted before the process exits. You can use DB::flush_wal to ensure the WAL is persisted or use DB::flush to trigger compaction manually.

If you don't want to use WAL buffer, you can set DbOption::wal_buffer_size to 0. See more details in Configuration.
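
The effect of a write buffer can be reproduced with std::io::BufWriter. This is an analogy rather than Tonbo's WAL code: a Vec<u8> stands in for the WAL file, and persisted_before_and_after_flush is a hypothetical helper.

```rust
use std::io::{BufWriter, Write};

/// Writes `record` through a buffer of `capacity` bytes and reports how many
/// bytes reached the underlying "file" before and after an explicit flush.
fn persisted_before_and_after_flush(record: &[u8], capacity: usize) -> (usize, usize) {
    // A `Vec<u8>` stands in for the WAL file on disk.
    let mut wal = BufWriter::with_capacity(capacity, Vec::new());
    wal.write_all(record).unwrap();
    let before = wal.get_ref().len();
    // Analogous in spirit to calling `DB::flush_wal` before exiting.
    wal.flush().unwrap();
    let after = wal.get_ref().len();
    (before, after)
}
```

With a 1024-byte buffer, a 5-byte record stays in memory until the flush, so nothing would survive an exit before it. With a capacity of 0 the write goes straight through, which mirrors the intent of setting DbOption::wal_buffer_size to 0.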

How to persist metadata files to S3? / Why are metadata files not persisted in serverless environments like AWS Lambda?

If you want to persist metadata files to S3, you can configure DbOption::base_fs with FsOptions::S3{...}. This will enable Tonbo to upload metadata files and WAL files to the specified S3 bucket.

Note: This does not guarantee that the latest metadata will be uploaded to S3. If you want to ensure the latest WAL is uploaded, you can use DB::flush_wal. If you want to ensure the latest metadata is uploaded, you can use DB::flush to trigger the upload manually. If you want Tonbo to trigger uploads more frequently, you can set DbOption::version_log_snapshot_threshold to a smaller value. The default value is 200.

See more details in Configuration.

Using Tonbo

Define your schema

use tonbo::Record;

/// Use macro to define schema of column family just like ORM
/// It provides type-safe read & write API
#[derive(Record, Debug)]
pub struct User {
    #[record(primary_key)]
    name: String,
    email: Option<String>,
    age: u8,
    bytes: Bytes,
}

use std::ops::Bound;

use bytes::Bytes;
use fusio::path::Path;
use futures_util::stream::StreamExt;
use tokio::fs;
use tonbo::{executor::tokio::TokioExecutor, DbOption, Projection, Record, DB};


#[tokio::main]
async fn main() {
    // make sure the path exists
    let _ = fs::create_dir_all("./db_path/users").await;

    let options = DbOption::new(
        Path::from_filesystem_path("./db_path/users").unwrap(),
        &UserSchema,
    );
    // pluggable async runtime and I/O
    let db = DB::new(options, TokioExecutor::default(), UserSchema)
        .await
        .unwrap();

    // insert with owned value
    db.insert(User {
        name: "Alice".into(),
        email: Some("alice@gmail.com".into()),
        age: 22,
        bytes: Bytes::from(vec![0, 1, 2]),
    })
    .await
    .unwrap();

    {
        // tonbo supports transaction
        let txn = db.transaction().await;

        // get from primary key
        let name = "Alice".into();

        // get the zero-copy reference of record without any allocations.
        let user = txn
            .get(
                &name,
                // tonbo supports pushing down projection
                Projection::All,
            )
            .await
            .unwrap();
        assert!(user.is_some());
        assert_eq!(user.unwrap().get().age, Some(22));

        {
            let upper = "Blob".into();
            // range scan of user
            let mut scan = txn
                .scan((Bound::Included(&name), Bound::Excluded(&upper)))
                // tonbo supports pushing down projection
                .projection(vec![1, 3])
                // push down limitation
                .limit(1)
                .take()
                .await
                .unwrap();
            while let Some(entry) = scan.next().await.transpose().unwrap() {
                assert_eq!(
                    entry.value(),
                    Some(UserRef {
                        name: "Alice",
                        email: Some("alice@gmail.com"),
                        age: None,
                        bytes: Some(&[0, 1, 2]),
                    })
                );
            }
        }

        // commit transaction
        txn.commit().await.unwrap();
    }
}

Using under Wasm

This example shows how to use Tonbo under Wasm.

Cargo.toml

Since only limited features of tokio can be used in Wasm, we need to disable the tokio feature and use the wasm feature in Tonbo.

fusio = { git = "https://github.com/tonbo-io/fusio.git", rev = "216eb446fb0a0c6e5e85bfac51a6f6ed8e5ed606", package = "fusio", version = "0.3.3", features = [
  "dyn",
  "fs",
] }
tonbo = { git = "https://github.com/tonbo-io/tonbo", default-features = false, features = ["wasm"] }

Create DB

Tonbo provides OPFS (origin private file system) as a storage backend, but the path is a little different. You should use Path::from_opfs_path or Path::parse rather than Path::from_filesystem_path, and it is not permitted to use paths that temporarily step outside the sandbox with something like ../foo or ./bar.

use fusio::path::Path;
use tonbo::{executor::opfs::OpfsExecutor, DbOption, DB};

async fn main() {

    let options = DbOption::new(
        Path::from_opfs_path("db_path/users").unwrap(),
        &UserSchema,
    );
    let db = DB::<User, OpfsExecutor>::new(options, OpfsExecutor::new(), UserSchema)
        .await
        .unwrap();
}
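
As a rough pre-check for the path rule above, an application could reject "." and ".." segments before building the path. The helper below is hypothetical and only mirrors the rule as stated here; the actual validation performed by Path::from_opfs_path may be stricter or different.

```rust
/// Returns `true` when no segment of `path` is `.` or `..`.
/// Hypothetical pre-check; it does not replace the library's own validation.
fn stays_inside_sandbox(path: &str) -> bool {
    path.split('/').all(|segment| segment != "." && segment != "..")
}
```

For example, "db_path/users" passes, while "../foo" and "./bar" are rejected.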

Operations on DB

After creating a DB instance, you can operate on it as usual:

let txn = db.transaction().await;

// get from primary key
let name = "Alice".into();

let user = txn.get(&name, Projection::All).await.unwrap();

let upper = "Blob".into();
// range scan of user
let mut scan = txn
    .scan((Bound::Included(&name), Bound::Excluded(&upper)))
    // tonbo supports pushing down projection
    .projection(vec![1])
    // push down limitation
    .limit(1)
    .take()
    .await
    .unwrap();

while let Some(entry) = scan.next().await.transpose().unwrap() {
    assert_eq!(
        entry.value(),
        Some(UserRef {
            name: "Alice",
            email: Some("alice@gmail.com"),
            age: None,
        })
    );
}

Building and Testing

To get started using Tonbo, you should make sure you have Rust installed on your system. If you haven't already done so, try following the instructions here.

Building and Testing for Rust

Building and Testing with Non-WASM

To use local disk as the storage backend, you should import the tokio crate and enable the "tokio" feature (enabled by default).

cargo build

If you build Tonbo successfully, you can run the tests with:

cargo test

Building and Testing with WASM

If you want to build tonbo under wasm, you should add wasm32-unknown-unknown target first.

# add wasm32-unknown-unknown target
rustup target add wasm32-unknown-unknown
# build under wasm
cargo build --target wasm32-unknown-unknown --no-default-features --features wasm

Before running the tests, make sure you have installed wasm-pack and run wasm-pack build to build the wasm module. If you build successfully, you can run the tests with:

wasm-pack test --chrome --headless --test wasm --no-default-features --features aws,bytes,opfs

Building and Testing for Python

Building

We use pyo3 to generate a native Python module and maturin to build Rust-based Python packages.

First, follow the commands below to build a new Python virtualenv, and install maturin into the virtualenv using Python's package manager, pip:

# setup virtualenv
python -m venv .env
# activate venv
source .env/bin/activate

# install maturin
pip install maturin
# build bindings
maturin develop

Whenever Rust code changes run:

maturin develop

Testing

If you want to run tests, you need to build with "test" options:

maturin develop -E test

After building successfully, you can run the tests with:

# run tests except benchmarks (benchmarks need duckdb to be installed)
pytest --ignore=tests/bench -v .

# run all tests
pip install duckdb
python -m pytest

Building and Testing for JavaScript

To build tonbo for JavaScript, you should install wasm-pack. If you haven't already done so, try following the instructions here.

# add wasm32-unknown-unknown target
rustup target add wasm32-unknown-unknown
# build under wasm
wasm-pack build --target web

Submitting a Pull Request

Thanks for your contribution! The Tonbo project welcomes contributions of various types: new features, bug fixes and reports, typo fixes, etc. If you want to contribute to the Tonbo project, you will need to pass the necessary checks. If you have any questions, feel free to start a new discussion or issue, or ask in the Tonbo Discord.

Running Tests and Checks

This is a Rust project, so rustup and cargo are the best place to start.

  • cargo check to analyze the current package and report errors.
  • cargo +nightly fmt to format the current code.
  • cargo build to compile the current package.
  • cargo clippy to catch common mistakes and improve code.
  • cargo test to run unit tests.
  • cargo bench to run benchmark tests.

Note: If you have any changes to bindings/python, please make sure to run checks and tests before submitting your PR. If you do not know how to build and run tests, please refer to the Building and Testing for Python section.

Pull Request title

As described here, a valid PR title should begin with one of the following prefixes:

  • feat: new feature for the user, not a new feature for build script
  • fix: bug fix for the user, not a fix to a build script
  • doc: changes to the documentation
  • style: formatting, missing semicolons, etc; no production code change
  • refactor: refactoring production code, e.g. renaming a variable
  • test: adding missing tests, refactoring tests; no production code change
  • chore: updating grunt tasks etc; no production code change

Here is an example of a valid PR title:

feat: add float type
^--^  ^------------^
|     |
|     +-> Summary in present tense.
|
+-------> Type: chore, docs, feat, fix, refactor, style, or test.
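
The title rule can be checked mechanically. The sketch below is a hypothetical helper, not the project's actual CI check. It accepts the prefixes from the list above and, as an assumption, also accepts docs, because the list spells the prefix doc while the diagram spells it docs.

```rust
/// Prefixes taken from the list above; `docs` is accepted as well because the
/// diagram uses that spelling (an assumption, not a documented rule).
const PREFIXES: [&str; 8] = ["feat", "fix", "doc", "docs", "style", "refactor", "test", "chore"];

/// Returns the summary part of a title shaped like `<type>: <summary>`.
/// Hypothetical helper; the project's real check may differ.
fn parse_pr_title(title: &str) -> Option<&str> {
    // Split at the first ": " into the type prefix and the summary.
    let (prefix, summary) = title.split_once(": ")?;
    if PREFIXES.contains(&prefix) && !summary.trim().is_empty() {
        Some(summary)
    } else {
        None
    }
}
```

Here "feat: add float type" is accepted with the summary "add float type", while a title with an unknown prefix or an empty summary is rejected.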

RFC: Composite Primary Keys

This document outlines a practical, incremental plan to add composite (multi-column) primary key support to Tonbo while maintaining backward compatibility. It explains design goals, changes required across the codebase, and a step-by-step implementation and validation plan.

Current Status (2025-08-11)

  • Phase 1: Completed. Plural Schema APIs, fixed projections, and Parquet writer configuration are implemented for single-PK schemas.
    • Schema now exposes primary_key_indices() and primary_key_paths_and_sorting() (src/record/mod.rs). Macro-generated single-PK schemas return one-element slices.
    • Read paths build fixed projections as [0, 1] ∪ PKs using primary_key_indices() (src/lib.rs, src/transaction.rs).
    • DbOption::new configures sorting columns (_ts then PKs) and enables stats + bloom filters for each PK column path (src/option.rs).
  • Phase 2: Not implemented. Composite key types under src/record/key/composite/ are placeholders; derive macro still accepts only a single #[record(primary_key)] and generates a single-field key. No multi-PK trybuild/integration tests.
  • Phase 3: Not implemented. DynSchema remains single-PK (stores one primary_index_arrow, one pk_path, and sorting with a single PK column).

Goals

  • Support multi-column primary keys with lexicographic ordering of PK components.
  • Preserve existing single-column PK behavior and public APIs (backward compatible).
  • Keep zero-copy reads and projection pushdown guarantees for PK columns.
  • Ensure on-disk layout (Parquet) remains sorted by _ts then PK(s), with statistics/bloom filters enabled for PK columns.
  • Make it easy to use via the #[derive(Record)] macro by allowing multiple #[record(primary_key)] fields.

Non-Goals (for this RFC)

  • Foreign keys, cascades, or relational constraints.
  • Secondary indexes.
  • Schema migrations for existing data files.
  • Composite keys in dynamic records in the first phase (can be added subsequently).

High-Level Design

  1. Schema trait changes (completed)
  • Now: Schema exposes primary_key_indices() and primary_key_path().
  • primary_key_index() was removed in favor of the slice-based primary_key_indices().
  • Additive helper: primary_key_paths_and_sorting() returns all PK column paths plus sorting columns.
  • For single-column PKs, implementations return a one-element slice from primary_key_indices().
  2. Composite key type(s)
  • Introduce a composite key in src/record/key/composite/ with lexicographic Ord:
    • Option A (preferred): The macro generates a record-specific key struct, e.g., UserKey { k1: u64, k2: String } and UserKeyRef<'r> { ... }.
    • Option B (interim): Provide generic tuple implementations for (K1, K2), (K1, K2, K3), … up to a small N. Each implements Key and KeyRef with lexicographic Ord, plus Encode/Decode, Hash, Clone.
  • For string/bytes components, KeyRef holds borrowed forms, mirroring current single-PK behavior.
  3. Macro updates (tonbo_macros)
  • Allow multiple #[record(primary_key)] fields. Order of appearance in struct determines comparison order (later we can add order = i if needed).
  • Generate:
    • Record-specific key struct and ref struct (Option A), or map to tuple (Option B).
    • type Key = <GeneratedKey> in Schema impl.
    • fn key(&self) -> <GeneratedKeyRef> in Record impl.
    • fn primary_key_indices(&self) -> Vec<usize> in Schema impl (indices are offset by 2 for _null, _ts).
  • Ensure RecordRef::from_record_batch and projection logic always keep all PK columns, even if they are not listed in the projection.
  • Keep encoding/arrays builders unchanged in signature; they already append values per-field.
  4. Projections and read paths
  • Replace single-index assumptions with multi-index collections:
    • Use [0, 1] ∪ primary_key_indices() to build fixed projections in src/lib.rs and src/transaction.rs.
    • In all RecordRef::projection usages, ensure all PK columns are always retained (already implied by fixed mask).
  5. Parquet writer configuration
  • In DbOption::new, use primary_key_paths_and_sorting() to:
    • Enable stats and bloom filters for each PK column path via .set_column_statistics_enabled() and .set_column_bloom_filter_enabled() (invoke once per path).
    • Set sorting columns as [ SortingColumn(_ts, …), SortingColumn(pk1, …), SortingColumn(pk2, …), … ].
  6. Dynamic records (phase 3)
  • Extend DynSchema to track primary_indices: Vec<usize> in metadata (replacing the single primary_key_index).
  • Update DynRecordRef::new and readers to honor multiple PK indices.
  • Ensure tombstone writes (row == None) still populate all PK columns from Ts.key so ordering/lookups remain correct.
  • Define a composite key wrapper for Value/ValueRef (or generate a per-dyn-schema composite type if feasible). Initially out-of-scope for phase 1.
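
To illustrate what a record-specific key with lexicographic Ord (Option A) could look like, here is a standard-library-only sketch. The field names k1 and k2 follow the UserKey example in the list above; the derives and the as_key_ref method are assumptions for illustration, and the Key, KeyRef, Encode and Decode impls a real key type would need are omitted.

```rust
/// Owned composite key, shaped like the `UserKey` from Option A.
/// Derived `Ord` compares fields in declaration order, i.e. lexicographically.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct UserKey {
    k1: u64,
    k2: String,
}

/// Borrowed counterpart: the string component is a `&str`, so building it
/// does not allocate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct UserKeyRef<'r> {
    k1: u64,
    k2: &'r str,
}

impl UserKey {
    /// Hypothetical accessor that borrows the owned key.
    fn as_key_ref(&self) -> UserKeyRef<'_> {
        UserKeyRef { k1: self.k1, k2: &self.k2 }
    }
}
```

Because derived ordering follows field declaration order, UserKey { k1: 1, k2: "zed" } sorts before UserKey { k1: 2, k2: "amy" }, the same result as comparing the tuples (1, "zed") and (2, "amy") used by Option B.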

Step-by-Step Plan

Phase 1: Core plumbing (single-PK stays working)

  1. Extend Schema trait

    • Add primary_key_indices() and primary_key_paths_and_sorting() with default impls wrapping existing methods.
    • Update call sites in DbOption::new, src/lib.rs, and src/transaction.rs to use the plural forms.
    • Acceptance: All tests pass; no behavior change for single-PK users.
  2. Fixed projection refactor

    • Replace single primary_key_index usage with iteration over primary_key_indices() to construct fixed_projection = [0, 1] ∪ PKs.
    • Acceptance: Existing tests and scan/get projections still behave identically for single-PK.
  3. Parquet writer properties

    • Replace single primary_key_path() usage with plural variant to configure stats, bloom filters, and sorting columns for _ts plus all PK components.
    • Acceptance: Files write successfully; read paths unchanged.
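
The fixed projection [0, 1] ∪ PKs from step 2 can be sketched as a small pure function. The function names are hypothetical and the real code in src/lib.rs and src/transaction.rs may build a projection mask differently; the only facts taken from this RFC are that columns 0 and 1 are the internal _null and _ts columns and that all PK columns must always be retained.

```rust
/// Builds the fixed projection `[0, 1] ∪ PKs`: columns 0 and 1 are the
/// internal `_null` and `_ts` columns, followed by every primary key column.
/// Returns sorted, de-duplicated column indices.
fn fixed_projection(primary_key_indices: &[usize]) -> Vec<usize> {
    let mut projection = vec![0, 1];
    projection.extend_from_slice(primary_key_indices);
    projection.sort_unstable();
    projection.dedup();
    projection
}

/// Merges a user projection with the fixed one so PK columns are never dropped.
fn effective_projection(user: &[usize], primary_key_indices: &[usize]) -> Vec<usize> {
    let mut projection = fixed_projection(primary_key_indices);
    projection.extend_from_slice(user);
    projection.sort_unstable();
    projection.dedup();
    projection
}
```

For a single-PK schema with the key at column 2 the fixed projection is [0, 1, 2], which matches the current behavior; with keys at columns 2 and 3, a user projection of [5] becomes [0, 1, 2, 3, 5].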

Phase 2: Macro + key types

  1. Composite key data structure

    • Implement composite key(s) in src/record/key/composite/ with Encode/Decode, Ord, Hash, Key/KeyRef.
    • Start with tuples (K1, K2), (K1, K2, K3) etc. (Option B) for faster delivery; later switch default macro to per-record key type (Option A).
    • Acceptance: Unit tests confirm lexicographic ordering and encode/decode round-trip for composite keys.
  2. Update #[derive(Record)]

    • Allow multiple #[record(primary_key)] fields and generate:
      • type Key = (<K1>, <K2>, …) (Option B) or <RecordName>Key (Option A).
      • fn key(&self) -> (<K1Ref>, <K2Ref>, …).
      • fn primary_key_indices(&self) -> Vec<usize> with +2 offset.
      • Ensure from_record_batch and projection retain all PK columns.
    • Acceptance: trybuild tests covering multi-PK compile and run; single-PK tests unchanged.
  3. Integration tests

    • Add end-to-end tests: insert/get/remove, range scans, projection, and ordering on 2+ PK fields (e.g., tenant_id: u64, name: String).
    • Acceptance: All new tests pass.

Phase 3: Dynamic records (optional)

  1. DynSchema multi-PK
    • Store primary_indices metadata; update dynamic arrays/refs to keep all PK columns in projections.
    • Provide a composite ValueRef key wrapper for in-memory operations.
    • Ensure tombstones populate PK components from Ts.key in builders (e.g., DynRecordBuilder::push).
    • Acceptance: dynamic tests mirroring integration scenarios pass, including tombstone rows retaining all PK components.

Code Touchpoints

  • Traits/APIs: src/record/mod.rs (Schema), src/option.rs (DbOption::new)
  • Read paths: src/lib.rs (get/scan/package), src/transaction.rs (get/scan)
  • Macro codegen: tonbo_macros/src/record.rs, tonbo_macros/src/keys.rs, tonbo_macros/src/data_type.rs
  • Key types: src/record/key/composite/
  • Dynamic (phase 3): src/record/dynamic/* (incl. tombstone handling in DynRecordBuilder::push)

Testing Strategy

  • Unit tests:

    • Composite key Ord, Eq, Hash, Encode/Decode round-trip.
    • Schema default impl compatibility.
  • trybuild tests:

    • Multiple #[record(primary_key)] in a struct compiles and generates expected APIs.
    • Reject nullable PK components.
  • Integration tests:

    • Insert/get/remove by composite key; range scans across composite key ranges; projection keeps PK columns.
    • WAL/compaction unaffected (basic smoke tests).
  • (Optional) Property tests: ordering equivalence vs. native tuple lexicographic ordering when Option B is used.

  • Tombstones:

    • For both single- and multi-column PKs, verify that tombstone rows (row == None) keep all PK column values populated from Ts.key in Arrow arrays and through RecordRef::from_record_batch.
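
The tombstone requirement can be pictured with plain vectors standing in for Arrow array builders. Entry and Columns are hypothetical types, not the builders in src/record/dynamic/*; the sketch only shows the invariant that PK columns are filled from the key even when the row payload is None.

```rust
/// One entry handed to the column builder: the key is always present,
/// the row payload is `None` for a tombstone (a delete marker).
struct Entry<'a> {
    key: (u64, &'a str),        // (tenant_id, name)
    row: Option<(&'a str, u8)>, // (email, age); `None` means tombstone
}

/// Plain vectors standing in for per-column array builders.
#[derive(Default)]
struct Columns {
    null: Vec<bool>,
    tenant_id: Vec<u64>,
    name: Vec<String>,
    email: Vec<Option<String>>,
    age: Vec<Option<u8>>,
}

impl Columns {
    fn push(&mut self, entry: &Entry<'_>) {
        self.null.push(entry.row.is_none());
        // PK columns come from the key, so they are populated for tombstones too.
        self.tenant_id.push(entry.key.0);
        self.name.push(entry.key.1.to_string());
        // Non-PK columns are null when the row is a tombstone.
        self.email.push(entry.row.map(|(email, _)| email.to_string()));
        self.age.push(entry.row.map(|(_, age)| age));
    }
}
```

A test in this style would push one live row and one tombstone, then assert that both PK vectors have a value at every position while the non-PK vectors hold None for the tombstone.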

Backward Compatibility & Migration

  • All existing single-PK code continues to work without changes due to default-impl fallbacks.
  • Users opting into composite PKs need only annotate multiple fields with #[record(primary_key)].
  • No on-disk migration is required for existing tables; new tables with composite PKs will write Parquet sorting columns for all PK components.

Risks and Mitigations

  • API surface increase: keep new APIs additive with conservative defaults.
  • Projection bugs: comprehensive tests to ensure PK columns are always included.
  • Performance: lexicographic compare is standard; Arrow array lengths are uniform, so no extra bounds checks needed.
  • Dynamic records complexity: staged to a later phase to avoid blocking initial delivery.

Example (target macro UX)

#[derive(Record, Debug)]
pub struct User {
    #[record(primary_key)]
    pub tenant_id: u64,
    #[record(primary_key)]
    pub name: String,
    pub email: Option<String>,
    pub age: u8,
}

// Generated (conceptually):
// type Key = (u64, String);
// fn key(&self) -> (u64, &str);
// fn primary_key_indices(&self) -> Vec<usize> { vec![2, 3] }
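
The vec![2, 3] in the conceptual output above follows from the +2 offset for the internal _null and _ts columns. A minimal sketch of that mapping, using a hypothetical free function rather than macro-generated code:

```rust
/// Number of internal columns (`_null`, `_ts`) that precede user fields.
const INTERNAL_COLUMNS: usize = 2;

/// Maps the struct positions of `#[record(primary_key)]` fields to column
/// indices. `is_primary_key[i]` says whether field `i` carries the attribute.
fn primary_key_indices(is_primary_key: &[bool]) -> Vec<usize> {
    is_primary_key
        .iter()
        .enumerate()
        .filter(|(_, is_pk)| **is_pk)
        .map(|(position, _)| position + INTERNAL_COLUMNS)
        .collect()
}
```

For the User struct above, the flags are [true, true, false, false] (tenant_id, name, email, age), which gives [2, 3]. A single-PK struct whose first field is the key gives [2].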

Delivery Checklist

  • Add Schema plural APIs and refactor call sites.
  • Implement composite key types (tuples first).
  • Enable multiple PK fields in macro; generate composite key/ref and PK indices.
  • Update projection logic to retain all PK columns.
  • Configure Parquet sorting/statistics for all PK components.
  • Add unit/trybuild/integration tests.
  • Update user guide (mention composite PK support and examples).
  • Ensure tombstone rows keep PK component values (builders/readers), validated by tests.

TonboLite

TonboLite is a WASM-compatible SQLite extension that allows users to create tables that support analytical processing directly in SQLite. Its storage engine is powered by our open-source embedded key-value database, Tonbo.

Getting Started

Installation

Prerequisite

To get started using TonboLite, you should make sure you have Rust installed on your system. If you haven't already done so, try following the instructions here.

Building

To build TonboLite as an extension, you should enable the loadable_extension feature:

cargo build --release --features loadable_extension

Once the build succeeds, you will get a file named libsqlite_tonbo.dylib (.dll on Windows, .so on most other Unixes) in target/release/:

target/release/
├── build
├── deps
├── incremental
├── libsqlite_tonbo.d
├── libsqlite_tonbo.dylib
└── libsqlite_tonbo.rlib

Loading TonboLite

SQLite provides the .load command to load a SQLite extension, so you can load the TonboLite extension by running the following command:

.load target/release/libsqlite_tonbo

Creating Table

After loading the TonboLite extension successfully, you can use the SQLite Virtual Table syntax to create a table:

CREATE VIRTUAL TABLE temp.tonbo USING tonbo(
    create_sql = 'create table tonbo(id bigint primary key, name varchar, like int)',
    path = 'db_path/tonbo'
);
  • create_sql is a SQL statement that will be executed to create the table.
  • path is the path to the database file.

Inserting Data

After creating a table, you can start to insert data into it using the normal INSERT INTO statement:

INSERT INTO tonbo(id, name, like) VALUES(1, 'tonbo', 100);

Querying Data

After inserting data, you can query them by using the SELECT statement:

SELECT * FROM tonbo;

1|tonbo|100

Updating Data

You can update data in the table using the UPDATE statement:

UPDATE tonbo SET like = 123 WHERE id = 1;

SELECT * FROM tonbo;
1|tonbo|123

Deleting Data

You can also delete data by using the DELETE statement:

DELETE FROM tonbo WHERE id = 1;

Coding with extension

TonboLite extension can also be used in any place that supports loading SQLite extensions. Here is an example of using TonboLite extension in Python:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
# Load the tonbolite extension
conn.load_extension("target/release/libsqlite_tonbo.dylib")
conn.enable_load_extension(False)

conn.execute("CREATE VIRTUAL TABLE temp.tonbo USING tonbo("
                "create_sql = 'create table tonbo(id bigint primary key, name varchar, like int)', "
                "path = 'db_path/tonbo'"
             ")")
conn.execute("INSERT INTO tonbo (id, name, like) VALUES (0, 'lol', 1)")
conn.execute("INSERT INTO tonbo (id, name, like) VALUES (1, 'lol', 100)")
rows = conn.execute("SELECT * FROM tonbo;")
for row in rows:
    print(row)
# ......

Building TonboLite

Build as Extension

To build TonboLite as an extension, you should enable the loadable_extension feature:

cargo build --release --features loadable_extension

Once the build succeeds, you will get a file named libsqlite_tonbo.dylib (.dll on Windows, .so on most other Unixes) in target/release/.

Build on Rust

cargo build

Build on Wasm

To use TonboLite in wasm, it takes a few steps to build.

  1. Add wasm32-unknown-unknown target
rustup target add wasm32-unknown-unknown
  2. Override toolchain with nightly
rustup override set nightly
  3. Build with wasm-pack
wasm-pack build --target web --no-default-features --features wasm

Once you build successfully, you will get a pkg folder containing compiled js and wasm files. Copy it to your project and then you can start to use it.

const tonbo = await import("./pkg/sqlite_tonbo.js");
await tonbo.default();

// start to use TonboLite ...

TonboLite should be used in a secure context and cross-origin isolated, since it uses SharedArrayBuffer to share memory. Please refer to this article for a detailed explanation.

Usage

Using as Extension

If you do not know how to build TonboLite, please refer to the Building section.

Loading TonboLite Extension

Once the build succeeds, you will get a file named libsqlite_tonbo.dylib (.dll on Windows, .so on most other Unixes) in target/release/ (or target/debug/).

SQLite provides the .load command to load a SQLite extension, so you can load the TonboLite extension by running the following command:

.load target/release/libsqlite_tonbo

Or you can load TonboLite extension in Python or other languages.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
# Load the tonbolite extension
conn.load_extension("target/release/libsqlite_tonbo.dylib")
conn.enable_load_extension(False)

# ......

After loading TonboLite successfully, you can start to use it.

Create Table

Unlike the normal CREATE TABLE statement, TonboLite uses the SQLite Virtual Table syntax to create a table:

CREATE VIRTUAL TABLE temp.tonbo USING tonbo(
    create_sql = 'create table tonbo(id bigint primary key, name varchar, like int)',
    path = 'db_path/tonbo'
);

Select/Insert/Update/Delete

You can execute SQL statements just as you normally would in SQLite. Here is an example:

sqlite> .load target/release/libsqlite_tonbo

sqlite> CREATE VIRTUAL TABLE temp.tonbo USING tonbo(
    create_sql = 'create table tonbo(id bigint primary key, name varchar, like int)',
    path = 'db_path/tonbo'
);
sqlite> insert into tonbo (id, name, like) values (0, 'tonbo', 100);
sqlite> insert into tonbo (id, name, like) values (1, 'sqlite', 200);

sqlite> select * from tonbo;
0|tonbo|100
1|sqlite|200

sqlite> update tonbo set like = 123 where id = 0;

sqlite> select * from tonbo;
0|tonbo|123
1|sqlite|200

sqlite> delete from tonbo where id = 0;

sqlite> select * from tonbo;
1|sqlite|200
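The same sequence can also be driven from Python's sqlite3 module. The sketch below assumes the connection already has the TonboLite extension loaded and the tonbo table created as shown earlier; run_crud_demo is a hypothetical name. Because like is also an SQL keyword, the sketch double-quotes it, and it uses bound parameters instead of inline literals.

```python
import sqlite3


def run_crud_demo(conn):
    """Insert, update, and delete rows in a table named `tonbo`.

    Assumes `tonbo` already exists on `conn` with columns id, name, like.
    Returns the rows seen after the update and after the delete.
    """
    insert = 'INSERT INTO tonbo (id, name, "like") VALUES (?, ?, ?)'
    conn.execute(insert, (0, "tonbo", 100))
    conn.execute(insert, (1, "sqlite", 200))

    conn.execute('UPDATE tonbo SET "like" = ? WHERE id = ?', (123, 0))
    after_update = conn.execute("SELECT * FROM tonbo ORDER BY id").fetchall()

    conn.execute("DELETE FROM tonbo WHERE id = ?", (0,))
    after_delete = conn.execute("SELECT * FROM tonbo ORDER BY id").fetchall()
    return after_update, after_delete
```

The function only depends on the table's name and columns, so it can be exercised against any connection that provides them.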

Flush

TonboLite uses an LSM tree to store data and a WAL buffer to improve performance, so you may need to flush data to disk manually. SQLite does not provide a flush interface, so TonboLite implements flushing in the quick_check pragma:

PRAGMA tonbo.quick_check;

Using in Rust

To use TonboLite in your application, add it as a dependency in your Cargo.toml file:

tonbolite = { git = "https://github.com/tonbo-io/tonbolite" }

You can use TonboLite just as you would use Rusqlite, but you must create tables using SQLite's virtual table syntax:

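// Make sure the directory used as the table's storage path exists.
let _ = std::fs::create_dir_all("./db_path/test");

let db = rusqlite::Connection::open_in_memory()?;
// Register the tonbo virtual table module on this connection.
// `crate::` only resolves inside the crate that defines `load_module`;
// from another crate, use the path under which it is exported.
crate::load_module(&db)?;

// Create the table through the virtual table syntax.
db.execute_batch(
    "CREATE VIRTUAL TABLE temp.tonbo USING tonbo(
            create_sql = 'create table tonbo(id bigint primary key, name varchar, like int)',
            path = 'db_path/test'
    );"
).unwrap();

db.execute(
    "INSERT INTO tonbo (id, name, like) VALUES (1, 'lol', 12)",
    [],
).unwrap();

// Query the rows back with a prepared statement.
let mut stmt = db.prepare("SELECT * FROM tonbo;")?;
let _rows = stmt.query([])?;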

For more usage, you can refer to Rusqlite.

One difference is that TonboLite extends the quick_check pragma to flush the WAL to disk. You can use it like this:

db.pragma(None, "quick_check", "tonbo", |_r| -> rusqlite::Result<()> {
    Ok(())
}).unwrap();

Using in JavaScript

To use TonboLite in wasm, you should enable the wasm feature:

tonbolite = { git = "https://github.com/tonbo-io/tonbolite", default-features = false, features = ["wasm"] }

After a successful build, you will get a pkg folder containing the compiled JS and wasm files. Copy it into your project and you can start using it. If you don't know how to build TonboLite for wasm, you can refer to TonboLite.

Here is an example of how to use TonboLite in JavaScript:

const tonbo = await import("./pkg/sqlite_tonbo.js");
await tonbo.default();

const db = new tonbo.TonboLite('db_path/test');
await db.create(`CREATE VIRTUAL TABLE temp.tonbo USING tonbo(
  create_sql ='create table tonbo(id bigint primary key, name varchar, like int)',
  path = 'db_path/tonbo'
);`);

await db.insert('INSERT INTO tonbo (id, name, like) VALUES (1, \'lol\', 12)');
await db.delete("DELETE FROM tonbo WHERE id = 4");
await db.update("UPDATE tonbo SET name = 'tonbo' WHERE id = 6");

const rows = await db.select('SELECT * FROM tonbo limit 10;');
console.log(rows);

await db.flush();

TonboLite must be used in a secure, cross-origin isolated context, since it uses SharedArrayBuffer to share memory. Please refer to this article for a detailed explanation.