Commit 75d491f

wip

1 parent b571a65 commit 75d491f

3 files changed: +135 -107 lines changed


README.md

Lines changed: 55 additions & 96 deletions
@@ -1,8 +1,8 @@
 # Versioned Binary Application Record Encoding (VBARE)
 
-_Simple schema evoluation with maximum performance_
+_Simple schema evolution with maximum performance_
 
-VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a way of handling schema evoluation.
+VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a way of handling schema evolution.
 
 ## Preface: What is BARE?
 
@@ -33,57 +33,58 @@ VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a w
 
 Also see the [IETF specification](https://www.ietf.org/archive/id/draft-devault-bare-11.html).
 
-## Project goals
+## Project Goals
 
-- fast -- self-contained binary encoding, akin to a tuple ->
-- simple -- can rewrite in under an hour
-- portable -- cross-language & well standardized
+**Goals:**
+- **Fast** — Self-contained binary encoding, similar to a tuple structure
+- **Simple** — Can be reimplemented in under an hour
+- **Portable** — Cross-language support with well-defined standardization
 
-non-goals:
+**Non-goals:**
+- **Data compactness** — That's what gzip is for
+- **Provide an RPC layer** — This is trivial to implement yourself based on your specific requirements
 
-- data compactness -> that's what gzip is for
-- provide an rpc layer -> this is trivial to do yourself based on your specific requirements
-
-## Use cases
+## Use Cases
 
 - Defining network protocols
-- Storing data at rest that needs to be able to be upgraded
-- Binary data in the database
+- Storing data at rest that needs to be upgradeable:
+- Binary data in databases
 - File formats
 
-## At a glance
+## At a Glance
 
-- Every message has a version associated with it
-- either pre-negotiated (via something like an http request query parameter/handshake) or embedded int he message itself
+- Every message has a version associated with it, either:
+  - Pre-negotiated (via mechanisms like HTTP request query parameters or handshakes)
+  - Embedded in the message itself
 - Applications provide functions to upgrade between protocol versions
-- There is no evolution semantics in the schema itself, just copy and paste the schema to write the new one
+- There are no evolution semantics in the schema itself — simply copy and paste the schema to write a new version
 
-## evolutino philosophy
+## Evolution Philosophy
 
-- declare discrete versions with predefined version indexes
-- manual evolutions simplify the application logic by putting complex defaults in your app code
-- stop making big breaking v1 -> v2 changes, make much smaller changes with more flexibility
-- reshaping structures is important -- not just changing types and names
+- Declare discrete versions with predefined version indexes
+- Manual evolutions simplify application logic by putting complex defaults in your application code
+- Stop making big breaking v1 to v2 changes — instead, make much smaller changes with more flexibility
+- Reshaping structures is important, not just changing types and names
 
-## specification
+## Specification
 
-### versions
+### Versions
 
-each schema version is a monotomically incrementing <TODO: integer type>
+Each schema version is a monotonically incrementing integer. _[TODO: Specify exact integer type]_
 
-### embedded version
+### Embedded Version
 
-embedded version works by inserting a <TODO: integer type> integer at the beginning of the buffer. this integer is used to define which version of the schema is being used.
+Embedded version works by inserting an integer at the beginning of the buffer. This integer is used to define which version of the schema is being used. _[TODO: Specify exact integer type]_
 
-the layout looks like this:
+The layout looks like this:
 
 ```
-TODO
+[TODO: Add layout diagram]
 ```
 
-### pre-negotiated version
+### Pre-negotiated Version
 
-often times, you speicty the protocol version outside of the message iteself. for eaxmple, if making an http request with the version in the path like `POST /v3/users`, we can extract version 3 from the path. in this case, VBARE does not insert a version in to the buffer. for this, vbare simply acts as a simple step function for upgrading/downgrading version data structures.
+Often, you specify the protocol version outside of the message itself. For example, when making an HTTP request with the version in the path like `POST /v3/users`, we can extract version 3 from the path. In this case, VBARE does not insert a version into the buffer. For this use case, VBARE simply acts as a step function for upgrading or downgrading version data structures.
 
 ## Implementations
 
@@ -94,9 +95,9 @@ often times, you speicty the protocol version outside of the message iteself. fo
 
 ([Full list of BARE implementations](https://baremessages.org/))
 
-_Adding an implementation takes less than an hour -- it's really that simple._
+_Adding an implementation takes less than an hour — it's really that simple._
 
-## Current users
+## Current Users
 
 - [Rivet Engine](https://github.com/rivet-dev/engine)
 - [Data at rest](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/data)
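
The Embedded Version section of the README leaves the exact integer type as a TODO. As a minimal sketch of the framing it describes (assuming, purely for illustration, a little-endian u16 version prefix; the spec does not fix this yet), prepending and reading the version could look like:

```typescript
// Sketch only: the VBARE spec leaves the version integer type as a TODO.
// A little-endian u16 prefix is assumed here purely for illustration.

// Prepend a version integer to an already-encoded BARE payload.
function embedVersion(version: number, payload: Uint8Array): Uint8Array {
	const out = new Uint8Array(2 + payload.length);
	new DataView(out.buffer).setUint16(0, version, true); // little-endian
	out.set(payload, 2);
	return out;
}

// Split a framed buffer back into (version, payload).
function readVersion(buffer: Uint8Array): { version: number; payload: Uint8Array } {
	const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
	return { version: view.getUint16(0, true), payload: buffer.subarray(2) };
}
```

A receiver reads the version first, then picks the matching decoder and migration chain for the rest of the buffer.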
@@ -109,91 +110,49 @@ _Adding an implementation takes less than an hour -- it's really that simple._
 
 ## Embedded vs Negotiated Version
 
-TODO
+_[TODO: Add detailed comparison]_
+
+## Comparison with Other Formats
+
+[Read more](./docs/COMPARISON.md)
 
 ## Clients vs Servers
 
-- Only servers need to ahve the evolutions steps
-- clients just send their version
+- Only servers need to have the evolution steps
+- Clients just send their version
 
 ## Downsides
 
-- extensive migration code
-- the older the version the more migration steps (though these migration steps should be effectively free)
-- migration steps are not portable across langauges, but only the server needs to the migration step. so usually this is only done once.
-
-## Comparison
-
-- Protobuf (versioned: yes)
-  - unbelievably poorly designed protocol
-  - makes migrations your problem at runtime by making everything optional
-  - even worse, makes properties have a default value (ie integers) which leads to subtle bugs with serious concequenses
-  - tracking field numbers in a file is a pain in the ass
-- Cap'n'proto (versioned: yes)
-  - includes the rpc layer as part of the library, this is out of the scope of what we want in our schema design
-  - of the schema languages we evaluated, this provides by far the most flexible schema migrations
-  - has poor language support. technically most major languages are supported, but the qulaity of the ipmlementations are lacking. i suspect this is largely due to the complexity of capnproto itself compared to other protocols.
-  - generics are cool. but we opt for simplicity with more repetition.
-  - the learning curve seems the steepest of any other tool
-- cap'n'web (versioned: no)
-  - this is focused on rpc with json. not relevant to what we needed.
-- cbor/messagepack/that mongodb one (versioned: self-describing)
-  - does not have a schema, it's completley self-describing
-  - requires encoding the entire key, not suitable for our needs
-- Flatbuffers (versioned: yes)
-  - intented as a high performance encoding similar to protobuf
-  - still uses indexes like protobuf, unless you use structs
-  - to achieve what we wanted, we'd have to use just structs
-  - schema evolution works similar to protobuf
-  - also requires writing field numbers in the file
-- https://crates.io/crates/bebop (verisoned: no)
-  - provides cross platform compact self-contained binary encoding
-  - rpc is split out in to a separate package, which i like because i don't want to use someone else's rpc
-  - includes json-over-bebop which is nice. currenlty we rely on cbor for this.
-  - could not find docs on schema evolution
-  - considered bebop instead of bare, but bare seemed significantly simpler and more focused
-- https://crates.io/crates/borsh (versioned: no)
-  - provies cross platform compact self-contained binary encoding
-  - considered borsh instead of bare, but bare seemed significantly simpler and more focused
-- rust options like postcard/etc (versioned: no)
-  - also provides self-contained binary encoding
-  - not cross platform
-
-other deatils not included in this evaluation:
-- number compression (ie static 64 bits vs using minimal bits)
-- zero-copy ser/de
-- json support & extensions
-- rpc
+- Extensive migration code required
+- The older the version, the more migration steps needed (though these migration steps should be effectively free)
+- Migration steps are not portable across languages, but only the server needs the migration steps, so this is usually only implemented once
 
 ## FAQ
 
 ### Why is copying the entire schema for every version better than using decorators for gradual migrations?
 
-- decorators are limited and get very complicated
-- it's unclear what version of the protocol a decorator takes effect -- this is helpful
-- generated sdks become more and more bloated with every change
-- you need a validation build step for your validators
-- things you can do with manual migrations
+- Decorators are limited and become very complicated over time
+- It's unclear at what version of the protocol a decorator takes effect — explicit versions help clarify this
+- Generated SDKs become more and more bloated with every change
+- You need a validation build step for your validators
+- Manual migrations provide more flexibility for complex transformations
 
 ### Why not include RPC?
 
-RPC interfaces are trivial to implement yourself. Libraries that provide RPC interfaces tend to add extra bloat & cognitive load over things like abstracting transports, compatibility with the language's async runtime, and complex codegen to implement handlers.
-
-Usually, you just want a `ToServer` and `ToClient` union that looks like this: [ToClient example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L34), [ToServer example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L56)
+RPC interfaces are trivial to implement yourself. Libraries that provide RPC interfaces tend to add extra bloat and cognitive load through things like abstracting transports, compatibility with the language's async runtime, and complex codegen to implement handlers.
 
+Usually, you just want a `ToServer` and `ToClient` union that looks like this:
+- [ToClient example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L34)
+- [ToServer example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L56)
 
 ### Isn't copying the schema going to result in a lot of duplicate code?
 
-- yes. after enough pain and suffering of running production APIS, this is what you will end up doing manually, but in a much more painful way.
-- having schema versions also makes it much easier to reason about how clients are connecting to your system/the state of an application. incremental migrations dno't let you consider other properties/structures.
-- this also lets you reshape your structures.
+Yes, but after enough pain and suffering from running production APIs, this is what you will end up doing manually anyway, but in a much more painful way. Having schema versions also makes it much easier to reason about how clients are connecting to your system and the state of an application. Incremental migrations don't let you consider other properties or structures. This approach also lets you reshape your structures more effectively.
 
 ### Don't migration steps get repetitive?
 
-- most of the time, structures will match exactly. most languages can provide a 1:1 migration.
-- the most complicated migration steps will be very deeply nested structures that changed, but that's pretty simple
+Most of the time, structures will match exactly, and most languages can provide a 1:1 migration. The most complicated migration steps will be for deeply nested structures that changed, but even that is relatively straightforward.
 
 ## License
 
 MIT
-
docs/COMPARISON.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+## Comparison with Other Formats
+
+Details not included in this evaluation:
+- Number compression (e.g., static 64 bits vs using minimal bits)
+- Zero-copy serialization/deserialization
+- JSON support & extensions
+- RPC
+
+### Protobuf (versioned: yes)
+- Poorly designed protocol in our opinion
+- Makes migrations your problem at runtime by making everything optional
+- Even worse, properties have default values (e.g., integers) which leads to subtle bugs with serious consequences
+- Tracking field numbers in a file is tedious
+
+### Cap'n Proto (versioned: yes)
+- Includes the RPC layer as part of the library, which is outside the scope of what we want in our schema design
+- Of the schema languages we evaluated, this provides by far the most flexible schema migrations
+- Has poor language support — technically most major languages are supported, but the quality of the implementations is lacking. We suspect this is largely due to the complexity of Cap'n Proto itself compared to other protocols
+- Generics are interesting, but we opt for simplicity with more repetition
+- The learning curve seems the steepest of any tool we evaluated
+
+### Cap'n Web (versioned: no)
+- This is focused on RPC with JSON, which is not relevant to our needs
+
+### CBOR/MessagePack/BSON (versioned: self-describing)
+- Does not have a schema — it's completely self-describing
+- Requires encoding the entire key, not suitable for our needs
+
+### Flatbuffers (versioned: yes)
+- Intended as a high-performance encoding similar to Protobuf
+- Still uses indexes like Protobuf, unless you use structs
+- To achieve what we wanted, we'd have to use only structs
+- Schema evolution works similarly to Protobuf
+- Also requires writing field numbers in the file
+
+### Bebop (versioned: no)
+- Provides cross-platform compact self-contained binary encoding
+- RPC is split out into a separate package, which we appreciate because we don't want to use someone else's RPC
+- Includes JSON-over-Bebop which is nice — currently we rely on CBOR for this
+- Could not find documentation on schema evolution
+- We considered Bebop instead of BARE, but BARE seemed significantly simpler and more focused
+
+### Borsh (versioned: no)
+- Provides cross-platform compact self-contained binary encoding
+- We considered Borsh instead of BARE, but BARE seemed significantly simpler and more focused
+
+### Rust-specific Options (Postcard, etc.) (versioned: no)
+- Also provides self-contained binary encoding
+- Not cross-platform
+
typescript/examples/basic/src/migrator.ts

Lines changed: 30 additions & 11 deletions
@@ -69,24 +69,43 @@ export const migrations = new Map<number, MigrationFn<any, any>>([
 	[2, (data: V2.App) => migrateV2ToV3App(data)],
 ]);
 
-// For this example we use JSON for (de)serialization to drive the migration flow.
-// The focus is on demonstrating the vbare migration wiring, not binary I/O.
-const jsonEncoder = new TextEncoder();
-const jsonDecoder = new TextDecoder();
+// Handlers per starting version that use the actual BARE encode/decode.
+// Note: We only rely on deserialize() for migration sequencing; serializeVersion is
+// set to the latest version's encoder for completeness.
+const APP_FROM_V1 = createVersionedDataHandler<V3.App>({
+	currentVersion: CURRENT_VERSION,
+	migrations,
+	serializeVersion: (data: V3.App) => V3.encodeApp(data),
+	deserializeVersion: (bytes: Uint8Array) => V1.decodeApp(bytes) as unknown as V3.App,
+});
 
-export const APP_VERSIONED = createVersionedDataHandler<V3.App>({
+const APP_FROM_V2 = createVersionedDataHandler<V3.App>({
 	currentVersion: CURRENT_VERSION,
 	migrations,
-	serializeVersion: (data: V3.App) => jsonEncoder.encode(JSON.stringify(data)),
-	deserializeVersion: (bytes: Uint8Array) => JSON.parse(jsonDecoder.decode(bytes)),
+	serializeVersion: (data: V3.App) => V3.encodeApp(data),
+	deserializeVersion: (bytes: Uint8Array) => V2.decodeApp(bytes) as unknown as V3.App,
+});
+
+const APP_FROM_V3 = createVersionedDataHandler<V3.App>({
+	currentVersion: CURRENT_VERSION,
+	migrations,
+	serializeVersion: (data: V3.App) => V3.encodeApp(data),
+	deserializeVersion: (bytes: Uint8Array) => V3.decodeApp(bytes),
 });
 
 export function migrateToLatest(
 	app: V1.App | V2.App | V3.App,
 	fromVersion: 1 | 2 | 3,
 ): V3.App {
-	if (fromVersion === 3) return app as V3.App;
-	// Use the versioned handler to apply migrations starting from fromVersion.
-	const bytes = jsonEncoder.encode(JSON.stringify(app));
-	return APP_VERSIONED.deserialize(bytes, fromVersion);
+	if (fromVersion === 1) {
+		const bytes = V1.encodeApp(app as V1.App);
+		return APP_FROM_V1.deserialize(bytes, 1);
+	}
+	if (fromVersion === 2) {
+		const bytes = V2.encodeApp(app as V2.App);
+		return APP_FROM_V2.deserialize(bytes, 2);
+	}
+	// v3 -> v3
+	const bytes = V3.encodeApp(app as V3.App);
+	return APP_FROM_V3.deserialize(bytes, 3);
 }
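
The migrator above depends on the repo's generated `V1`/`V2`/`V3` modules and `createVersionedDataHandler`. As a self-contained sketch of the same step-function idea (every name below is invented for illustration and is not the actual vbare package API), chaining migrations from a starting version to the current one can be as small as:

```typescript
// Self-contained sketch of VBARE-style migration sequencing.
// Names and shapes here are illustrative only, not the vbare library API.
type Migration = (data: unknown) => unknown;

// Map from version N to the function that upgrades N -> N+1.
const steps = new Map<number, Migration>([
	[1, (d) => ({ ...(d as object), v2Field: true })],
	[2, (d) => ({ ...(d as object), v3Field: "default" })],
]);

// Apply every step from fromVersion up to currentVersion, in order.
function upgrade(data: unknown, fromVersion: number, currentVersion: number): unknown {
	let result = data;
	for (let v = fromVersion; v < currentVersion; v++) {
		const step = steps.get(v);
		if (!step) throw new Error(`missing migration step for v${v}`);
		result = step(result);
	}
	return result;
}
```

For example, `upgrade({ a: 1 }, 1, 3)` runs the v1-to-v2 and v2-to-v3 steps in order, while `upgrade(data, 3, 3)` is a no-op; older data simply passes through more steps.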
