Commit 75d491f

wip

1 parent b571a65 commit 75d491f

3 files changed: +135 -107 lines changed


README.md

Lines changed: 55 additions & 96 deletions
@@ -1,8 +1,8 @@
 # Versioned Binary Application Record Encoding (VBARE)
 
-_Simple schema evoluation with maximum performance_
+_Simple schema evolution with maximum performance_
 
-VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a way of handling schema evoluation.
+VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a way of handling schema evolution.
 
 ## Preface: What is BARE?
 
@@ -33,57 +33,58 @@ VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a w
 
 Also see the [IETF specification](https://www.ietf.org/archive/id/draft-devault-bare-11.html).
 
-## Project goals
+## Project Goals
 
-- fast -- self-contained binary encoding, akin to a tuple ->
-- simple -- can rewrite in under an hour
-- portable -- cross-language & well standardized
+**Goals:**
+- **Fast** — Self-contained binary encoding, similar to a tuple structure
+- **Simple** — Can be reimplemented in under an hour
+- **Portable** — Cross-language support with well-defined standardization
 
-non-goals:
+**Non-goals:**
+- **Data compactness** — That's what gzip is for
+- **Provide an RPC layer** — This is trivial to implement yourself based on your specific requirements
 
-- data compactness -> that's what gzip is for
-- provide an rpc layer -> this is trivial to do yourself based on your specific requirements
-
-## Use cases
+## Use Cases
 
 - Defining network protocols
-- Storing data at rest that needs to be able to be upgraded
-- Binary data in the database
+- Storing data at rest that needs to be upgradeable:
+- Binary data in databases
 - File formats
 
-## At a glance
+## At a Glance
 
-- Every message has a version associated with it
-- either pre-negotiated (via something like an http request query parameter/handshake) or embedded int he message itself
+- Every message has a version associated with it, either:
+  - Pre-negotiated (via mechanisms like HTTP request query parameters or handshakes)
+  - Embedded in the message itself
 - Applications provide functions to upgrade between protocol versions
-- There is no evolution semantics in the schema itself, just copy and paste the schema to write the new one
+- There are no evolution semantics in the schema itself — simply copy and paste the schema to write a new version
 
-## evolutino philosophy
+## Evolution Philosophy
 
-- declare discrete versions with predefined version indexes
-- manual evolutions simplify the application logic by putting complex defaults in your app code
-- stop making big breaking v1 -> v2 changes, make much smaller changes with more flexibility
-- reshaping structures is important -- not just changing types and names
+- Declare discrete versions with predefined version indexes
+- Manual evolutions simplify application logic by putting complex defaults in your application code
+- Stop making big breaking v1 to v2 changes — instead, make much smaller changes with more flexibility
+- Reshaping structures is important, not just changing types and names
 
-## specification
+## Specification
 
-### versions
+### Versions
 
-each schema version is a monotomically incrementing <TODO: integer type>
+Each schema version is a monotonically incrementing integer. _[TODO: Specify exact integer type]_
 
-### embedded version
+### Embedded Version
 
-embedded version works by inserting a <TODO: integer type> integer at the beginning of the buffer. this integer is used to define which version of the schema is being used.
+Embedded version works by inserting an integer at the beginning of the buffer. This integer is used to define which version of the schema is being used. _[TODO: Specify exact integer type]_
 
-the layout looks like this:
+The layout looks like this:
 
 ```
-TODO
+[TODO: Add layout diagram]
 ```
 
-### pre-negotiated version
+### Pre-negotiated Version
 
-often times, you speicty the protocol version outside of the message iteself. for eaxmple, if making an http request with the version in the path like `POST /v3/users`, we can extract version 3 from the path. in this case, VBARE does not insert a version in to the buffer. for this, vbare simply acts as a simple step function for upgrading/downgrading version data structures.
+Often, you specify the protocol version outside of the message itself. For example, when making an HTTP request with the version in the path like `POST /v3/users`, we can extract version 3 from the path. In this case, VBARE does not insert a version into the buffer. For this use case, VBARE simply acts as a step function for upgrading or downgrading version data structures.
 
 ## Implementations
 
@@ -94,9 +95,9 @@ often times, you speicty the protocol version outside of the message iteself. fo
 
 ([Full list of BARE implementations](https://baremessages.org/))
 
-_Adding an implementation takes less than an hour -- it's really that simple._
+_Adding an implementation takes less than an hour — it's really that simple._
 
-## Current users
+## Current Users
 
 - [Rivet Engine](https://github.com/rivet-dev/engine)
 - [Data at rest](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/data)
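
The Embedded Version section of the README leaves the exact integer type as a TODO. As a minimal sketch of the framing it describes (assuming, purely for illustration, a little-endian u16 version prefix; the spec does not fix this yet), prepending and reading the version could look like:

```typescript
// Sketch only: the VBARE spec leaves the version integer type as a TODO.
// A little-endian u16 prefix is assumed here purely for illustration.

// Prepend a version integer to an already-encoded BARE payload.
function embedVersion(version: number, payload: Uint8Array): Uint8Array {
	const out = new Uint8Array(2 + payload.length);
	new DataView(out.buffer).setUint16(0, version, true); // little-endian
	out.set(payload, 2);
	return out;
}

// Split a framed buffer back into (version, payload).
function readVersion(buffer: Uint8Array): { version: number; payload: Uint8Array } {
	const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
	return { version: view.getUint16(0, true), payload: buffer.subarray(2) };
}
```

A receiver reads the version first, then picks the matching decoder and migration chain for the rest of the buffer.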
@@ -109,91 +110,49 @@ _Adding an implementation takes less than an hour -- it's really that simple._
 
 ## Embedded vs Negotiated Version
 
-TODO
+_[TODO: Add detailed comparison]_
+
+## Comparison with Other Formats
+
+[Read more](./docs/COMPARISON.md)
 
 ## Clients vs Servers
 
-- Only servers need to ahve the evolutions steps
-- clients just send their version
+- Only servers need to have the evolution steps
+- Clients just send their version
 
 ## Downsides
 
-- extensive migration code
-- the older the version the more migration steps (though these migration steps should be effectively free)
-- migration steps are not portable across langauges, but only the server needs to the migration step. so usually this is only done once.
-
-## Comparison
-
-- Protobuf (versioned: yes)
-  - unbelievably poorly designed protocol
-  - makes migrations your problem at runtime by making everything optional
-  - even worse, makes properties have a default value (ie integers) which leads to subtle bugs with serious concequenses
-  - tracking field numbers in a file is a pain in the ass
-- Cap'n'proto (versioned: yes)
-  - includes the rpc layer as part of the library, this is out of the scope of what we want in our schema design
-  - of the schema languages we evaluated, this provides by far the most flexible schema migrations
-  - has poor language support. technically most major languages are supported, but the qulaity of the ipmlementations are lacking. i suspect this is largely due to the complexity of capnproto itself compared to other protocols.
-  - generics are cool. but we opt for simplicity with more repetition.
-  - the learning curve seems the steepest of any other tool
-- cap'n'web (versioned: no)
-  - this is focused on rpc with json. not relevant to what we needed.
-- cbor/messagepack/that mongodb one (versioned: self-describing)
-  - does not have a schema, it's completley self-describing
-  - requires encoding the entire key, not suitable for our needs
-- Flatbuffers (versioned: yes)
-  - intented as a high performance encoding similar to protobuf
-  - still uses indexes like protobuf, unless you use structs
-  - to achieve what we wanted, we'd have to use just structs
-  - schema evolution works similar to protobuf
-  - also requires writing field numbers in the file
-- https://crates.io/crates/bebop (verisoned: no)
-  - provides cross platform compact self-contained binary encoding
-  - rpc is split out in to a separate package, which i like because i don't want to use someone else's rpc
-  - includes json-over-bebop which is nice. currenlty we rely on cbor for this.
-  - could not find docs on schema evolution
-  - considered bebop instead of bare, but bare seemed significantly simpler and more focused
-- https://crates.io/crates/borsh (versioned: no)
-  - provies cross platform compact self-contained binary encoding
-  - considered borsh instead of bare, but bare seemed significantly simpler and more focused
-- rust options like postcard/etc (versioned: no)
-  - also provides self-contained binary encoding
-  - not cross platform
-
-other deatils not included in this evaluation:
-- number compression (ie static 64 bits vs using minimal bits)
-- zero-copy ser/de
-- json support & extensions
-- rpc
+- Extensive migration code required
+- The older the version, the more migration steps needed (though these migration steps should be effectively free)
+- Migration steps are not portable across languages, but only the server needs the migration steps, so this is usually only implemented once
 
 ## FAQ
 
 ### Why is copying the entire schema for every version better than using decorators for gradual migrations?
 
-- decorators are limited and get very complicated
-- it's unclear what version of the protocol a decorator takes effect -- this is helpful
-- generated sdks become more and more bloated with every change
-- you need a validation build step for your validators
-- things you can do with manual migrations
+- Decorators are limited and become very complicated over time
+- It's unclear at what version of the protocol a decorator takes effect — explicit versions help clarify this
+- Generated SDKs become more and more bloated with every change
+- You need a validation build step for your validators
+- Manual migrations provide more flexibility for complex transformations
 
 ### Why not include RPC?
 
-RPC interfaces are trivial to implement yourself. Libraries that provide RPC interfaces tend to add extra bloat & cognitive load over things like abstracting transports, compatibility with the language's async runtime, and complex codegen to implement handlers.
-
-Usually, you just want a `ToServer` and `ToClient` union that looks like this: [ToClient example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L34), [ToServer example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L56)
+RPC interfaces are trivial to implement yourself. Libraries that provide RPC interfaces tend to add extra bloat and cognitive load through things like abstracting transports, compatibility with the language's async runtime, and complex codegen to implement handlers.
 
+Usually, you just want a `ToServer` and `ToClient` union that looks like this:
+- [ToClient example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L34)
+- [ToServer example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L56)
 
 ### Isn't copying the schema going to result in a lot of duplicate code?
 
-- yes. after enough pain and suffering of running production APIS, this is what you will end up doing manually, but in a much more painful way.
-- having schema versions also makes it much easier to reason about how clients are connecting to your system/the state of an application. incremental migrations dno't let you consider other properties/structures.
-- this also lets you reshape your structures.
+Yes, but after enough pain and suffering from running production APIs, this is what you will end up doing manually anyway, but in a much more painful way. Having schema versions also makes it much easier to reason about how clients are connecting to your system and the state of an application. Incremental migrations don't let you consider other properties or structures. This approach also lets you reshape your structures more effectively.
 
 ### Don't migration steps get repetitive?
 
-- most of the time, structures will match exactly. most languages can provide a 1:1 migration.
-- the most complicated migration steps will be very deeply nested structures that changed, but that's pretty simple
+Most of the time, structures will match exactly, and most languages can provide a 1:1 migration. The most complicated migration steps will be for deeply nested structures that changed, but even that is relatively straightforward.
 
 ## License
 
 MIT
-
docs/COMPARISON.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+## Comparison with Other Formats
+
+Details not included in this evaluation:
+- Number compression (e.g., static 64 bits vs using minimal bits)
+- Zero-copy serialization/deserialization
+- JSON support & extensions
+- RPC
+
+### Protobuf (versioned: yes)
+- Poorly designed protocol in our opinion
+- Makes migrations your problem at runtime by making everything optional
+- Even worse, properties have default values (e.g., integers) which leads to subtle bugs with serious consequences
+- Tracking field numbers in a file is tedious
+
+### Cap'n Proto (versioned: yes)
+- Includes the RPC layer as part of the library, which is outside the scope of what we want in our schema design
+- Of the schema languages we evaluated, this provides by far the most flexible schema migrations
+- Has poor language support — technically most major languages are supported, but the quality of the implementations is lacking. We suspect this is largely due to the complexity of Cap'n Proto itself compared to other protocols
+- Generics are interesting, but we opt for simplicity with more repetition
+- The learning curve seems the steepest of any tool we evaluated
+
+### Cap'n Web (versioned: no)
+- This is focused on RPC with JSON, which is not relevant to our needs
+
+### CBOR/MessagePack/BSON (versioned: self-describing)
+- Does not have a schema — it's completely self-describing
+- Requires encoding the entire key, not suitable for our needs
+
+### Flatbuffers (versioned: yes)
+- Intended as a high-performance encoding similar to Protobuf
+- Still uses indexes like Protobuf, unless you use structs
+- To achieve what we wanted, we'd have to use only structs
+- Schema evolution works similarly to Protobuf
+- Also requires writing field numbers in the file
+
+### Bebop (versioned: no)
+- Provides cross-platform compact self-contained binary encoding
+- RPC is split out into a separate package, which we appreciate because we don't want to use someone else's RPC
+- Includes JSON-over-Bebop which is nice — currently we rely on CBOR for this
+- Could not find documentation on schema evolution
+- We considered Bebop instead of BARE, but BARE seemed significantly simpler and more focused
+
+### Borsh (versioned: no)
+- Provides cross-platform compact self-contained binary encoding
+- We considered Borsh instead of BARE, but BARE seemed significantly simpler and more focused
+
+### Rust-specific Options (Postcard, etc.) (versioned: no)
+- Also provides self-contained binary encoding
+- Not cross-platform
+
typescript/examples/basic/src/migrator.ts

Lines changed: 30 additions & 11 deletions
@@ -69,24 +69,43 @@ export const migrations = new Map<number, MigrationFn<any, any>>([
 	[2, (data: V2.App) => migrateV2ToV3App(data)],
 ]);
 
-// For this example we use JSON for (de)serialization to drive the migration flow.
-// The focus is on demonstrating the vbare migration wiring, not binary I/O.
-const jsonEncoder = new TextEncoder();
-const jsonDecoder = new TextDecoder();
+// Handlers per starting version that use the actual BARE encode/decode.
+// Note: We only rely on deserialize() for migration sequencing; serializeVersion is
+// set to the latest version's encoder for completeness.
+const APP_FROM_V1 = createVersionedDataHandler<V3.App>({
+	currentVersion: CURRENT_VERSION,
+	migrations,
+	serializeVersion: (data: V3.App) => V3.encodeApp(data),
+	deserializeVersion: (bytes: Uint8Array) => V1.decodeApp(bytes) as unknown as V3.App,
+});
 
-export const APP_VERSIONED = createVersionedDataHandler<V3.App>({
+const APP_FROM_V2 = createVersionedDataHandler<V3.App>({
 	currentVersion: CURRENT_VERSION,
 	migrations,
-	serializeVersion: (data: V3.App) => jsonEncoder.encode(JSON.stringify(data)),
-	deserializeVersion: (bytes: Uint8Array) => JSON.parse(jsonDecoder.decode(bytes)),
+	serializeVersion: (data: V3.App) => V3.encodeApp(data),
+	deserializeVersion: (bytes: Uint8Array) => V2.decodeApp(bytes) as unknown as V3.App,
+});
+
+const APP_FROM_V3 = createVersionedDataHandler<V3.App>({
+	currentVersion: CURRENT_VERSION,
+	migrations,
+	serializeVersion: (data: V3.App) => V3.encodeApp(data),
+	deserializeVersion: (bytes: Uint8Array) => V3.decodeApp(bytes),
 });
 
 export function migrateToLatest(
 	app: V1.App | V2.App | V3.App,
 	fromVersion: 1 | 2 | 3,
 ): V3.App {
-	if (fromVersion === 3) return app as V3.App;
-	// Use the versioned handler to apply migrations starting from fromVersion.
-	const bytes = jsonEncoder.encode(JSON.stringify(app));
-	return APP_VERSIONED.deserialize(bytes, fromVersion);
+	if (fromVersion === 1) {
+		const bytes = V1.encodeApp(app as V1.App);
+		return APP_FROM_V1.deserialize(bytes, 1);
+	}
+	if (fromVersion === 2) {
+		const bytes = V2.encodeApp(app as V2.App);
+		return APP_FROM_V2.deserialize(bytes, 2);
+	}
+	// v3 -> v3
+	const bytes = V3.encodeApp(app as V3.App);
+	return APP_FROM_V3.deserialize(bytes, 3);
 }
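
The migrator above depends on the repo's generated `V1`/`V2`/`V3` modules and `createVersionedDataHandler`. As a self-contained sketch of the same step-function idea (every name below is invented for illustration and is not the actual vbare package API), chaining migrations from a starting version to the current one can be as small as:

```typescript
// Self-contained sketch of VBARE-style migration sequencing.
// Names and shapes here are illustrative only, not the vbare library API.
type Migration = (data: unknown) => unknown;

// Map from version N to the function that upgrades N -> N+1.
const steps = new Map<number, Migration>([
	[1, (d) => ({ ...(d as object), v2Field: true })],
	[2, (d) => ({ ...(d as object), v3Field: "default" })],
]);

// Apply every step from fromVersion up to currentVersion, in order.
function upgrade(data: unknown, fromVersion: number, currentVersion: number): unknown {
	let result = data;
	for (let v = fromVersion; v < currentVersion; v++) {
		const step = steps.get(v);
		if (!step) throw new Error(`missing migration step for v${v}`);
		result = step(result);
	}
	return result;
}
```

For example, `upgrade({ a: 1 }, 1, 3)` runs the v1-to-v2 and v2-to-v3 steps in order, while `upgrade(data, 3, 3)` is a no-op; older data simply passes through more steps.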
