-
-
Notifications
You must be signed in to change notification settings - Fork 2
feat(sqlite): Split activation blobs to a separate DB #369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// Remove orphaned blobs that don't have a corresponding ID in the meta table. | ||
| /// This method is a garbage collector for the blob store. | ||
| #[instrument(skip_all)] | ||
| pub async fn remove_orphaned_blobs(&self) -> Result<u64, Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could call this from the maintenance thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to add this to the main upkeep for now. We can change that if it's a problem.
27a419f to
4fa2eca
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #369 +/- ##
==========================================
+ Coverage 87.53% 87.71% +0.18%
==========================================
Files 20 20
Lines 4628 4926 +298
==========================================
+ Hits 4051 4321 +270
- Misses 577 605 +28 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
109ed15 to
56b1825
Compare
This change creates a new SQLite DB that stores the TaskActivation protobuf, keyed by ID. This table is used to hydrate the protobuf when it is needed in the taskbroker. The original table remains, however it won't contain the protobuf data. In order to ensure that processing could happen without needing the protobuf itself, the on_attempts_exceeded value needed to be pulled out of the protobuf and stored in the table directly. There are now extra functions for fetching just the activation data, `get_activation_blob` and `get_activation_raw_blob`. These fetch the decoded and encoded versions of the protobuf respectively. There is also a new upkeep function that deletes all the blobs that don't have a corresponding ID in the meta table. Upkeep works essentially unchanged for the original table. Once a row is deleted from the meta table, the corresponding blob is considered orphaned. The upkeep function deletes all such blobs from the table that are older than the given delay_ms. All of this is gated behind a config option that determines whether just the old table is used or the new table is used. There are four modes for this migration: 1. Write only to the original table, and read only from the original table (same as before) 2. Write to both tables, but read from the original table. 3. Write to both tables, and read from the new table. 4. Write only to the new table and read from the new table. There will be separate PRs moving the code through the different states, which allows for easy rollbacks in case of problems. Since the new column added to the original table is nullable, this PR can be rolled back safely as well.
56b1825 to
88ae998
Compare
| sqlx::query("DELETE FROM taskactivation_blobs WHERE id = $1") | ||
| .bind(id) | ||
| .execute(&self.blob_write_pool) | ||
| .await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need an if self.is_new_write() guard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't, the table will exist regardless of the guards. The guards just determine what is in the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice stuff! This is quite a fundamental change so we should do thorough testing in the sandbox before we consider pushing to production
|
|
||
| match inflight { | ||
| Ok(Some(inflight)) => { | ||
| let result = self.store.get_activation_blob(&inflight.id).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice that the get_activation_blob path is gated by BlobTableMode, but doing this will double our reads during the transitional phase. If so, I'd imagine we would see throughput hits in our testing.
| .await; | ||
| let activation = match result { | ||
| Ok(activation) => activation, | ||
| Err(e) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here as GetTask?
| b.push_bind(row.activation); | ||
| b.push_bind(row.added_at.timestamp()); | ||
| }) | ||
| .push(" ON CONFLICT(id) DO NOTHING") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note on ON CONFLICT(id) DO NOTHING since I think it has an interesting implication down the road. We know that it's possible for writes to the blob db to succeed and writes to meta db to fail. If a batch fails and is retried later, the blob will not be updated since the ID already exists. This seems ok since the blob doesn't store any information that gets altered by inflight_activation_writer when it gets retried.
|
Closing this as it isn't showing any improvements in the sandbox. Could be revisited later. |
This change creates a new SQLite DB that stores the TaskActivation protobuf, keyed by ID. This table
is used to hydrate the protobuf when it is needed in the taskbroker. The original table remains,
however it won't contain the protobuf data. In order to ensure that processing could happen without
needing the protobuf itself, the on_attempts_exceeded value needed to be pulled out of the protobuf
and stored in the table directly.
There are now extra functions for fetching just the activation data,
get_activation_blobandget_activation_raw_blob. These fetch the decoded and encoded versions of the protobufrespectively.
There is also a new upkeep function that deletes all the blobs that don't have a corresponding ID in
the meta table. Upkeep works essentially unchanged for the original table. Once a row is deleted
from the meta table, the corresponding blob is considered orphaned. The upkeep function deletes all
such blobs from the table that are older than the given delay_ms.
All of this is gated behind a config option that determines whether just the old table is used or
the new table is used. There are four modes for this migration:
There will be separate PRs moving the code through the different states, which allows for easy
rollbacks in case of problems. Since the new column added to the original table is nullable, this PR
can be rolled back safely as well.