Zonal buckets update #209
Conversation
Signed-off-by: NickGoog <[email protected]>
Force-pushed 46790a7 to b8ab9d8, then 7d6b044 to c8ed35e.
Epic, generally LGTM, thanks for the amazing repro scripts for the benchmarks 💪🏽
I would expand the commentary slightly with more details and actually add an item to the changelog.
Avoiding updating CHANGELOG until after main repo update.
Why? We have a changelog in this repo too, and there are at least 2 more big users of this library. (:
// Both upload and download paths will use zonal bucket gRPC APIs by default.
// Zonal buckets are currently allowlisted; please contact your Google account
// manager if interested.
Suggested change:
// UseZonalBuckets enables Zonal gRPC APIs to be used to access (both writes and reads) GCS storage. You can read more about zonal buckets <here>
//
// Initial benchmarks showed an avg. ~50+% reduction in latency for larger Thanos store operations (see: https://github.com/thanos-io/objstore/pull/209#issue-3294127783).
I would leave the following info in some public doc and link it from here. Otherwise this comment might age very fast.
Zonal buckets are currently in Google Private Preview; please contact your Google account
manager if interested.
As written in the suggestion, I would also ignore UseGRPC if zonal is enabled; it might be easier for the user. Not a strong opinion though.
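To make the suggested precedence concrete, here is a minimal sketch; the struct and field names (`Config`, `UseGRPC`, `UseZonalBuckets`) are assumptions for illustration, not the library's final API:

```go
package main

import "fmt"

// Config stands in for the GCS bucket options discussed in this PR.
// Field names are illustrative assumptions only.
type Config struct {
	Bucket          string
	UseGRPC         bool
	UseZonalBuckets bool
}

// useGRPCTransport implements the reviewer's suggestion: zonal buckets
// require the gRPC path, so enabling them overrides the UseGRPC toggle.
func useGRPCTransport(c Config) bool {
	return c.UseGRPC || c.UseZonalBuckets
}

func main() {
	fmt.Println(useGRPCTransport(Config{UseZonalBuckets: true})) // gRPC forced on by zonal option
	fmt.Println(useGRPCTransport(Config{}))                      // default non-gRPC path
}
```

Under this scheme a user never has to remember to flip both flags; the zonal option alone selects the right transport.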
Also, did I remember right that zonal buckets don't have redundancy? Should we mention this detail in the comment?
+1 to all the above. Let's maybe add links to docs for this, so people know what zonal buckets are.
Also, maybe we add a NOTE comment on the field with the consequences of using this (e.g. higher pricing, redundancy?).
I would leave the following info in some public doc and link it from here
+1 to all the above. Let's maybe add links to docs for this
I agree. I'd normally do this, but we haven't launched yet, so there are no public docs. Should I put the PR on ice until then? (Not allowed to say when launch is, sadly, but you can probably tell we’re quite far along.)
As written in suggestion, I would also ignore UseGRPC if zonal is enabled
Keeping this toggle is useful for future non-GRPC support
Also - did I remember right that zonal buckets don't have redundancy?
We don’t support Soft Delete, which allows you to restore deleted files within certain constraints: https://cloud.google.com/storage/docs/soft-delete
The other redundancy features that come to mind are dual-region and multi-region buckets: https://cloud.google.com/storage/docs/locations#considerations (these docs also mention one-region buckets have failover between zones within a region)
In contrast to these settings, zonal buckets are tied to a single zone, geographically smaller than a region. Part of the performance comes from the data being closer to where the user needs it. I agree that linking docs would be the best way to explain this, but I’ll leave a TODO for now.
Thanks!
This looks good already minus the doc comments.
Might also be worth committing the benchmark scripts somehow?
Thanks for the review! Switching to draft mode while I investigate a better demonstration of the performance improvement.
Signed-off-by: NickGoog <[email protected]>
Force-pushed c8ed35e to 4603b4f.
I've updated the PR description with more testing!
Done. Updated the changelog.
I was thinking about this as well, but a number of concerns held me back:
The two most appealing solutions to me are:
Changes
Adds zonal buckets option for Google Cloud Storage.
Need to update the docs and objstore version in the main repo:
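For illustration, the new option might sit alongside the existing GCS options roughly like this; the struct, field, and YAML key names here are assumptions sketched from the review discussion, not the merged API:

```go
package main

import "fmt"

// gcsConfig sketches how the zonal option could live next to the existing
// bucket options; all names are illustrative assumptions.
type gcsConfig struct {
	Bucket  string `yaml:"bucket"`
	UseGRPC bool   `yaml:"use_grpc"`
	// UseZonalBuckets enables zonal gRPC APIs for both uploads and downloads.
	// Zonal buckets are currently in Google Private Preview.
	UseZonalBuckets bool `yaml:"use_zonal_buckets"`
}

func main() {
	cfg := gcsConfig{Bucket: "example-zonal", UseZonalBuckets: true}
	fmt.Printf("bucket=%s zonal=%t\n", cfg.Bucket, cfg.UseZonalBuckets)
}
```

The doc comment on the field is where the reviewers suggested linking public docs and noting consequences (pricing, redundancy) once those docs exist.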
Verification
Wrote tests and manually tested. See below (performance comparison at bottom!):
Resource setup (uses some internal Google commands that I'll demarcate)
Regular VM login
Regular VM setup
SSH tunnel to view Prometheus from browser at localhost:9090. Run in new terminal tab.
Zonal bucket VM login
Zonal bucket VM setup
SSH tunnel to view Prometheus from browser at localhost:9090. Run in new terminal tab.
Cleanup
Taking measurements quickly after startup…
Regular
Rapid
The above metrics cover downloading the small index headers for each of the thirteen seed blocks. I also added some debug logs to the compact command to confirm the download is running and print the latency.

VM setup tweak:
Run these commands in a separate terminal SSH’d into the VM:
Walkthrough:
Rerunning in the regular and Rapid VM three times, I obtained:
Regular
Rapid
Known issues
otel-plugin log spam. Our Go engineers are looking into these non-blocking errors originating from https://github.com/grpc/grpc-go/tree/57b69b47a2da17098c387c8c64c4aab5295cbd3e/stats/opentelemetry

quickstart.sh script doesn’t successfully spin up 2 of 3 receive components, but 1 is enough to illustrate performance.