-
Notifications
You must be signed in to change notification settings - Fork 19
feat: implement retries #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for looking into this. This is great.
995ab29 to
6d833f0
Compare
|
Sorry for the slow replies. I was pretty swamped the last days. I'll try to get to this later today or latest on Wednesday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this overall. I think a test would be great if possible. Though I'm not sure if the test framework allows that currently. I'm happy to take a stab at that over the weekend.
The Mutex is already acting as a box.
|
Added tests. I think this is good to merge now. I'll try to release a new version over the weekend |
Azure Application Insights frequently times out, returns HTTP 503
errors, or only accepts partial payloads. This adds transparent retries.
Users of this crate won't have to configure anything.
Using `backon` crate, because it seems small and easy to use.
Design decisions:
- Don't expose implementation. Add default retries that should work for
everyone:
- min_delay = 500ms
- max_delay = 5s
- with jitter (to avoid thundering herds when multiple exports run in parallel)
We wanted no max_times or total delay, because the BatchSpanProcessor
already provides a `max_export_timeout` option. This automatically
ensures that the exporter will stop. However, it's only available in the
experimental async version of the BatchSpanProcessor. And it's missing
in the SimpleSpanProcessor and metric and logs processors. Therefore
this sets a total delay of 35s.
- Add a `with_retry_notify` option. Can be used to log retries and
thereby debug exporter flakyness.
- Use `futures-timer` crate to provide sleep functionality during
retries. Ideally we could use
opentelemetry_sdk::runtime::Runtime::delay for that. But that's behind
an experimental feature and we the SDK doesn't provide the exporters
with a runtime. We'd have to add a configuration to the exporter where
users would need to specify their choosen runtime, which then has to
fit to the rest of their setup, otherwise they see runtime errors.
E.g. tokio-sleep fails if not executed in the context of a Tokio
reactor with:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
This error would also only happen if a retry has to happen, which
means when a call to AppInsights fails. This would make it very hard
to debug.
We're using futures-timer instead, which works with all runtimes,
including tokio and futures-executor. The OpenTelemetry SDK uses the
latter for most default processors right now.
---------
Co-authored-by: Jan Kuehle <[email protected]>
|
Published with version 0.42.0. I hope I didn't remove anything you needed in the commits today. If I did, please let me know. And either way, thanks a lot for getting this started! This has been on my todo list for far too long. |
Azure Application Insights frequently times out, returns HTTP 503 errors, or only accepts partial payloads. This PR adds support for configuring a
backon-based backoff strategy to automatically retry telemetry uploads in such cases.