Disk buffering api implementation #2183
Conversation
🔧 The result from spotlessApply was committed to the PR branch.
It's ready for review. Please take a look, @zeitlinger.
It doesn't look like this is just adding a new implementation for an interface. Can you create a PR that just moves things around? That would make it easier to review.
It depends. I was actually trying to do it in 2 parts, but I'm reusing a lot of logic from the old API, while at the same time I had to change internal tools to adapt them to the new one. So if I don't delete the old API in this PR, the project won't compile, because of the changes to the old internal tools that are needed to work with the new implementation. What parts of the current PR are you referring to moving around?
An option would be to stack PRs on top of each other.
Yes, but that still keeps the issue I mentioned, which is that the project won't compile because the old API will no longer be compatible with the internal tools, which have to be slightly changed to better adapt to the new API. @zeitlinger I know that seeing
If it helps, the real file changes to implement this API are the following:
Those are the important changes. Let me know if that helps make the review smoother; if not, I guess I can create an intermediate state in which both APIs can compile, which might require duplicating internal classes such as Storage and ReadableFile to make them work with the new API without touching the old ones, so that the old API compiles for the first PR. It seemed like a lot of work that wasn't needed, as the old API will get removed anyway, but if you think it's still needed after what I explained above, let me know.
I don't think this new Storage API is as flexible as you suggest. It locks the re-serialization into place and forces extra inefficiencies. It would be better if Storage
dealt with raw byte arrays or IO streams.
    throws IOException {
  return readAndProcess(processing, 1);
}

@Nullable
public ReadableResult<T> readNext(SignalDeserializer<T> deserializer) throws IOException {
I think it's a mistake to move the deserialization deeper into the storage as it prevents what I've done in #2190 to avoid unnecessary re-serialization. Perhaps if this class had an additional access method that returned an InputStream factory against the file (so it could be read twice, first to count the items and second to stream it to a network output stream).
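A minimal sketch of that "InputStream factory" idea, assuming a hypothetical `FileStreamFactory` class (the class and method names are not from the PR): each call to `open()` returns a fresh stream over the same file, so the data can be read once to count the items and again to stream the bytes out.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical sketch: every call to open() returns a fresh stream positioned
// at the start of the file, so the same data can be read twice (first to
// count items, then to stream them to a network output stream).
final class FileStreamFactory {
  private final File file;

  FileStreamFactory(File file) {
    this.file = file;
  }

  InputStream open() throws FileNotFoundException {
    return new FileInputStream(file);
  }
}
```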
I don't think this new Storage API is as flexible as you suggest. It locks the re-serialization into place and forces extra inefficiencies. It would be better if Storage dealt with raw byte arrays or IO streams.
I think there might be a misunderstanding of the new API, so I'll clarify it here. In a nutshell, Storage is not part of the API; it's an internal tool. This is the core of the new API: SignalStorage.

It provides a way to write and read a collection of signals, as it's also the way that upstream exporters receive data.
It locks the re-serialization into place and forces extra inefficiencies
The API isn't aware of its implementations, so it's not aware of the default implementation that uses serialization and deserialization. The reason why the API provides an iterator of a collection of signals as a way to read stored data is not to enforce anything around serialization, but simply because that's also what it receives in its write()
method.
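To make that shape concrete, here is a minimal sketch of the contract described above. The interface and implementation names are illustrative assumptions, not the module's actual code; the point is that read() yields batches in the same shape that write() accepts.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of a SignalStorage-style contract: writes take a
// collection of signals, and reads return an iterable of collections,
// mirroring what write() received. Names are illustrative only.
interface SignalStorageSketch<T> extends Closeable {
  void write(Collection<T> signals) throws IOException;

  // Batches come back in the same shape that write() accepted.
  Iterable<Collection<T>> read() throws IOException;
}

// Minimal in-memory implementation, just to illustrate the contract.
final class InMemorySignalStorage<T> implements SignalStorageSketch<T> {
  private final List<Collection<T>> batches = new ArrayList<>();

  @Override
  public void write(Collection<T> signals) {
    batches.add(new ArrayList<>(signals));
  }

  @Override
  public Iterable<Collection<T>> read() {
    return batches;
  }

  @Override
  public void close() {}
}
```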
I think it's a mistake to move the deserialization deeper into the storage as it prevents what I've done in #2190 to avoid unnecessary re-serialization.
Serialization is just an internal part of what the default implementation needs to do in order to receive and return the same kind of data it's given, and it was never the intention to make it publicly supported in this module, the same way it's not publicly supported in the core Java repo. So it shouldn't be a mistake to move it around, as users of the default impl shouldn't be aware of it.
As I mentioned to you a couple of days ago when you suggested this approach over DM, I wouldn't recommend that you work on top of the old API, as a lot of it is going to be changed or removed.
Perhaps if this class had an additional access method that returned an InputStream factory against the file (so it could be read twice, first to count the items and second to stream it to a network output stream).
Maybe in the future, not this class, because it's not part of the API, but we might be able to somehow extend SignalStorage
to allow consuming the data already serialized, so that it can be forwarded directly to a network exporter. This is definitely something that can help with performance for the specific use case where the storage serializes signals in the same way that they are going to be needed by the network exporter impl that will consume them. The downside of this approach, and the reason why deserializing it is more generic, is that it ties the storage impl to a specific data format and protocol for exporting. So if we wanted to do so in the future, it must be done in a way that custom implementations can provide their own serialized versions of the data, in case the format we use is not what they need, to enable them to work on their own impls.
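That future extension could look roughly like the following sketch. The interface and method names here are assumptions for illustration, not part of the module; the key property is that each call hands back a fresh stream, so the data can be read more than once.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of extending storage so consumers can pull the data in
// its already-serialized form and forward it straight to a network exporter.
// Custom implementations would supply their own serialized representation if
// the default wire format doesn't fit their needs.
interface SerializedSignalSupplier {
  // A fresh stream per call, so the data can be read more than once.
  InputStream openSerializedStream() throws IOException;
}

// Trivial implementation backed by a fixed byte array, for illustration.
final class FixedBytesSupplier implements SerializedSignalSupplier {
  private final byte[] serialized;

  FixedBytesSupplier(byte[] serialized) {
    this.serialized = serialized.clone();
  }

  @Override
  public ByteArrayInputStream openSerializedStream() {
    return new ByteArrayInputStream(serialized);
  }
}
```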
Alternative with the new API
Since the new API is just an interface, it shouldn't prevent you from creating your own implementation that adds the functionality you need. To make it work for your use case, you could wrap the default implementation in your custom implementation for the write() part, and then add a method that provides you with an InputStream from which you manually read the files to provide the serialized data. Alternatively, if you'd like to reuse disk buffering's internal Storage class, which is the one that manages the stored data, you can add a method to it that does something along the lines of what you mentioned here:
Perhaps if this class had an additional access method that returned an InputStream factory against the file (so it could be read twice, first to count the items and second to stream it to a network output stream).
And use it for your custom implementation. Just bear in mind that, since Storage is internal, there's no guarantee it will be maintained as-is forever (though I don't see why it should change, tbh).
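The wrapping approach described above could be sketched like this. All names here (WrappingStorage, Delegate, the serializer function) are hypothetical stand-ins for the default implementation and your own serializer, not the module's real types:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Collection;
import java.util.function.Function;

// Hypothetical sketch of the wrapping approach: writes are delegated to the
// default (disk-based) implementation, while a serialized copy is kept so it
// can be streamed directly to a network exporter. All names are illustrative.
final class WrappingStorage<T> {
  interface Delegate<T> {
    void write(Collection<T> signals) throws IOException;
  }

  private final Delegate<T> delegate;
  private final Function<Collection<T>, byte[]> serializer;
  private byte[] lastSerialized = new byte[0];

  WrappingStorage(Delegate<T> delegate, Function<Collection<T>, byte[]> serializer) {
    this.delegate = delegate;
    this.serializer = serializer;
  }

  void write(Collection<T> signals) throws IOException {
    delegate.write(signals); // the default impl still persists to disk
    lastSerialized = serializer.apply(signals); // keep a streamable copy
  }

  // A fresh stream per call, so the data can be read more than once.
  ByteArrayInputStream openSerializedStream() {
    return new ByteArrayInputStream(lastSerialized);
  }
}
```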
Creation of a default implementation for the new API that was previously defined here.
Note
It looks like there are a lot of changes, but 27 of those "files changed" are deleted files from the old implementation's types and their test types. Also, a couple of the file changes are just files moved around and/or minor changes.
New usage
The same old API behavior is kept for the new API's default implementation.
Writing
Setting up the signal storage objects, as well as their toDisk exporters, to be used in an OpenTelemetry instance.
Reading
More info in the README.md and DESIGN.md files.