Conversation

LikeTheSalad
Contributor

@LikeTheSalad commented Aug 25, 2025

Creation of a default implementation for the new API that was previously defined here.

Note

It looks like there are a lot of changes, but 27 of those "files changed" are deletions of the old implementation types and their test types. A couple of other file changes just move files around and/or make minor tweaks.

New usage

The new API's default implementation keeps the same behavior as the old API.

Writing

Setting up the signal storage objects as well as their toDisk exporters to be used in an OpenTelemetry instance.

// Root dir
File rootDir = new File("/some/root");

// Setting up span storage
SignalStorage.Span spanStorage = FileSpanStorage.create(new File(rootDir, "spans"));

// Setting up metric storage
SignalStorage.Metric metricStorage = FileMetricStorage.create(new File(rootDir, "metrics"));

// Setting up log storage
SignalStorage.LogRecord logStorage = FileLogRecordStorage.create(new File(rootDir, "logs"));

// Setting up span to disk exporter (spanCallback, metricCallback and
// logCallback are assumed to be previously created callback instances)
SpanToDiskExporter spanToDiskExporter =
    SpanToDiskExporter.builder(spanStorage).setExporterCallback(spanCallback).build();
// Setting up metric to disk
MetricToDiskExporter metricToDiskExporter =
    MetricToDiskExporter.builder(metricStorage).setExporterCallback(metricCallback).build();
// Setting up log to disk exporter
LogRecordToDiskExporter logToDiskExporter =
    LogRecordToDiskExporter.builder(logStorage).setExporterCallback(logCallback).build();

Reading

// Example of reading and exporting spans from disk
OtlpHttpSpanExporter networkExporter = ...; // a previously configured network exporter
Iterator<Collection<SpanData>> spanCollections = spanStorage.iterator();
while (spanCollections.hasNext()) {
    networkExporter.export(spanCollections.next());
}

More info in the README.md and DESIGN.md files.

@LikeTheSalad requested a review from a team as a code owner August 25, 2025 15:56
@github-actions bot requested a review from zeitlinger August 25, 2025 15:56
@otelbot-java-contrib
Contributor

🔧 The result from spotlessApply was committed to the PR branch.

@LikeTheSalad
Contributor Author

It's ready for review. Please take a look, @zeitlinger.

@zeitlinger
Member

It doesn't look like this is just adding a new implementation for an interface.

Can you create a PR that just moves things around? That would make it easier to review.

@LikeTheSalad
Contributor Author

Can you create a PR that just moves things around? That would make it easier to review.

It depends. I was actually trying to do it in 2 parts, but I'm reusing a lot of logic from the old API, while at the same time, I had to change internal tools to adapt them to the new one. So if I don't delete the old API in this PR, then the project won't compile, because of the changes to the old internal tools needed to work with the new implementation. What parts of the current PR are you referring to moving around?

@marandaneto
Member

Can you create a PR that just moves things around? That would make it easier to review.

It depends. I was actually trying to do it in 2 parts, but I'm reusing a lot of logic from the old API, while at the same time, I had to change internal tools to adapt them to the new one. So if I don't delete the old API in this PR, then the project won't compile, because of the changes to the old internal tools needed to work with the new implementation. What parts of the current PR are you referring to moving around?

an option would be to stack PRs on top of each other

@LikeTheSalad
Contributor Author

LikeTheSalad commented Aug 27, 2025

an option would be to stack PRs on top of each other

Yes, but that still leaves the issue I mentioned: the project won't compile because the old API will no longer be compatible with the internal tools, which have to change slightly to adapt to the new API.


@zeitlinger I know that seeing "Files changed: 63" is daunting, but a lot of these changes aren't part of the new implementation; they're mostly file deletions and test updates. These are the specific details:

  • 27 of those files are deletions (with some moved files that Git labels as deleted).
  • 14 of them are test-related.
  • These 4 files changed are just to clean up after an enum that is no longer needed in the new API (I guess I could move this cleanup to another PR, and it'd reduce the count by 5 files).
  • 4 of them aren't code, but markdown and image files used in the markdown ones.

If it helps, the real file changes to implement this API are the following:

  • The exporters package, which creates 3 implementations of exporters that store to disk, one per signal, and also moves the callback interface and its noop impl into a subpackage. This accounts for 7 "files changed".
  • The readable file returns a different type of response, making it incompatible with the old API, which propagates to Storage and FromDiskExporters. So the FromDiskExporters were removed and Storage was changed.
  • The Storage class was heavily changed to make it work better with the iterator. After it changed, StorageBuilder wouldn't compile, and it was also not needed for the new API, so I removed it.
  • The StorageIterator file is a new class that implements Iterator to read using a Storage object.
  • And finally, this storage.impl package has all the new API implementations, one per signal, plus the config object that was moved there and renamed. Each implementation delegates to a FileSignalStorage object to avoid duplicating common logic across all impls.

Those are the important changes. Let me know if that helps make the review smoother. If not, I can create an intermediate state in which both APIs compile, which might require duplicating internal classes such as Storage and ReadableFile so that they work with the new API without touching the old ones, keeping the old API compiling for the first PR. That seemed like a lot of unneeded work, since the old API will be removed anyway, but if you still think it's needed after the explanation above, let me know.

Member

@tylerbenson left a comment


I don't think this new Storage API is as flexible as you suggest. It locks the re-serialization into place and forces extra inefficiencies. It would be better if Storage dealt with raw byte arrays or IO streams.

Diff context for the comment below:

        throws IOException {
      return readAndProcess(processing, 1);

    @Nullable
    public ReadableResult<T> readNext(SignalDeserializer<T> deserializer) throws IOException {
Member


I think it's a mistake to move the deserialization deeper into the storage as it prevents what I've done in #2190 to avoid unnecessary re-serialization. Perhaps if this class had an additional access method that returned an InputStream factory against the file (so it could be read twice, first to count the items and second to stream it to a network output stream).

Contributor Author


I don't think this new Storage API is as flexible as you suggest. It locks the re-serialization into place and forces extra inefficiencies. It would be better if Storage dealt with raw byte arrays or IO streams.

I think there might be a misunderstanding of the new API, so I'll clarify it here. In a nutshell, Storage is not part of the API; it's an internal tool. This is the core of the new API: SignalStorage.

[Screenshot: the SignalStorage interface definition]

It provides a way to write and read a collection of signals, which is also the form in which upstream exporters receive data.

It locks the re-serialization into place and forces extra inefficiencies

The API isn't aware of its implementations, so it doesn't know about the default implementation's use of serialization and deserialization. The reason the API exposes an iterator over collections of signals for reading stored data is not to enforce anything around serialization; it's simply the same shape of data its write() method receives.
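In other words, the contract reads back the same shape of data it accepts. As a rough illustration of that symmetry (the names, generics, and return types below are hypothetical, not the module's actual definitions, and the in-memory class is just a toy):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of the contract: write() takes a collection of signals,
// and reading yields an Iterator over collections of the same type.
interface SignalStorageSketch<T> extends Iterable<Collection<T>> {
  CompletableFuture<Boolean> write(Collection<T> signals);
}

// Toy in-memory implementation, only to demonstrate the read/write symmetry.
class InMemorySignalStorage<T> implements SignalStorageSketch<T> {
  private final List<Collection<T>> batches = new ArrayList<>();

  @Override
  public CompletableFuture<Boolean> write(Collection<T> signals) {
    batches.add(new ArrayList<>(signals)); // defensive copy of the batch
    return CompletableFuture.completedFuture(true);
  }

  @Override
  public Iterator<Collection<T>> iterator() {
    return batches.iterator();
  }
}
```

A file-based implementation would serialize inside write() and deserialize inside iterator(), but that stays an internal detail of the implementation, invisible at the interface level.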

I think it's a mistake to move the deserialization deeper into the storage as it prevents what I've done in #2190 to avoid unnecessary re-serialization.

Serialization is just an internal part of what the default implementation does to return the same kind of data it receives, and it was never intended to be publicly supported in this module, the same way it isn't publicly supported in the core Java repo. So moving it around shouldn't be a mistake, since users of the default impl shouldn't be aware of it.

As I mentioned to you a couple of days ago when you suggested this approach over DM, I wouldn't recommend that you work on top of the old API, as a lot of it is going to be changed or removed.

Perhaps if this class had an additional access method that returned an InputStream factory against the file (so it could be read twice, first to count the items and second to stream it to a network output stream).

Maybe in the future, though not in this class, since it's not part of the API, we might be able to extend SignalStorage to allow consuming the data already serialized, so that it can be forwarded directly to a network exporter. This could definitely help performance in the specific case where the storage serializes signals in exactly the format the consuming network exporter needs. The downside of this approach, and the reason deserializing is more generic, is that it ties the storage impl to a specific data format and protocol for exporting. So if we wanted to do this in the future, it would have to be done in a way that lets custom implementations provide their own serialized versions of the data, in case the format we use is not what they need.


Alternative with the new API

Since the new API is just an interface, it shouldn't prevent you from creating your own implementation that adds the functionality you need. To make it work for your use case, you could wrap the default implementation in a custom implementation that delegates the write() part, and then add a method that provides an InputStream from which you manually read the files to get the serialized data. Alternatively, if you'd like to reuse disk buffering's internal Storage class, which is the one that manages the stored data, you can add a method to it that does something along the lines of what you mentioned here:

Perhaps if this class had an additional access method that returned an InputStream factory against the file (so it could be read twice, first to count the items and second to stream it to a network output stream).

And use it for your custom implementation. Just bear in mind that, since Storage is internal, there's no guarantee it will stay as-is forever (though I don't see why it should change, tbh).
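The wrapping approach described above could look roughly like this. Everything here is a hypothetical sketch with stand-in types: DelegateStorage is a placeholder for the real storage being wrapped, and the newline-joined byte bookkeeping is only a toy "serialization" to illustrate an InputStream factory that lets the same batch be read more than once:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for the storage being wrapped; in practice this would be the
// module's default file-based implementation. Purely illustrative.
interface DelegateStorage {
  void write(Collection<String> signals);
}

// Hypothetical custom storage: delegates write() to the inner storage and
// additionally keeps the serialized bytes so they can be streamed as-is.
class StreamingStorage implements DelegateStorage {
  private final DelegateStorage delegate;
  private final List<byte[]> rawBatches = new ArrayList<>();

  StreamingStorage(DelegateStorage delegate) {
    this.delegate = delegate;
  }

  @Override
  public void write(Collection<String> signals) {
    delegate.write(signals); // keep the default write behavior
    // Toy "serialization": newline-joined UTF-8 bytes, one entry per batch.
    rawBatches.add(String.join("\n", signals).getBytes(StandardCharsets.UTF_8));
  }

  // "InputStream factory": every call opens a fresh stream over the batch,
  // so the same data can be read twice (count the items, then stream them).
  ByteArrayInputStream openBatch(int index) {
    return new ByteArrayInputStream(rawBatches.get(index));
  }
}
```

Because each openBatch call returns a fresh stream over the same bytes, a consumer could make one pass to count items and a second pass to pipe the bytes to a network output stream, without ever deserializing.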
