
Conversation

@naftaly (Contributor) commented Jul 14, 2025

@naftaly changed the title from "Update TracerProviderSdk.swift" to "Fix race in TraceProviderSDK" on Jul 14, 2025
@naftaly marked this pull request as ready for review on July 14, 2025 at 21:12
  /// Convenience wrapper around `withLock` for bodies that return no value.
  func withLockVoid(_ body: () throws -> Void) rethrows {
    try withLock(body)
  }
}
Member

Looks good to me. Please run this through swift-format; it looks like you're using 4-space tabs instead of 2.

Contributor Author

You bet, thanks for the quick turnaround on the review.

Contributor Author

Done!

@naftaly requested a review from bryce-b on July 15, 2025 at 15:33
@naftaly (Contributor, Author) commented Jul 15, 2025

@bryce-b do you need to land this or is there a trick for me to land it?

@naftaly (Contributor, Author) commented Jul 16, 2025

Is anyone able to merge this branch? I don't have that access. Thanks!

@nachoBonafonte (Member)

I think it should be implemented using a more granular read/write lock, since the sanitizer callstack shows that many threads can access it for reading.

@nachoBonafonte (Member) left a comment

Not sure this is the approach needed here.

@naftaly (Contributor, Author) commented Jul 16, 2025

@nachoBonafonte what implementation would you propose? As it was before, with the r/w lock, there was a race condition. As proposed, the lock will do just fine and perform very well, since it is constrained to a small area of concern that is processed quickly. I am not worried about the performance impact of adding this lock.

@naftaly requested a review from nachoBonafonte on July 16, 2025 at 14:08
@nachoBonafonte (Member)

There is already a ReadWriteLock class in OpenTelemetrySdk/Internals/Locks.swift that could be used instead

@naftaly (Contributor, Author) commented Jul 16, 2025

@nachoBonafonte that's what was there before, but the way it was used in the provider was flawed and caused race conditions. This PR fixes that issue. What I'm wondering is what you have in mind as a better implementation than what I propose in this PR.

@nachoBonafonte (Member)

What caused the race condition (I'm guessing, since you did not post the whole callstack and you changed the locks) is that in Swift, using pthread synchronization objects directly can cause race conditions because of the way their memory is handled; they must be wrapped in classes to work properly. It was not because a read/write lock was being used. That said, in that method there is another source of thread race conditions that comes from access to the shared state.
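
For illustration, a minimal sketch of the kind of class wrapper being described, assuming a heap-allocated pthread_rwlock_t. This is only a guess at the general shape, not the actual Locks.swift implementation.

import Foundation

// Sketch only, not the OpenTelemetrySdk class of the same name. Wrapping
// pthread_rwlock_t in a class gives the lock a single stable heap address for
// its whole lifetime, which pthread requires; a bare Swift value can be copied
// or moved and then misbehave.
final class ReadWriteLock {
  private let rwlock = UnsafeMutablePointer<pthread_rwlock_t>.allocate(capacity: 1)

  init() {
    pthread_rwlock_init(rwlock, nil)
  }

  deinit {
    pthread_rwlock_destroy(rwlock)
    rwlock.deallocate()
  }

  /// Runs `body` while holding the lock for reading.
  func withReadLock<T>(_ body: () throws -> T) rethrows -> T {
    pthread_rwlock_rdlock(rwlock)
    defer { pthread_rwlock_unlock(rwlock) }
    return try body()
  }

  /// Runs `body` while holding the lock for writing.
  func withWriteLock<T>(_ body: () throws -> T) rethrows -> T {
    pthread_rwlock_wrlock(rwlock)
    defer { pthread_rwlock_unlock(rwlock) }
    return try body()
  }
}

The real class in Locks.swift may differ in detail; the point is only that the pthread object lives at one stable address.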

@naftaly (Contributor, Author) commented Jul 16, 2025

It's true that the r/w lock wasn't correctly implemented, but it doesn't need to be in a class. That said, the race condition I'm pointing out is that the previous code held the read lock to check for the existence of the shared provider, then released it, did some work, and only then took the write lock to set the new tracer. The window between the read hold and the write hold of the lock is the race condition I'm talking about.

So basically, this PR fixes both the issue you mention and the issue I mention.
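
To make that window concrete, here is a hedged sketch of the pattern being described, using the ReadWriteLock wrapper sketched above. Tracer, TracerRegistry, sharedTracer, racyTracer(), and tracer() are illustrative placeholders, not the actual TracerProviderSdk code.

// Sketch only; none of these names come from the real SDK.
struct Tracer {}

final class TracerRegistry {
  private let lock = ReadWriteLock()
  private var sharedTracer: Tracer?

  // Racy: the nil check and the store happen under separate lock holds, so two
  // threads can both observe nil, both create a tracer, and the later write
  // silently replaces the earlier one.
  func racyTracer() -> Tracer {
    if let tracer = lock.withReadLock({ sharedTracer }) {
      return tracer
    }
    let newTracer = Tracer()
    lock.withWriteLock { sharedTracer = newTracer }
    return newTracer
  }

  // Fixed: re-check inside the write lock, so only one instance is ever published.
  func tracer() -> Tracer {
    if let tracer = lock.withReadLock({ sharedTracer }) {
      return tracer
    }
    return lock.withWriteLock { () -> Tracer in
      if let existing = sharedTracer { return existing }
      let newTracer = Tracer()
      sharedTracer = newTracer
      return newTracer
    }
  }
}

Holding the write lock across both the re-check and the set closes the window in which two writers can each publish their own instance.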

@nachoBonafonte (Member) commented Jul 16, 2025

> The window between the read hold and the write hold of the lock is the race condition I'm talking about.

And what is the memory being accessed concurrently there? I would say none, and if there were something being written, it should be inside the write lock.

@nachoBonafonte (Member)

And even if we used a standard lock (which I don't think is the proper solution), we should use the one existing in the Locks.swift file I mentioned, and not reimplement it.

@naftaly (Contributor, Author) commented Jul 16, 2025

> And what is the memory being accessed concurrently there?

It's not concurrent access. The race condition in this case is that a thread can read nil before another thread's write happens; then both threads write, and the later write overwrites the earlier one, leaving multiple instances of the provider with no clear owner.

> And even if we used a standard lock (which I don't think is the proper solution), we should use the one existing in the Locks.swift file I mentioned, and not reimplement it.

I wanted to, but it would create a circular dependency. We could if we moved things around a bit, but I thought that would be problematic to do in this PR; once we fix this issue, we can definitely do it.

@naftaly (Contributor, Author) commented Jul 16, 2025

@nachoBonafonte In the spirit of speeding things up and reaching a compromise, I've updated the code to use the read/write lock class, so it actually works and eliminates one of the race conditions.

Sound good to you?

@naftaly requested a review from bryce-b on July 16, 2025 at 17:41
@naftaly requested a review from ArielDemarco on July 16, 2025 at 17:41
@nachoBonafonte (Member) left a comment

It works for me, thanks for the changes

@nachoBonafonte merged commit 63c57cf into open-telemetry:main on Jul 16, 2025
11 checks passed
@naftaly deleted the acohen/fix-race branch on July 16, 2025 at 18:13