Skip to content

Conversation

shivaspeaks
Copy link
Member

@shivaspeaks shivaspeaks commented Aug 7, 2025

Fixes #11672

@shivaspeaks shivaspeaks marked this pull request as ready for review August 7, 2025 13:50
Copy link
Contributor

@kannanjgithub kannanjgithub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, I really want to kill errorDescription, as it is just lastError = Status.INVALID_ARGUMENT. However, some of the fallback code is using errorDescription in really weird ways, so we'd have to clean that up first.

@@ -676,6 +676,8 @@ private final class ResourceSubscriber<T extends ResourceUpdate> {
private ResourceMetadata metadata;
@Nullable
private String errorDescription;
@Nullable
private Status lastError;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Change data to StatusOr<T> data would have kept this a bit more clear. You still have data vs absent, but at least you don't have yet another thing to deal with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that it's a more invasive refactoring change since it would require touching every method that uses the data field. I can try to do it in a follow up PR and keep this PR's scope limited to bug fix?

I'll agree that we are moving to more usage of StatusOr<T> in our codebase everywhere. We should consider here as well for consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only a few other places that access data. It does look like it'd make all of them more complicated. Most are of the form data != null, and would likely need to become data != null && data.hasValue().

@@ -676,6 +676,8 @@ private final class ResourceSubscriber<T extends ResourceUpdate> {
private ResourceMetadata metadata;
@Nullable
private String errorDescription;
@Nullable
private Status lastError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests are needed for this? I don't know how I replicate this behaviour. The logic sounds well, we should probably let it merge and see?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests prevent breakages in the future from new code changes.
There are many tests that test onData and onError and testing watcher calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A unit test shouldn't be too hard. We find a test that is triggering onError(), and then after getting the onError() add a second subscription.

So as "the first one I noticed" there's ldsResourceUpdated_withXdstpResourceName_withUnknownAuthority().

@Test
public void ldsResourceUpdated_withXdstpResourceName_withUnknownAuthority() {
String ldsResourceName =
"xdstp://unknown.example.com/envoy.config.listener.v3.Listener/listener1";
xdsClient.watchXdsResource(XdsListenerResource.getInstance(), ldsResourceName,
ldsResourceWatcher);
verify(ldsResourceWatcher).onError(errorCaptor.capture());
Status error = errorCaptor.getValue();
assertThat(error.getCode()).isEqualTo(Code.INVALID_ARGUMENT);
assertThat(error.getDescription()).isEqualTo(
"Wrong configuration: xds server does not exist for resource " + ldsResourceName);
assertThat(resourceDiscoveryCalls.poll()).isNull();
xdsClient.cancelXdsResourceWatch(XdsListenerResource.getInstance(), ldsResourceName,
ldsResourceWatcher);
assertThat(resourceDiscoveryCalls.poll()).isNull();
}

You could create a ldsResourceWatcher2 and subscribe to the same resource after the verify(ldsResourceWatcher).onError(errorCaptor.capture());, and expect verify(ldsResourceWatcher2).onError(any());. You can also delete unimportant parts of the test, like checking the status description. You might want to verify that the second watcher gets the same Status, although that could even be optional.

Copy link
Member Author

@shivaspeaks shivaspeaks Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident with this unit test since we are not checking the instance to be same. It seems to be getting caught in errorDescription != null(code link) for both the watchers and not checking the savedError.

Copy link
Member Author

@shivaspeaks shivaspeaks Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces lastError, so there's no other test right now for setting lastError.
I tried moving check for savedError before errorDescription but looks like lastError is not getting set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your PR introduces lastError but the existing control paths in other tests do go there and set it. Just set breakpoint and run all tests in that test class and you will see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like none of the tests are stopping at that break point from GrpcXdsClientImplTestBase class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-08-19 6 54 21 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was keeping the break point inside onError where savedError was getting defined. My bad. I made a new commit. PTAL.

@ejona86 ejona86 added the TODO:backport PR needs to be backported. Removed after backport complete label Aug 19, 2025
kannanjgithub
kannanjgithub previously approved these changes Aug 19, 2025
@kannanjgithub kannanjgithub dismissed their stale review August 19, 2025 11:56

Replied in comments

@shivaspeaks shivaspeaks merged commit 2039266 into grpc:master Aug 19, 2025
16 checks passed
@shivaspeaks shivaspeaks deleted the xdsClient-cache-transient-error branch August 19, 2025 16:11
kannanjgithub pushed a commit to kannanjgithub/grpc-java that referenced this pull request Aug 19, 2025
kannanjgithub added a commit that referenced this pull request Aug 20, 2025
…ort) (#12291)

Backport of #12262 to v1.75.x.
---
Fixes #11672

Co-authored-by: MV Shiva <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO:backport PR needs to be backported. Removed after backport complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XdsClient does not cache onError for new watchers
3 participants