Description
What version of gRPC-Java are you using?
grpc-xds 1.56.1; the latest version (1.64.0) is also affected.
What is your environment?
Linux/Kubernetes with Istio 1.16.7 installed.
What did you expect to see?
If a new CDS update is pushed to the client and all previous EDS subscriptions are revoked, the client removes the EDS entry from `subscribedResourceTypeUrls` entirely.
grpc-java/xds/src/main/java/io/grpc/xds/client/XdsClientImpl.java, lines 292 to 295 in fea577c:

```java
if (resourceSubscribers.get(type).isEmpty()) {
  resourceSubscribers.remove(type);
  subscribedResourceTypeUrls.remove(type.typeUrl());
}
```
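As a result, the client sends neither an ACK nor a NACK to the xDS server for the subsequent EDS response: its type URL is no longer subscribed, so the response is treated as an unknown type and ignored.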
grpc-java/xds/src/main/java/io/grpc/xds/client/ControlPlaneClient.java, lines 337 to 345 in fea577c:

```java
if (type == null) {
  logger.log(
      XdsLogLevel.WARNING,
      "Ignore an unknown type of DiscoveryResponse: {0}",
      response.getTypeUrl());
  call.startRecvMessage();
  return;
}
```
When the EDS subscription is later re-added in response to a new CDS update, the client sends its EDS request with the old nonce, and Istiod rejects it as carrying an expired nonce (see the log below).
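To make the sequence above concrete, here is a minimal, hypothetical Java model of the client-side bookkeeping. It is not grpc-java code: it assumes one map of subscribed names per type URL, a set of recognized type URLs, and a per-type nonce map. `ToyXdsClient`, `serverAccepts`, `runScenario`, and the cluster name `cluster-a` are invented for illustration, and `serverAccepts` is only a simplification of the "Expired nonce received" check seen in the Istiod log. The two nonces are copied from that log.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the reported behavior.
// None of these names are grpc-java or Istiod APIs.
public class StaleNonceDemo {
  static final String EDS =
      "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment";

  static class ToyXdsClient {
    // type URL -> resource names currently subscribed
    final Map<String, Set<String>> subscribers = new HashMap<>();
    // type URLs the client still recognizes in incoming responses
    final Set<String> subscribedTypeUrls = new HashSet<>();
    // type URL -> last nonce recorded, echoed back in the next request
    final Map<String, String> lastNonce = new HashMap<>();

    void subscribe(String typeUrl, String name) {
      subscribers.computeIfAbsent(typeUrl, k -> new HashSet<>()).add(name);
      subscribedTypeUrls.add(typeUrl);
    }

    void unsubscribe(String typeUrl, String name) {
      Set<String> names = subscribers.get(typeUrl);
      if (names == null) {
        return;
      }
      names.remove(name);
      if (names.isEmpty()) {
        // Like the XdsClientImpl snippet: the type URL is dropped entirely.
        subscribers.remove(typeUrl);
        subscribedTypeUrls.remove(typeUrl);
      }
    }

    // Returns true if the response would be ACKed/NACKed, false if it is ignored.
    boolean onResponse(String typeUrl, String nonce) {
      if (!subscribedTypeUrls.contains(typeUrl)) {
        // Like the "unknown type" branch: no ACK/NACK, nonce not recorded.
        return false;
      }
      lastNonce.put(typeUrl, nonce);
      return true;
    }

    String nonceForNextRequest(String typeUrl) {
      return lastNonce.getOrDefault(typeUrl, "");
    }
  }

  // Simplified server-side check modeled on the "Expired nonce received" log line.
  static boolean serverAccepts(String receivedNonce, String lastSentNonce) {
    return receivedNonce.equals(lastSentNonce);
  }

  // Replays the log sequence; returns the nonce sent on re-subscription.
  static String runScenario() {
    ToyXdsClient client = new ToyXdsClient();
    client.subscribe(EDS, "cluster-a"); // hypothetical cluster name
    client.onResponse(EDS, "94c7313f-6355-4f09-aae2-7eb2de165491"); // ACKed

    client.unsubscribe(EDS, "cluster-a"); // new CDS drops every EDS subscription
    boolean handled = client.onResponse(EDS, "308676e2-26dc-412d-bcb1-8958087d4138");
    System.out.println("EDS push handled: " + handled); // false: silently ignored

    client.subscribe(EDS, "cluster-a"); // re-subscribe after a later CDS update
    return client.nonceForNextRequest(EDS);
  }

  public static void main(String[] args) {
    String sent = runScenario();
    boolean accepted = serverAccepts(sent, "308676e2-26dc-412d-bcb1-8958087d4138");
    System.out.println("nonce sent: " + sent + ", accepted: " + accepted);
  }
}
```

In this model the nonce map is never cleared, so re-subscribing echoes the version /6 nonce while the server last sent the version /7 one, which matches the mismatch reported in the log.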
A possible workaround is to comment out the removal from `subscribedResourceTypeUrls`. Quoting the official xDS protocol docs:
- The xDS client should ACK or NACK every DiscoveryResponse received from the management server. The response_nonce field tells the server which of its responses the ACK or NACK is associated with.
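The effect of that workaround can be sketched under the same simplifying assumptions as before (hypothetical names, not grpc-java code): the set of known type URLs is never shrunk, so the EDS push that arrives after unsubscription is still acknowledged and its nonce recorded, and the later re-subscription echoes the current nonce. `ToyXdsClient`, `runScenario`, and `cluster-a` are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the workaround: known type URLs are kept even after the
// last subscriber goes away, so every DiscoveryResponse is still ACKed/NACKed.
// Names are illustrative and are not grpc-java APIs.
public class AckEveryResponseDemo {
  static final String EDS =
      "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment";

  static class ToyXdsClient {
    final Map<String, Set<String>> subscribers = new HashMap<>();
    final Set<String> knownTypeUrls = new HashSet<>(); // only ever grows
    final Map<String, String> lastNonce = new HashMap<>();

    void subscribe(String typeUrl, String name) {
      subscribers.computeIfAbsent(typeUrl, k -> new HashSet<>()).add(name);
      knownTypeUrls.add(typeUrl);
    }

    void unsubscribe(String typeUrl, String name) {
      Set<String> names = subscribers.get(typeUrl);
      if (names != null && names.remove(name) && names.isEmpty()) {
        subscribers.remove(typeUrl);
        // Workaround: knownTypeUrls is intentionally left untouched.
      }
    }

    // Returns the nonce that would be echoed in the ACK, or null if ignored.
    String onResponse(String typeUrl, String nonce) {
      if (!knownTypeUrls.contains(typeUrl)) {
        return null;
      }
      lastNonce.put(typeUrl, nonce);
      return nonce;
    }

    String nonceForNextRequest(String typeUrl) {
      return lastNonce.getOrDefault(typeUrl, "");
    }
  }

  // Same sequence as the Istiod log; returns the nonce sent on re-subscription.
  static String runScenario() {
    ToyXdsClient client = new ToyXdsClient();
    client.subscribe(EDS, "cluster-a"); // hypothetical cluster name
    client.onResponse(EDS, "94c7313f-6355-4f09-aae2-7eb2de165491");
    client.unsubscribe(EDS, "cluster-a");
    client.onResponse(EDS, "308676e2-26dc-412d-bcb1-8958087d4138"); // now ACKed
    client.subscribe(EDS, "cluster-a");
    return client.nonceForNextRequest(EDS);
  }

  public static void main(String[] args) {
    System.out.println("nonce sent on re-subscribe: " + runScenario());
  }
}
```

This only shows why keeping the type URL avoids the stale nonce; how a server interprets an ACK sent while no EDS names are subscribed, and whether a proper fix should instead track nonces separately from subscriptions, is outside the scope of this sketch.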
I would like to raise a PR to resolve this issue.
What did you see instead?
The following is the log from Istiod (lines starting with `//` are annotations):

```
// User applies VirtualService and DestinationRule updates via kubectl
2024-05-24T07:06:53.925724Z debug Handle event update for configuration networking.istio.io/v1alpha3/VirtualService/default/e2e-service-provider
2024-05-24T07:06:53.934328Z debug Handle event update for configuration networking.istio.io/v1alpha3/DestinationRule/default/e2e-service-provider
2024-05-24T07:06:54.024710Z debug Handle event update for configuration networking.istio.io/v1alpha3/DestinationRule/default/e2e-service-provider
2024-05-24T07:06:54.126452Z info ads Push debounce stable[17] 7 for config DestinationRule/default/e2e-service-provider and 3 more configs: 101.655214ms since last change, 235.324446ms since last push, full=true
2024-05-24T07:06:54.126868Z debug gateway reconcile complete in 8.042µs
2024-05-24T07:06:54.127123Z debug ads InitContext 2024-05-24T07:06:54Z/7 for push took 588.958µs
2024-05-24T07:06:54.127140Z info ads XDS: Pushing:2024-05-24T07:06:54Z/7 Services:6 ConnectedEndpoints:1 Version:2024-05-24T07:06:54Z/7
// Istiod pushes a CDS update that causes all EDS resources to be unsubscribed
2024-05-24T07:06:54.127407Z info ads CDS: PUSH for node:e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default resources:0 size:0B nonce:95973a27-8c59-4e82-897c-3929efb80890 version:2024-05-24T07:06:54Z/7
// Istiod pushes an EDS update, but neither an ACK nor a NACK is received
2024-05-24T07:06:54.127548Z info ads EDS: PUSH for node:e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default resources:2 size:395B empty:0 cached:0/2 nonce:308676e2-26dc-412d-bcb1-8958087d4138 version:2024-05-24T07:06:54Z/7
2024-05-24T07:06:54.127602Z debug grpcgen building lds for e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default with filter:
map[e2e-service-provider.default.svc.cluster.local:{map[e2e-service-provider.default.svc.cluster.local:{}] map[80:{}]}]
2024-05-24T07:06:54.127667Z info ads LDS: PUSH for node:e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default resources:1 size:450B nonce:33198869-77f9-4f1e-a7d5-b43f78e76242 version:2024-05-24T07:06:54Z/7
2024-05-24T07:06:54.128177Z info ads RDS: PUSH for node:e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default resources:1 size:4.8kB nonce:93bec8ee-fe97-4ca0-a242-5dbdc182a484 version:2024-05-24T07:06:54Z/7
2024-05-24T07:06:54.131841Z debug ads ADS:CDS: REQ e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 resources:2 nonce:95973a27-8c59-4e82-897c-3929efb80890 version:2024-05-24T07:06:54Z/7
2024-05-24T07:06:54.131937Z debug ads ADS:CDS: ACK e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 2024-05-24T07:06:54Z/7 95973a27-8c59-4e82-897c-3929efb80890
// The client adjusts its EDS subscription
2024-05-24T07:06:54.136024Z debug ads ADS:EDS: REQ e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 resources:1 nonce:94c7313f-6355-4f09-aae2-7eb2de165491 version:2024-05-24T07:06:33Z/6
// ...but the request carries a stale nonce, so Istiod treats it as expired
2024-05-24T07:06:54.136057Z debug ads ADS:EDS: REQ e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 Expired nonce received 94c7313f-6355-4f09-aae2-7eb2de165491, sent 308676e2-26dc-412d-bcb1-8958087d4138
2024-05-24T07:06:54.141001Z debug ads ADS:LDS: REQ e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 resources:1 nonce:33198869-77f9-4f1e-a7d5-b43f78e76242 version:2024-05-24T07:06:54Z/7
2024-05-24T07:06:54.141036Z debug ads ADS:LDS: ACK e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 2024-05-24T07:06:54Z/7 33198869-77f9-4f1e-a7d5-b43f78e76242
2024-05-24T07:06:54.143833Z debug ads ADS:RDS: REQ e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 resources:1 nonce:93bec8ee-fe97-4ca0-a242-5dbdc182a484 version:2024-05-24T07:06:54Z/7
2024-05-24T07:06:54.143903Z debug ads ADS:RDS: ACK e2e-service-consumer-base-5dd4cb9fbf-xqc6v.default-1 2024-05-24T07:06:54Z/7 93bec8ee-fe97-4ca0-a242-5dbdc182a484
```
Steps to reproduce the bug
First, apply the following configuration, which uses subsets in the VirtualService:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: e2e-service-provider
spec:
  selector:
    app: e2e-service-provider
    group: ft
  ports:
  - name: http-80
    port: 80
    targetPort: 8080
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: e2e-service-provider
spec:
  gateways:
  - istio-system/ingressgateway
  - mesh
  hosts:
  - e2e-service-provider
  http:
  - match:
    - headers:
        application:
          exact: e2e-service-consumer
        x-env-flag:
          exact: blue
    route:
    - destination:
        host: e2e-service-provider
        subset: e2e-service-provider-red
      weight: 100
  - match:
    - headers:
        x-env-flag:
          exact: yellow
      uri:
        exact: /rpc/fetchTag
    route:
    - destination:
        host: e2e-service-provider
        subset: e2e-service-provider-red
      weight: 100
    timeout: 60s
  - headers:
      request:
        set:
          x-env-flag: red
    match:
    - headers:
        x-env-flag:
          exact: red
    - queryParams:
        x-env-flag:
          exact: red
    - sourceLabels:
        version: red
    route:
    - destination:
        host: e2e-service-provider
        subset: e2e-service-provider-red
    timeout: 60s
  - route:
    - destination:
        host: e2e-service-provider
        subset: e2e-service-provider-base
    timeout: 60s
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: e2e-service-provider
spec:
  host: e2e-service-provider
  subsets:
  - labels:
      version: base
    name: e2e-service-provider-base
    trafficPolicy:
      loadBalancer:
        warmupDurationSecs: 60s
  - labels:
      version: red
    name: e2e-service-provider-red
```
Then apply the following YAML, which removes all subset usage:
```yaml
# Step 1: add services
apiVersion: v1
kind: Service
metadata:
  name: e2e-service-provider
spec:
  selector:
    app: e2e-service-provider
    group: ft
    version: base
  ports:
  - name: http-80
    port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: e2e-service-provider-canary
spec:
  selector:
    app: e2e-service-provider
    group: ft
    version: red
  ports:
  - name: http-80
    port: 80
    targetPort: 8080
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: e2e-service-provider
spec:
  gateways:
  - istio-system/ingressgateway
  - mesh
  hosts:
  - e2e-service-provider
  http:
  - match:
    - headers:
        application:
          exact: e2e-service-consumer
        x-env-flag:
          exact: blue
    route:
    - destination:
        host: e2e-service-provider-canary
      weight: 100
  - match:
    - headers:
        x-env-flag:
          exact: yellow
      uri:
        exact: /rpc/fetchTag
    route:
    - destination:
        host: e2e-service-provider-canary
      weight: 100
    timeout: 60s
  - headers:
      request:
        set:
          x-env-flag: red
    match:
    - headers:
        x-env-flag:
          exact: red
    - queryParams:
        x-env-flag:
          exact: red
    - sourceLabels:
        version: red
    route:
    - destination:
        host: e2e-service-provider-canary
    timeout: 60s
  - route:
    - destination:
        host: e2e-service-provider
    timeout: 60s
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: e2e-service-provider
spec:
  host: e2e-service-provider
  trafficPolicy:
    loadBalancer:
      warmupDurationSecs: 60s
  subsets: []
```