
blockingUnaryCall withDeadlineAfter RPC request hangs forever #12185

@jianliu

Description


What version of gRPC-Java are you using?

1.50.0, using grpc-all, shaded into our artifact

What is your environment?

Linux (CentOS 7), JDK 8

What did you expect to see?

A call through blockingUnaryCall with withDeadlineAfter should not block forever.

What did you see instead?

A request with a 30-second deadline hangs its calling thread forever.

The connection code:

ManagedChannelBuilder<?> builder = ManagedChannelBuilder.forAddress(connID.getHost(), connID.getPort())
        .usePlaintext()
        .maxInboundMessageSize(GrpcUtil.MAX_INBOUND_MESSAGE_SIZE)
        .keepAliveTime(2, TimeUnit.MINUTES)
        .keepAliveWithoutCalls(true);

ManagedChannel channel = builder.build();

The code will automatically disconnect and reconnect every 10 minutes.
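For context, the periodic disconnect/reconnect is roughly the following pattern. This is a hypothetical sketch of our scheduling, not the actual project code: `rebuildChannel()` and the `ScheduledExecutorService` are my stand-ins for the real shutdown-and-rebuild logic.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReconnectLoop {
    private final AtomicInteger reconnects = new AtomicInteger();

    // Stand-in for: channel.shutdown(); channel = builder.build();
    // (the real code rebuilds the ManagedChannel shown above)
    void rebuildChannel() {
        reconnects.incrementAndGet();
    }

    // Schedules the rebuild at a fixed period (10 minutes in our setup)
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::rebuildChannel, period, period, unit);
        return scheduler;
    }

    int reconnectCount() {
        return reconnects.get();
    }
}
```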

The request code:

PolarisGRPCGrpc.PolarisGRPCBlockingStub stub = PolarisGRPCGrpc.newBlockingStub(connection.getChannel())
        .withDeadlineAfter(30000, TimeUnit.MILLISECONDS);
GrpcUtil.attachRequestHeader(stub, GrpcUtil.nextHeartbeatReqId());
ResponseProto.Response heartbeatResponse = stub.heartbeat(buildHeartbeatRequest(req));
GrpcUtil.checkResponse(heartbeatResponse);
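As a defensive workaround while this is open, the blocking call could be bounded by a second, outer timeout on the caller's side, so the heartbeat thread cannot park forever even when the RPC deadline is not honored. This is a stdlib-only sketch under the assumption that interrupting the parked thread unsticks blockingUnaryCall (its waitAndDrain loop is interruptible); `callWithTimeout` is my helper, not part of gRPC or the project.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {
    // Runs a blocking task with an outer timeout and cancels (interrupts)
    // it on expiry, so the caller cannot hang even if the task's own
    // deadline mechanism fails.
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> future = worker.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the parked worker thread
            throw e;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Usage would look something like `callWithTimeout(() -> stub.heartbeat(buildHeartbeatRequest(req)), 35, TimeUnit.SECONDS)`, with the outer timeout set slightly above the RPC deadline.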

The stack trace of the hanging thread:

"polaris-async-register-1" #463 prio=5 os_prio=0 cpu=107472.49ms elapsed=613744.43s tid=0x00007fd3818d8000 nid=0x31c waiting on condition  [0x00007fd38519f000]
  java.lang.Thread.State: WAITING (parking)
   at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
   - parking to wait for  <0x00000006b241b5b0> (a shade.polaris.io.grpc.stub.ClientCalls$ThreadlessExecutor)
   at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
   at shade.polaris.io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:748)
   at shade.polaris.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:157)
   at com.tencent.polaris.specification.api.v1.service.manage.PolarisGRPCGrpc$PolarisGRPCBlockingStub.heartbeat(PolarisGRPCGrpc.java:432)
   at com.tencent.polaris.plugins.connector.grpc.GrpcConnector.heartbeat(GrpcConnector.java:547)
   at com.tencent.polaris.discovery.client.api.DefaultProviderAPI.heartbeat(DefaultProviderAPI.java:164)
   at com.tencent.polaris.discovery.client.api.DefaultProviderAPI$$Lambda$1295/0x0000000801249840.doHeartbeat(Unknown Source)
   at com.tencent.polaris.discovery.client.flow.RegisterFlow.doRunHeartbeat(RegisterFlow.java:80)
   at com.tencent.polaris.discovery.client.flow.RegisterFlow.lambda$registerInstance$0(RegisterFlow.java:66)
   at com.tencent.polaris.discovery.client.flow.RegisterFlow$$Lambda$1298/0x000000080124fc40.run(Unknown Source)
   at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:515)
   at java.util.concurrent.FutureTask.runAndReset([email protected]/FutureTask.java:305)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:305)
   at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
   at java.lang.Thread.run([email protected]/Thread.java:829)
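Note on the stack above: the thread sits in `LockSupport.park` with no timeout inside ThreadlessExecutor.waitAndDrain, which relies on another component (such as the deadline timer) to hand it a task and unpark it; if that never happens, the park lasts forever. A minimal stdlib illustration of a park that is bounded by its own deadline instead (not gRPC code, just the contrast):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    // parkNanos returns once the deadline passes even if nobody unparks
    // the thread; a plain park() would wait indefinitely for unpark.
    // The loop re-parks after spurious wakeups, which park is allowed to have.
    static long boundedParkMillis(long millis) {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(millis);
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            LockSupport.parkNanos(remaining);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```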

PS: I looked at #10838 and #10336. Those involved retry configuration, but I have not set any retry-related configuration.

Steps to reproduce the bug

I tried disconnecting the network, injecting packet loss, etc., but could not reproduce the problem.
