Skip to content

Conversation

veliuenal
Copy link
Contributor

Overview - Support Ticket ID: 98024

getTable operation can randomly time out due to network or API hiccups. Retrying is safe/idempotent and avoids unnecessary pipeline failures.

This PR makes the table lookup operation more resilient by adding a bounded retry loop.

Error:

Caused by: com.google.cloud.bigquery.BigQueryException: Read timed out
	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getTable(HttpBigQueryRpc.java:322)
	at com.google.cloud.bigquery.BigQueryImpl$18.call(BigQueryImpl.java:812)
	at com.google.cloud.bigquery.BigQueryImpl$18.call(BigQueryImpl.java:809)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.bigquery.BigQueryImpl.getTable(BigQueryImpl.java:808)
	at com.wepay.kafka.connect.bigquery.write.row.GcsToBqWriter.writeRows(GcsToBqWriter.java:132)
	at com.wepay.kafka.connect.bigquery.write.batch.GcsBatchTableWriter.run(GcsBatchTableWriter.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at com.wepay.kafka.connect.bigquery.BigQuerySinkTask$MdcContextThreadFactory.lambda$newThread$0(BigQuerySinkTask.java:774)
	... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288)
	at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314)
	at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
	at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
	at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484)
	at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:478)
	at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1465)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1069)
	at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
	at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
	at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:826)
	at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:761)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1725)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
	at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308)
	at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
	at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:152)
	at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:552)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getTable(HttpBigQueryRpc.java:320)
	... 12 more

Tests :

[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.286 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageWriteApiWriterTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s - in com.wepay.kafka.connect.bigquery.write.storage.BigQueryBuilderTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 s - in com.wepay.kafka.connect.bigquery.write.storage.BigQueryWriteSettingsBuilderTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 s - in com.wepay.kafka.connect.bigquery.write.storage.GcsBuilderTest
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.105 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageWriteApiBatchApplicationStreamTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageApiBatchModeHandlerTest
[INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.035 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageWriteApiDefaultStreamTest
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.126 s - in com.wepay.kafka.connect.bigquery.write.row.BigQueryWriterTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 s - in com.wepay.kafka.connect.bigquery.write.row.GcsToBqWriterTest
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.157 s - in com.wepay.kafka.connect.bigquery.GcpClientBuilderProjectTest
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.025 s - in com.wepay.kafka.connect.bigquery.config.GcsBucketValidatorTest
[INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 s - in com.wepay.kafka.connect.bigquery.config.BigQuerySinkConfigTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.config.PartitioningModeValidatorTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.config.MultiPropertyValidatorTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 s - in com.wepay.kafka.connect.bigquery.config.CredentialsValidatorTest
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 s - in com.wepay.kafka.connect.bigquery.config.StorageWriteApiValidatorTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.config.PartitioningTypeValidatorTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.utils.PartitionedTableIdTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.utils.FieldNameSanitizerTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.wepay.kafka.connect.bigquery.ErrantRecordHandlerTest
[INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.168 s - in com.wepay.kafka.connect.bigquery.BigQuerySinkTaskTest
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.126 s - in com.wepay.kafka.connect.bigquery.BigQueryStorageApiBatchSinkTaskTest
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.BigQuerySinkConnectorTest
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in com.wepay.kafka.connect.bigquery.MergeQueriesTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.112 s - in com.wepay.kafka.connect.bigquery.BigQueryStorageApiSinkTaskTest
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 s - in com.wepay.kafka.connect.bigquery.exception.BigQueryStorageWriteApiErrorResponsesTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.wepay.kafka.connect.bigquery.exception.BigQueryErrorResponsesTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.exception.BigQueryStorageWriteApiConnectExceptionTest
[INFO] Tests run: 39, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.047 s - in com.wepay.kafka.connect.bigquery.SchemaManagerTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.03 s - in com.wepay.kafka.connect.bigquery.GcsToBqLoadRunnableTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.convert.KafkaDataConverterTest
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.066 s - in com.wepay.kafka.connect.bigquery.convert.BigQuerySchemaConverterTest
[INFO] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 s - in com.wepay.kafka.connect.bigquery.convert.BigQueryRecordConverterTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.convert.logicaltype.KafkaLogicalConvertersTest
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 s - in com.wepay.kafka.connect.bigquery.convert.logicaltype.DebeziumLogicalConvertersTest
[INFO] Results:
[INFO] 
[INFO] Tests run: 297, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  5.132 s
[INFO] Finished at: 2025-08-18T17:36:55+02:00
[INFO] ------------------------------------------------------------------------

@veliuenal
Copy link
Contributor Author

@Claudenw
Thanks for your detailed comment.
I've updated the function and included the tests below.

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.298 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageWriteApiWriterTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s - in com.wepay.kafka.connect.bigquery.write.storage.BigQueryBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 s - in com.wepay.kafka.connect.bigquery.write.storage.BigQueryWriteSettingsBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.03 s - in com.wepay.kafka.connect.bigquery.write.storage.GcsBuilderTest
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.104 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageWriteApiBatchApplicationStreamTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageApiBatchModeHandlerTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.038 s - in com.wepay.kafka.connect.bigquery.write.storage.StorageWriteApiDefaultStreamTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.13 s - in com.wepay.kafka.connect.bigquery.write.row.BigQueryWriterTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 s - in com.wepay.kafka.connect.bigquery.write.row.GcsToBqWriterTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.229 s - in com.wepay.kafka.connect.bigquery.GcpClientBuilderProjectTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.023 s - in com.wepay.kafka.connect.bigquery.config.GcsBucketValidatorTest
Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s - in com.wepay.kafka.connect.bigquery.config.BigQuerySinkConfigTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.config.PartitioningModeValidatorTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.config.MultiPropertyValidatorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 s - in com.wepay.kafka.connect.bigquery.config.CredentialsValidatorTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 s - in com.wepay.kafka.connect.bigquery.config.StorageWriteApiValidatorTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.config.PartitioningTypeValidatorTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.utils.PartitionedTableIdTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.utils.FieldNameSanitizerTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.ErrantRecordHandlerTest
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.179 s - in com.wepay.kafka.connect.bigquery.BigQuerySinkTaskTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.125 s - in com.wepay.kafka.connect.bigquery.BigQueryStorageApiBatchSinkTaskTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 s - in com.wepay.kafka.connect.bigquery.BigQuerySinkConnectorTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in com.wepay.kafka.connect.bigquery.MergeQueriesTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.106 s - in com.wepay.kafka.connect.bigquery.BigQueryStorageApiSinkTaskTest
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 s - in com.wepay.kafka.connect.bigquery.exception.BigQueryStorageWriteApiErrorResponsesTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.exception.BigQueryErrorResponsesTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.wepay.kafka.connect.bigquery.exception.BigQueryStorageWriteApiConnectExceptionTest
Tests run: 39, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 s - in com.wepay.kafka.connect.bigquery.SchemaManagerTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 s - in com.wepay.kafka.connect.bigquery.GcsToBqLoadRunnableTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in com.wepay.kafka.connect.bigquery.convert.KafkaDataConverterTest
Tests run: 42, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.09 s - in com.wepay.kafka.connect.bigquery.convert.BigQuerySchemaConverterTest
Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 s - in com.wepay.kafka.connect.bigquery.convert.BigQueryRecordConverterTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in com.wepay.kafka.connect.bigquery.convert.logicaltype.KafkaLogicalConvertersTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 s - in com.wepay.kafka.connect.bigquery.convert.logicaltype.DebeziumLogicalConvertersTest
Results:

Tests run: 342, Failures: 0, Errors: 0, Skipped: 0

------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time:  6.532 s
Finished at: 2025-08-20T18:26:14+02:00
------------------------------------------------------------------------

@veliuenal veliuenal requested a review from Claudenw August 20, 2025 16:29
Copy link
Contributor

@Claudenw Claudenw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the javadoc as requested and I will approve.

import java.util.Random;

/**
* Simple exponential backoff helper with jitter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delay doubling backoff, not exponential.

}

/**
* Sleep for the current backoff period and increase the delay exponentially.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should state that backoff period doubles

Comment on lines +231 to +243
while (!timer.isExpired()) {
try {
return func.get();
} catch (BaseServiceException e) {
retryCount++;
if (e.isRetryable()) {
logger.warn("Retryable exception on attempt {}: {}", retryCount, e.getMessage());
backoff.delay();
} else {
logger.error("Non-retryable exception on attempt {}", retryCount, e);
throw e;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This delay code has a couple of edge cases.

backoff.delay() may delay longer than the timer has left and so there will be a delay in execution that may amount to several seconds before the timer expiration is detected. At the high end of the delay time it may make sense to adjust the backoff to only delay for a period slightly less than the the time remaining or to abort the delay all together.

By default there are 10 retries of 1 second each based on default config options. So timeout is 10 seconds.
The initial backoff delay is 100ms and is multiplied by 2 on each backoff call.

next delay ms remaining ms used
100 10000 0
200 9800 100
400 9600 300
800 9200 700
1600 8400 1500
3200 6800 3100
6400 3600 6300
12800 -2800 12700

On the 7th iteration we have spent 6.3 seconds waiting, there are 3.6 seconds remaining on the timer but the delay is calling for 6.4 seconds of delay

We delay 6.4. seconds, the timer has expired by 2.8 seconds so we exit the loop without retrying.
We could have exited the loop before the 6.4 seconds if we knew the timer would expire and saved 6.4 seconds of execution time.

If the outer while loop were changed to a "do/while" loop, and the backoff delay started with no delay, then you could call backoff.delay() and then func.get(). In this way you only delay if there is still time left after the get call and if you adjust the backoff.delay to account for the time remaining on the timer then you can ensure that your total time is close to the maximum time + jitter.

The original code I pointed you to handles this case. It also applies negative jitter as well as positive jitter so that multiple threads will diverge more rapidly.

Since this is in a private method and can be adjusted easily in the future, I will approve this change.

However, if you decide that you want to make adjustments, I will be happy to accept them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Claude,
Thanks for your detailed comment 🙏
I have two points I'd like to highlight.

First, I completely agree with your statements here. Therefore, I added this line to ensure delay doesn't exceed maxDelayMs. If the retry value is 10, then it can be a maximum of 10 seconds, as you mentioned:

long delay = Math.min(currentDelayMs, maxDelayMs);

On the other hand, the retry count actually comes from the BIGQUERY_RETRY_DEFAULT parameter. At first, I also assumed the value was retrieved from MAX_RETRIES_DEFAULT, but I realized when I was running the unit tests.

The default value of BIGQUERY_RETRY_DEFAULT is 0, and it will only retry only once as we set this condition here:

    if (retries == 0) {
      timeout = Duration.ofMillis(retryWaitMs);
    }

I would really appreciate if you can confirm/reject my understading here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with retries = 0. Timeout is then retryWaitMs (default 100, minimum 0)
if retrywait = 0 then executeWithTimout will not execute the command at all.

if retries = 1. Timeout is then retryWaitMs (default 100, minimum 0)
assume first call fails with retryable exception.
thread will wait for 100ms
test at line 231 will fail and a retry will not occur

looking at the default values, retries=10 retrayWaitMs=100 maxdelay = 10000ms
on the 7th interation, using the table above) the currentDelayMs is in the first column (6400).
currentDelayMs is less than maxdelay so the delay time will be 6400ms
the total elapsed time thus far is 6300ms so timer has not expired
system will delay for 6400ms so the total elapsed time will be 12700ms and the timer will expire after 7 retries.

This code neither attempts the number of retries, nor stops at the maxdelay.

The example code calculates the number of retries based on the amount of time remaining before the clock expires. Your code might switch to calculating the retries.

If you calculate int(log2(maxdelay)) and subtract from that int(log2(retrywait)) You will have the number of loops necessary to reach maxdelay. (Assuming retrywait>1)

example:

maxdelay=10000, log2(10000) = 13.288, int(13.288) = 13
retrywait=100, log2(100) = 6.644, int(6.644) = 6

13-6 = 7 loops.

loop delay total delay
1 100 100
2 200 300
3 400 700
4 800 1500
5 1600 3100
6 3200 6300
7 6400 12700

7 loops will have a delay that just exceeds or matches the limit
6 loops will stop 1 short of exceeding the limit.

Given that the actual call that we are attempting to time takes some time, I would think that 6 is the number of loops you actually want to execute, based on the maxdelay and retrywait. values

@veliuenal veliuenal requested a review from Claudenw August 21, 2025 16:17
@veliuenal
Copy link
Contributor Author

Hello @Claudenw ,
Yesterday I updated the details in javadoc as you suggested.
Could you please review the PR?
Thanks.

Copy link
Contributor

@Claudenw Claudenw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will approve with changes as exist, or you can fix the retry,

Comment on lines +231 to +243
while (!timer.isExpired()) {
try {
return func.get();
} catch (BaseServiceException e) {
retryCount++;
if (e.isRetryable()) {
logger.warn("Retryable exception on attempt {}: {}", retryCount, e.getMessage());
backoff.delay();
} else {
logger.error("Non-retryable exception on attempt {}", retryCount, e);
throw e;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with retries = 0. Timeout is then retryWaitMs (default 100, minimum 0)
if retrywait = 0 then executeWithTimout will not execute the command at all.

if retries = 1. Timeout is then retryWaitMs (default 100, minimum 0)
assume first call fails with retryable exception.
thread will wait for 100ms
test at line 231 will fail and a retry will not occur

looking at the default values, retries=10 retrayWaitMs=100 maxdelay = 10000ms
on the 7th interation, using the table above) the currentDelayMs is in the first column (6400).
currentDelayMs is less than maxdelay so the delay time will be 6400ms
the total elapsed time thus far is 6300ms so timer has not expired
system will delay for 6400ms so the total elapsed time will be 12700ms and the timer will expire after 7 retries.

This code neither attempts the number of retries, nor stops at the maxdelay.

The example code calculates the number of retries based on the amount of time remaining before the clock expires. Your code might switch to calculating the retries.

If you calculate int(log2(maxdelay)) and subtract from that int(log2(retrywait)) You will have the number of loops necessary to reach maxdelay. (Assuming retrywait>1)

example:

maxdelay=10000, log2(10000) = 13.288, int(13.288) = 13
retrywait=100, log2(100) = 6.644, int(6.644) = 6

13-6 = 7 loops.

loop delay total delay
1 100 100
2 200 300
3 400 700
4 800 1500
5 1600 3100
6 3200 6300
7 6400 12700

7 loops will have a delay that just exceeds or matches the limit
6 loops will stop 1 short of exceeding the limit.

Given that the actual call that we are attempting to time takes some time, I would think that 6 is the number of loops you actually want to execute, based on the maxdelay and retrywait. values

@Claudenw
Copy link
Contributor

@veliuenal please add the proper license headers to your submission

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants