Skip to content

Conversation

@cserwen
Copy link
Member

@cserwen cserwen commented Jan 10, 2023

Make sure set the target branch to develop

What is the purpose of the change

fix #5838

Brief changelog

XX

Verifying this change

XXXX

Follow this checklist to help us incorporate your contribution quickly and easily. Notice, it would be helpful if you could finish the following 5 checklist(the last one is not necessary)before request the community to review your PR.

  • Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test(over 80% coverage) to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add integration-test in test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@cserwen cserwen requested review from RongtongJin and ni-ze January 12, 2023 03:22
@lizhimins
Copy link
Member

System busy means server is overload, why we need retry send message?

@cserwen
Copy link
Member Author

cserwen commented Jan 13, 2023

System busy means server is overload, why we need retry send message?

@lizhimins refer to #5838

@wz2cool
Copy link
Contributor

wz2cool commented Jan 18, 2023

@cserwen
we have lot of disussion about it, plz see #2726 , I think you can add SYSTEM_BUSY using addRetryResponseCode method by yourself . you can see my changes #2729
image

@cserwen
Copy link
Member Author

cserwen commented Jan 18, 2023

@cserwen
we have lot of disussion about it, plz see #2726 , I think you can add SYSTEM_BUSY using addRetryResponseCode method by yourself . you can see my changes #2729
image

@wz2cool The sdk's built-in retry is necessary. System busy is a very common situation. If this situation needs to be specified by the user, then our sdk seems difficult to use. In addition, we can easily implement it, why require users to add code by themselves

@wz2cool
Copy link
Contributor

wz2cool commented Jan 18, 2023

@cserwen
we have lot of disussion about it, plz see #2726 , I think you can add SYSTEM_BUSY using addRetryResponseCode method by yourself . you can see my changes #2729
image

@wz2cool The sdk's built-in retry is necessary. System busy is a very common situation. If this situation needs to be specified by the user, then our sdk seems difficult to use. In addition, we can easily implement it, why require users to add code by themselves

some rocketmq designers/members/contributors worried about overload, so sdk's build-in retry PR can't be merged long time ago. I think it's ok to use sdk's built in retry if there is no any risk to do this.

@cserwen
Copy link
Member Author

cserwen commented Jan 30, 2023

@cserwen
we have lot of disussion about it, plz see #2726 , I think you can add SYSTEM_BUSY using addRetryResponseCode method by yourself . you can see my changes #2729
image

@wz2cool The sdk's built-in retry is necessary. System busy is a very common situation. If this situation needs to be specified by the user, then our sdk seems difficult to use. In addition, we can easily implement it, why require users to add code by themselves

some rocketmq designers/members/contributors worried about overload, so sdk's build-in retry PR can't be merged long time ago. I think it's ok to use sdk's built in retry if there is no any risk to do this.

Actually,sdk will also retry when broker returns SYSTEM_ERROR. And when the pagecache is busy, the response code is SYSTEM_ERROR. In our internal, RocketMQ has been running for more than two years, and the entire cluster has not been overloaded because of a single Broker's busy, so this worry is unnecessary. Besides, the commercial version also supports retrying in this case, why can't the open-source version support it?

@wz2cool
Copy link
Contributor

wz2cool commented Jan 30, 2023

@cserwen
we have lot of disussion about it, plz see #2726 , I think you can add SYSTEM_BUSY using addRetryResponseCode method by yourself . you can see my changes #2729
image

@wz2cool The sdk's built-in retry is necessary. System busy is a very common situation. If this situation needs to be specified by the user, then our sdk seems difficult to use. In addition, we can easily implement it, why require users to add code by themselves

some rocketmq designers/members/contributors worried about overload, so sdk's build-in retry PR can't be merged long time ago. I think it's ok to use sdk's built in retry if there is no any risk to do this.

Actually,sdk will also retry when broker returns SYSTEM_ERROR. And when the pagecache is busy, the response code is SYSTEM_ERROR. In our internal, RocketMQ has been running for more than two years, and the entire cluster has not been overloaded because of a single Broker's busy, so this worry is unnecessary. Besides, the commercial version also supports retrying in this case, why can't the open-source version support it?

In some cases, users don't run multi rocketmqs as cluster , they just run one rocketmq/nameserver (considering the cost) , I think commercial version is also a cluster.
I support to use build-in retrying if there is no risk to do this. Could someone make some tests to prove it.
Currently, I think config is a good choice.

@github-actions
Copy link

This PR is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs. If you wish not to mark it as stale, please leave a comment in this PR.

@github-actions github-actions bot added the stale label Jan 31, 2024
@github-actions
Copy link

github-actions bot commented Feb 3, 2024

This PR was closed because it has been inactive for 3 days since being marked as stale.

yuz10
yuz10 previously approved these changes May 8, 2024
@cserwen cserwen force-pushed the retry_for_system_busy branch from 6af9de5 to 8c60f7f Compare May 8, 2024 03:13
@cserwen cserwen requested a review from yuz10 May 8, 2024 06:00
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 42.85%. Comparing base (dcf892a) to head (8c60f7f).
Report is 6 commits behind head on develop.

Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #5845      +/-   ##
=============================================
- Coverage      42.93%   42.85%   -0.09%     
+ Complexity     10381    10356      -25     
=============================================
  Files           1270     1270              
  Lines          88690    88691       +1     
  Branches       11399    11399              
=============================================
- Hits           38078    38006      -72     
- Misses         45924    45987      +63     
- Partials        4688     4698      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cserwen cserwen requested review from humkum and wz2cool and removed request for ShadowySpirits May 8, 2024 09:42
@cserwen cserwen requested a review from ShadowySpirits May 9, 2024 06:32
@cserwen cserwen merged commit e5b357c into apache:develop May 10, 2024
@cserwen cserwen deleted the retry_for_system_busy branch May 10, 2024 03:19
Copy link
Contributor

@wz2cool wz2cool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@zhbdesign
Copy link

zhbdesign commented Aug 29, 2025

A busy exception occurred while sending the message, and the exception information is as follows: com.alibaba.rocketmq.client.exception.MQBrokerException: CODE: 2 DESC: [TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: 208ms, size of queue: 8, Without any retry attempts, the client is configured by default. Sometimes this exception occurs and the message is sent out, but I cannot get the sendresult result. Is this a normal phenomenon? I see that the code usually does not send the message successfully when this exception occurs. My version has not yet added SYSTEM_BUSY to automatic retry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Client will not retry to send message when broker busy

9 participants