Background:
Ably Realtime provides a RabbitMQ-backed queue service to customers who wish to consume data published into our distributed pub/sub system.
All rate and resource limiting is controlled on the publishing end, so throttling or rate limiting consumers is generally not necessary.
Problem:
A customer recently got into a busy loop whereby they were consuming thousands of messages off the queue, then NACK'ing them all, then consuming the queue again, NACK'ing again, and so forth. We noticed an extremely high level of CPU consumption as a result.
Our rate and resource limiting on publishing into queues was ineffective here: the customer was sustaining a message rate of around 5k per second, yet the queue length remained constant because every message was being requeued rather than newly published.
Help with a solution:
I feel that this is arguably a bug, or at least a potential avenue for a DoS attack, so I have raised an issue. If you feel this is better suited to Stack Overflow or Google Groups, I am sorry for wasting your time; just say so and I will close this issue.
My feeling is that this could be prevented in a number of ways:
- Connection throttling - I don't really like this idea as it's a bit arbitrary
- Max NACKs per message - i.e. update a header on the message with NACK/requeue count and move the message into the dead letter queue once that NACK count is hit. I am not sure if messages are mutable though so not sure this would work.
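To make the second option concrete, here is a minimal sketch of the counting logic only, with no broker involved. The header name `x-nack-count`, the function `on_nack`, and the `max_nacks` parameter are all hypothetical and not part of RabbitMQ. It treats the message as immutable and returns a copy of the headers with the count incremented, which assumes the component applying the limit is able to republish the message (or dead-letter it) rather than modify it in place.

```python
from typing import Dict, Tuple

# Hypothetical header name used for illustration; not a RabbitMQ built-in.
NACK_COUNT_HEADER = "x-nack-count"


def on_nack(headers: Dict[str, int], max_nacks: int) -> Tuple[str, Dict[str, int]]:
    """Decide what to do with a message that a consumer has just NACK'ed.

    Returns ("requeue", updated_headers) while the NACK count is below
    max_nacks, otherwise ("dead_letter", updated_headers).
    """
    # Copy the headers so the original message is never mutated.
    updated = dict(headers)
    updated[NACK_COUNT_HEADER] = updated.get(NACK_COUNT_HEADER, 0) + 1

    if updated[NACK_COUNT_HEADER] >= max_nacks:
        return "dead_letter", updated
    return "requeue", updated


if __name__ == "__main__":
    headers: Dict[str, int] = {}
    for _ in range(3):
        action, headers = on_nack(headers, max_nacks=3)
        print(action, headers)
```

With a limit like this in place, the busy loop described above would end after `max_nacks` rounds per message instead of continuing indefinitely.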
I am happy to explore building a plugin for this, or opening a PR to add it as a feature, although I'd really like your advice on what you think is the best approach to prevent this potential DoS vulnerability.
We may also consider using our routers to help prevent this type of activity, but I don't really like the idea of that if we can help it.