
Looking up the number of active runners can be slow and cause API rate limiting #4710

@npalm

Description


Problem

For setups supporting a large number of runners (about 20K+ per day), the method the scale-up function uses to decide whether the maximum number of runners has been reached can be slow.

The current implementation looks up active and pending EC2 instances with the relevant tags using the DescribeInstances API. In large environments this lookup can become slow and can even hit rate limits on the EC2 API.

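```typescript
import { DescribeInstancesCommand, DescribeInstancesResult, EC2Client } from '@aws-sdk/client-ec2';
// getTracedAWSV3Client, getRunnerInfo, Ec2Filter and Runners come from the scale-up lambda's own modules.

async function getRunners(ec2Filters: Ec2Filter[]): Promise<Runners.RunnerList[]> {
  const ec2 = getTracedAWSV3Client(new EC2Client({ region: process.env.AWS_REGION }));
  const runners: Runners.RunnerList[] = [];
  let nextToken: string | undefined;
  let hasNext = true;
  // Page through every matching instance. Each page is a separate, sequential
  // DescribeInstances call, so a large fleet means many calls per invocation.
  while (hasNext) {
    const instances: DescribeInstancesResult = await ec2.send(
      new DescribeInstancesCommand({ Filters: ec2Filters, NextToken: nextToken }),
    );
    hasNext = !!instances.NextToken;
    nextToken = instances.NextToken;
    runners.push(...getRunnerInfo(instances));
  }
  return runners;
}
```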
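To make "active and pending instances with the relevant tags" concrete, here is a minimal sketch of the kind of filter list passed to DescribeInstances. `instance-state-name` and the `tag:<key>` syntax are standard DescribeInstances filters; the tag keys (`ghr:environment`, `ghr:Type`) and the helper `buildActiveRunnerFilters` are assumptions for illustration and may not match what the module actually uses.

```typescript
// Mirrors the shape of an EC2 DescribeInstances filter ({ Name, Values }).
interface Ec2Filter {
  Name: string;
  Values: string[];
}

// Hypothetical tag keys; the real module may use different names.
const ENVIRONMENT_TAG = 'ghr:environment';
const RUNNER_TYPE_TAG = 'ghr:Type';

// Hypothetical helper: filters selecting instances that count towards the maximum.
function buildActiveRunnerFilters(environment: string, runnerType: string): Ec2Filter[] {
  return [
    // Only instances that are launching or up count towards the limit.
    { Name: 'instance-state-name', Values: ['pending', 'running'] },
    // `tag:<key>` matches instances whose tag <key> has one of the given values.
    { Name: `tag:${ENVIRONMENT_TAG}`, Values: [environment] },
    { Name: `tag:${RUNNER_TYPE_TAG}`, Values: [runnerType] },
  ];
}
```

Every scale-up invocation sends these filters and EC2 evaluates them server side, but the matching instances still come back page by page.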

This approach was chosen early on to keep complexity low and to avoid managing any state. The downside is that a Lambda invocation can spend more than 15 extra seconds just checking whether the limit has been reached, and the lookups can also trigger EC2 API rate limiting.
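The cost grows with fleet size because the loop makes one sequential call per page. With N matching instances and a page size of P, a single lookup takes ⌈N/P⌉ round trips, and every concurrent scale-up invocation repeats it, which is what pushes the EC2 API towards its rate limits. The sketch below replaces the EC2 client with a hypothetical in-memory pager (`fakeDescribe`, with an arbitrary page size) purely to make that call count visible; it does not model real EC2 latency or page sizes.

```typescript
// Hypothetical stand-in for one DescribeInstances page: the IDs on the page plus a token for the next.
type PageFn = (nextToken?: string) => Promise<{ ids: string[]; nextToken?: string }>;

// Same loop shape as getRunners, but it only counts instances and API calls.
async function countByPaging(describePage: PageFn): Promise<{ count: number; calls: number }> {
  let count = 0;
  let calls = 0;
  let nextToken: string | undefined;
  do {
    const page = await describePage(nextToken); // one sequential round trip per page
    calls += 1;
    count += page.ids.length;
    nextToken = page.nextToken;
  } while (nextToken);
  return { count, calls };
}

// Fake paginated API over `total` instances, returning `pageSize` results per page.
function fakeDescribe(total: number, pageSize: number): PageFn {
  return async (token) => {
    const start = token ? Number(token) : 0;
    const end = Math.min(start + pageSize, total);
    const ids = Array.from({ length: end - start }, (_, i) => `i-${start + i}`);
    return { ids, nextToken: end < total ? String(end) : undefined };
  };
}
```

For example, `countByPaging(fakeDescribe(2500, 1000))` resolves to `{ count: 2500, calls: 3 }`: the number of calls, not just the amount of data, scales with the number of instances.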

Solutions

We are looking for a solution that keeps complexity low but avoids the side effects mentioned above. A first shortlist of ideas:

  • Cache the result in the Lambda for a certain time window, so that under high load the lookup is not made on every invocation. Drawback: the module can scale beyond the configured maximum.
  • Keep track of the number of instances by listening to instance creation and termination events. However, this requires a complex setup.
  • ...
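The first idea, caching, could look roughly like the minimal sketch below, assuming the count is held at module scope so it survives warm invocations of the same Lambda execution environment. `TtlCache` is a hypothetical helper, not part of the module, and the clock is injectable only to make the behaviour testable.

```typescript
// Caches the result of an expensive async lookup for `ttlMs` milliseconds.
// At module scope in a Lambda it survives warm invocations of one execution
// environment, but each concurrent environment holds its own copy.
class TtlCache<T> {
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;
  private pending: Promise<T> | undefined;
  private fetchedAt = 0;

  constructor(load: () => Promise<T>, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    const t = this.now();
    if (this.pending === undefined || t - this.fetchedAt >= this.ttlMs) {
      this.fetchedAt = t;
      // Store the promise itself so concurrent callers share one in-flight lookup.
      const p: Promise<T> = this.load().catch((err: unknown) => {
        if (this.pending === p) this.pending = undefined; // do not cache failures
        throw err;
      });
      this.pending = p;
    }
    return this.pending;
  }
}
```

Usage would be something like `const runnerCount = new TtlCache(() => countRunners(), 30_000)` at module scope, where `countRunners` is a hypothetical wrapper around `getRunners`. The drawback from the list shows up directly here: within the TTL, and across separate execution environments, the cached count does not include instances launched in the meantime, so the fleet can overshoot the maximum.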
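The second idea, tracking instances via events, could be driven by EventBridge "EC2 Instance State-change Notification" events, whose detail carries the instance ID and its new state. The sketch below is a minimal illustration under that assumption: `ActiveInstanceTracker` is hypothetical and keeps its state in memory, whereas a real setup would need a store shared by all scale-up invocations plus a way to tell runner instances apart from other instances, which is exactly where the extra complexity comes from.

```typescript
// Minimal shape of an EventBridge "EC2 Instance State-change Notification" event.
interface Ec2StateChangeEvent {
  'detail-type': string;
  detail: { 'instance-id': string; state: string };
}

// States in which an instance counts towards the runner maximum.
const ACTIVE_STATES = new Set(['pending', 'running']);

// Hypothetical in-memory tracker; a production version would persist this set.
class ActiveInstanceTracker {
  private readonly active = new Set<string>();

  handle(event: Ec2StateChangeEvent): void {
    if (event['detail-type'] !== 'EC2 Instance State-change Notification') return;
    const { 'instance-id': id, state } = event.detail;
    if (ACTIVE_STATES.has(state)) {
      this.active.add(id); // adding twice is harmless, so duplicate deliveries are fine
    } else {
      // stopping, stopped, shutting-down, terminated: no longer counts.
      this.active.delete(id);
    }
  }

  get count(): number {
    return this.active.size;
  }
}
```

Using a set keyed by instance ID makes repeated deliveries idempotent. Event order is not guaranteed, though, so a late `running` event could re-add an instance that has already terminated; handling that is part of the complexity the issue warns about.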

Metadata


Assignees

No one assigned

    Labels

    help wanted (Extra attention is needed)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
