@timesler timesler commented Aug 25, 2025

Description of changes: This PR adds support for pre-built per-instance docker images, hosted on GHCR.

Detail:

  • Adds a script that was used to build and upload the images
  • Updates the evaluation harness to get an appropriate image by:
    1. Checking for an existing local image for the instance
    2. Attempting to pull a pre-built image from ghcr.io/timesler/swe-polybench.eval.x86_64.{instance_id}:latest
    3. Falling back to the current local docker image build solution
  • No changes have been made to the usage of the harness

The changes to the actual harness code are minimal, and relate only to checking for and using a prebuilt image if it exists.
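
For readers who want a concrete picture of the three-step lookup described above, here is a minimal sketch. It is not the code from this PR: the function names (`resolve_image`, `docker_image_exists`, `docker_pull`, `build_locally`) are hypothetical, and it assumes the local check looks for the same GHCR-style tag, which the harness may or may not do. Only the image name pattern is taken from the description.

```python
import subprocess
from typing import Callable, Tuple

# Image name pattern quoted from the PR description.
PREBUILT_TEMPLATE = "ghcr.io/timesler/swe-polybench.eval.x86_64.{instance_id}:latest"


def prebuilt_image_name(instance_id: str) -> str:
    """Fill the instance ID into the GHCR image name pattern."""
    return PREBUILT_TEMPLATE.format(instance_id=instance_id)


def docker_image_exists(image: str) -> bool:
    """Hypothetical helper: `docker image inspect` exits 0 if the image is present locally."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def docker_pull(image: str) -> bool:
    """Hypothetical helper: `docker pull` exits 0 if the image was pulled."""
    result = subprocess.run(
        ["docker", "pull", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def resolve_image(
    instance_id: str,
    build_locally: Callable[[str], str],
    image_exists: Callable[[str], bool] = docker_image_exists,
    pull: Callable[[str], bool] = docker_pull,
) -> Tuple[str, str]:
    """Return (image name, source), where source is 'local', 'pulled', or 'built'."""
    image = prebuilt_image_name(instance_id)
    # Step 1: reuse an image that is already present locally.
    if image_exists(image):
        return image, "local"
    # Step 2: try to pull the pre-built image from GHCR.
    if pull(image):
        return image, "pulled"
    # Step 3: fall back to the existing local build path (supplied by the caller).
    return build_locally(instance_id), "built"
```

Passing the existence check, pull, and build steps in as callables keeps the fallback order easy to exercise without a Docker daemon; a real harness would likely wire these to its existing build code.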

State of pre-built images:

  • Currently, approximately the first half of PB500 has been pushed and is available
  • The remaining images should be ready in the next few days

Testing:

  • I have run the harness with the Gold patches successfully on roughly the first 150 instances in the dataset using prebuilt images, and will complete the evaluation over the next few days as the additional images are pushed and made available

Backward compatibility:
As implemented, this change works whether or not an instance has a pre-built image available, so it can be used as is today.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@timesler timesler force-pushed the timesler/update-harness-to-use-prebuilt-images branch from 46066a1 to a249946 Compare August 25, 2025 05:02
@timesler timesler force-pushed the timesler/update-harness-to-use-prebuilt-images branch from a249946 to a47889c Compare August 25, 2025 05:03

@mshihabr mshihabr left a comment

LGTM! Thanks for uploading the scripts as well!

@mshihabr mshihabr merged commit 864f077 into amazon-science:main Aug 25, 2025