
🐛 [storage-resize-images] executions & exponential costs in backfill mode #2506

@TomDUVAL-MAHE

Description


We are experiencing a critical incident related to the official Firebase storage-resize-images extension (Google).
The extension has been running in backfill mode since July 16 to process our images, and we have not changed its configuration since then. However, starting on July 25, the functions' calls per minute began to increase in several steps, with a critical surge on August 8 between 01:00 and 03:00. Screenshots 1–2 show the executions per second of the extension's functions (red = image generation, blue = backfill), with a peak of 300 + 160 requests/s sustained for one hour over approximately four days.

Two functions are implicated and operate together:

  • ext-storage-resize-images-backfillResizedImages
  • ext-storage-resize-images-generateResizedImage

🔧 Environment & configuration

  • Region: europe-west1
  • Memory: FUNCTION_MEMORY=2048
  • Backfill: DO_BACKFILL=true
  • Types & sizes:
    IMAGE_TYPE=avif
    IMG_SIZES=640x480,1200x800,1920x1280
    IS_ANIMATED=true
    OUTPUT_OPTIONS={"avif":{"quality":100}}
    SHARP_OPTIONS={"fit":"cover"}
  • Paths:
    RESIZED_IMAGES_PATH=thumbnails
    EXCLUDE_PATH_LIST=/maps/*/thumbnails,/pois/*/thumbnails
  • Other:
    DELETE_ORIGINAL_FILE=false
    MAKE_PUBLIC=true
    CONTENT_FILTER_LEVEL=OFF
    REGENERATE_TOKEN=false
  • Impacted instances:
    ext-storage-resize-images-backfillResizedImages
    ext-storage-resize-images-generateResizedImage
  • Dataset: ~600k images (JPG/PNG) → multi-size AVIF generation (~200 GB)

Important: We did not modify the extension's code, only its parameters via the .env file. We have not yet migrated to the recently published version, which exposes additional concurrency/batch parameters.
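As a rough sanity check on the expected workload (a back-of-the-envelope sketch using only figures from this report): for a one-shot backfill, the configured sizes bound the total number of resize executions, and at the observed peak rate that bound would be exhausted in about an hour, not sustained for days.

```python
# Back-of-the-envelope workload check for the backfill (figures from this report).
SOURCE_IMAGES = 600_000      # ~600k JPG/PNG originals in the dataset
SIZES_PER_IMAGE = 3          # 640x480, 1200x800, 1920x1280

# Upper bound on resize executions for a single backfill pass.
expected_thumbnails = SOURCE_IMAGES * SIZES_PER_IMAGE
print(expected_thumbnails)   # 1800000

# Observed sustained rate during the Aug 8 peak: ~300 + 160 executions/s.
observed_rate = 300 + 160
hours_to_finish = expected_thumbnails / observed_rate / 3600
print(round(hours_to_finish, 1))  # 1.1 -> at that rate, a bounded one-pass
                                  # backfill would finish in ~1 hour
```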

⏱️ Timeline & metrics
IC = instance count; I/s = instances per second
  • Jul 20 → 24: IC 15 → 17; I/s 1.5 → 2
  • Jul 25 → 29: IC 38 → 73; I/s 3.45 → 5
  • Jul 30 → Aug 7: IC 160 → 737; I/s 9 → 60
  • Aug 8 (00:00) → Aug 14 (shutdown): IC 748 → 1,700–1,800; I/s 59 → 150
➡️ Manual shutdown by our team on Aug 14 at ~17:06 (local time)
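The endpoints of this series are consistent with roughly exponential growth. A quick doubling-time estimate from the reported instances-per-second figures (~1.5 I/s on Jul 20 to ~150 I/s on Aug 14, about 25 days apart):

```python
import math

# Doubling-time estimate from the reported I/s endpoints in the timeline above.
rate_start, rate_end = 1.5, 150.0   # instances per second
days = 25                           # ~Jul 20 -> Aug 14

growth_factor = rate_end / rate_start             # 100x overall
doubling_days = days * math.log(2) / math.log(growth_factor)
print(round(doubling_days, 1))      # 3.8 -> the load doubled roughly every
                                    # 4 days, i.e. exponential, not linear
```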

🔍 Technical findings

  • Logs (extracts available) indicate the extension scans the entire bucket, including thumbnails despite EXCLUDE_PATH_LIST.
  • Thumbnails appear to be excluded from processing, but not from enumeration (list/scan).
  • This multiplies I/O operations as new thumbnails are created.
  • Hypothesis: a combinatorial explosion of listing operations across successive thumbnail generations → exponential execution growth, cost drift, and runaway behavior (night-time spikes with no correlation to deployments or user traffic).
  • No explicit execution errors are visible in the logs (based on a necessarily limited sample; several billion lines in total).
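The enumeration finding can be modeled with a minimal sketch (this is an assumption about the behavior, not the extension's actual code; paths and helper names are hypothetical): if EXCLUDE_PATH_LIST is applied only after the whole bucket has been enumerated, every subsequent pass still pays listing I/O for the thumbnails earlier passes created, multiplying scan cost even when nothing new needs processing.

```python
from fnmatch import fnmatch

# Toy model of the suspected list-then-filter behavior (NOT the extension's
# actual code): exclusion happens after enumeration, so thumbnails are skipped
# for processing but still counted in every list/scan.
EXCLUDE_PATH_LIST = ["maps/*/thumbnails/*"]
IMG_SIZES = ["640x480", "1200x800", "1920x1280"]

def backfill_pass(bucket):
    """One pass: enumerate ALL objects, then filter out excluded paths."""
    listed = len(bucket)  # list/scan cost covers every object, excluded or not
    to_process = [p for p in bucket
                  if not any(fnmatch(p, pat) for pat in EXCLUDE_PATH_LIST)]
    # each original yields one thumbnail per configured size
    new_thumbs = {f"maps/{i}/thumbnails/photo_{s}.avif"
                  for i, _ in enumerate(to_process) for s in IMG_SIZES}
    return listed, bucket | new_thumbs

bucket = {f"maps/{i}/photo.jpg" for i in range(1000)}  # hypothetical dataset
history = []
for n in range(1, 4):
    listed, bucket = backfill_pass(bucket)
    history.append(listed)
    print(f"pass {n}: {listed} objects enumerated")
# pass 1: 1000, pass 2: 4000, pass 3: 4000 -> with 3 sizes, enumeration I/O
# quadruples after the first pass even though no new work remains
```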

✅ Actions already taken (client side)

  • Immediate shutdown of both functions on Aug 14
  • Freeze on extension-related deployments until the root cause is clarified

💸 Financial impact

  • Our usage costs have spiked dramatically.

🚨 Status
This incident is blocking us as a client (costs, trust, service continuity).


Labels

  • needs: author feedback (Pending additional information from the author)
  • type: bug (Something isn't working)
