Description
When rasterio reads from S3 and hits an S3 throttling limit (HTTP 503), code that wraps the reader in a retry block fails to retry (sample function code below). It appears that rasterio/GDAL has registered the dataset (the S3 object) as missing and refuses to try reading it again.
2021-02-18 08:03:18,849 | INFO | raster_file_metadata:1671 | Reading GeoTIFF metadata: s3://tst-research-dem/n40w085.tif
[WARNING] 2021-02-18T08:03:18.935Z 6b4780eb-93f1-4226-a7c0-8564329c21c5 CPLE_AppDefined in HTTP response code on https://tst-research-dem.s3.amazonaws.com/n40w085.tif: 503
2021-02-18 08:03:18,998 | ERROR | raster_file_metadata:1681 | Please reduce your request rate.
# first retry:
2021-02-18 08:03:19,720 | ERROR | raster_file_metadata:1681 | '/vsis3/tst-research-dem/n40w085.tif' does not exist in the file system, and is not recognized as a supported dataset name.
# second retry:
2021-02-18 08:03:21,032 | ERROR | raster_file_metadata:1681 | '/vsis3/tst-research-dem/n40w085.tif' not recognized as a supported file format.
Sample function to read COG metadata with retries:
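# assume the geotiff arg is "s3://tst-research-dem/n40w085.tif"
import logging
import random
import time
from typing import Dict, Optional

import rasterio
from rasterio.errors import RasterioIOError

LOGGER = logging.getLogger(__name__)
RETRY_DELAY = 1.0  # seconds; assumed value, not given in the original report


def raster_file_metadata(geotiff: str) -> Optional[Dict]:
    LOGGER.info("Reading GeoTIFF metadata: %s", geotiff)
    with rasterio.Env():
        tries = 0
        while tries < 3:
            tries += 1
            retry_jitter = random.uniform(0.1, 0.3)  # fresh jitter for each attempt
            try:
                with rasterio.open(geotiff) as src:
                    # raster_metadata (defined elsewhere) only uses src.* metadata methods
                    return raster_metadata(src)
            except RasterioIOError as err:
                LOGGER.error(err)
                time.sleep(RETRY_DELAY + retry_jitter)
    return None  # all attempts failed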
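The retry loop above waits a fixed delay plus a small jitter between attempts. A common alternative for throttling errors is capped exponential backoff with "full jitter", sketched below as a generic, rasterio-independent helper. The name `retry_with_backoff` and its parameters are hypothetical, chosen for illustration; this is not a rasterio or GDAL API.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_tries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exception types with capped exponential backoff and full jitter."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_tries:
                raise  # out of attempts: re-raise the last error for the caller
            # Full jitter: wait a random time in [0, cap], where cap doubles each attempt up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, `retry_with_backoff(lambda: read_metadata(geotiff), (RasterioIOError,))`, where `read_metadata` is a hypothetical function that opens the dataset and returns `raster_metadata(src)`. Note that a better backoff schedule on its own would probably not fix the behavior reported here: the log shows the later attempts failing with "does not exist" errors rather than 503s, which points at state cached inside GDAL rather than at the delay between attempts.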
Are there better ways to use rasterio to retry failed reads of S3 COG metadata and data? Are there any `rasterio.Env()` values or other configuration settings that avoid this failure to retry when the first read of an S3 COG hits an S3 throttling error (503)? Is this known rasterio/GDAL behavior, or is it a bug?
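One configuration-level approach to the `rasterio.Env()` question is to let GDAL retry throttled requests itself, so that a 503 is handled inside GDAL's HTTP layer instead of surfacing as a failed open. GDAL documents the config options `GDAL_HTTP_MAX_RETRY` (number of retries on HTTP 429, 502, 503 and 504 responses, default 0) and `GDAL_HTTP_RETRY_DELAY` (delay in seconds before retrying) for its network file systems such as `/vsis3/`; the GDAL 3.2.1 bundled here should support them, but check the GDAL documentation for your version. The values below and the helper names `gdal_http_retry_options` and `read_profile` are assumptions for illustration, and whether this avoids the cached "does not exist" state seen in the log is an untested expectation.

```python
from typing import Dict


def gdal_http_retry_options(max_retry: int = 5, retry_delay_s: float = 1.0) -> Dict[str, str]:
    """Build GDAL config options that ask GDAL to retry throttled HTTP requests internally."""
    return {
        # Retries on HTTP 429, 502, 503 and 504 responses (GDAL default is 0, i.e. no retry).
        "GDAL_HTTP_MAX_RETRY": str(max_retry),
        # Seconds to wait before retrying.
        "GDAL_HTTP_RETRY_DELAY": str(retry_delay_s),
    }


def read_profile(path: str) -> dict:
    """Hypothetical example: open a dataset with the retry options active and return its profile."""
    # Imported here so gdal_http_retry_options() can be used without rasterio installed.
    import rasterio

    # rasterio.Env(**options) sets each keyword as a GDAL config option for this block.
    with rasterio.Env(**gdal_http_retry_options()):
        with rasterio.open(path) as src:
            return dict(src.profile)
```

Setting the options on the same `rasterio.Env()` that already wraps the retry loop in the sample function would keep the application-level retry as a fallback for errors GDAL does not retry.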
Versions: rasterio is installed from the binary wheel (pip only, using the GDAL libraries bundled with rasterio) and runs in AWS Lambda runtime containers for Python 3.7.
$ poetry show rasterio
name : rasterio
version : 1.2.0
description : Fast and direct raster I/O for use with Numpy and SciPy
dependencies
- affine *
- attrs *
- certifi *
- click >=4.0,<8
- click-plugins *
- cligj >=0.5
- numpy *
- snuggs >=1.4.1
$ python
Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rasterio
>>> rasterio.gdal_version()
'3.2.1'