Problem Description
When attempting to copy or move large objects (e.g., 5GB) within S3 using the cp or mv operations of s3fs (via fsspec), the operation either fails silently or raises an error. This behavior is inconsistent and problematic for workflows that need to handle large files in AWS S3.
Steps to Reproduce
Below is a minimal reproducible example to demonstrate the issue:
import os

from upath import UPath


def create_large_file(file_path, size_gb=5):
    """Creates a large file of specified size locally."""
    with open(file_path, "wb") as f:
        chunk_size = 1024 * 1024  # 1MB
        total_chunks = size_gb * 1024  # Total number of chunks to create the file
        for _ in range(total_chunks):
            f.write(os.urandom(chunk_size))  # Write random bytes to the file
    print(f"Created large file of size {size_gb}GB at {file_path}")


# Step 1: Create a large local file (e.g., 5GB)
local_file = "large_file_5gb.dat"
create_large_file(local_file, size_gb=5)

# Step 2: Define source and target paths using UPath with s3fs
source = UPath("s3://MY_BUCKET/source/large_file_5gb.arrow")
target = UPath("s3://MY_BUCKET/target/large_file_5gb.arrow")

# Step 3: Upload the local file to S3
source.write_bytes(UPath(local_file).read_bytes())  # This works for small files and large files alike.

# Step 4: Attempt to copy the large file within S3
try:
    source.fs.cp(str(source), str(target), recursive=True)
    print("Copy operation succeeded.")
except Exception as e:
    print(f"Copy operation failed: {e}")
Expected Behavior
The cp operation should successfully copy the file from the source path to the target path in S3, regardless of the file size.
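For context on why this size matters: a single S3 CopyObject call is limited to 5GB, so objects at or above that size have to be copied with multipart UploadPartCopy. As a point of comparison (a sketch using the same placeholder bucket/key names, not taken from the original report), boto3's managed copy performs that splitting automatically:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Managed copy: boto3 switches to multipart UploadPartCopy above the threshold,
# so the 5GB single-call CopyObject limit does not apply.
transfer_config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # copy in parts above 64MB
    multipart_chunksize=64 * 1024 * 1024,  # 64MB per part
)

s3.copy(
    CopySource={"Bucket": "MY_BUCKET", "Key": "source/large_file_5gb.arrow"},
    Bucket="MY_BUCKET",
    Key="target/large_file_5gb.arrow",
    Config=transfer_config,
)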
Observed Behavior
For large files (e.g., 5GB or more), the cp or mv operation fails with a read timeout error: Read Timeout on endpoint URL: https://THE_OBJ_URL_S3
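Since the failure is a read timeout, one mitigation that may help (a sketch, not verified against this setup) is to raise the botocore read timeout and retry budget that s3fs passes to aiobotocore via config_kwargs; UPath should forward these extra keyword arguments as storage options to the underlying fsspec filesystem. The timeout values below are illustrative, not recommendations:

from upath import UPath

# Assumed storage options: s3fs forwards config_kwargs to the aiobotocore
# client config, which controls how long botocore waits before raising a
# read timeout and how many times it retries.
storage_options = {
    "config_kwargs": {
        "read_timeout": 300,     # seconds
        "connect_timeout": 60,   # seconds
        "retries": {"max_attempts": 10},
    }
}

source = UPath("s3://MY_BUCKET/source/large_file_5gb.arrow", **storage_options)
target = UPath("s3://MY_BUCKET/target/large_file_5gb.arrow", **storage_options)

# Same server-side copy as in the repro, now with a more generous timeout.
source.fs.cp(str(source), str(target))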