
cp/mv Operations Fail for Large Objects with s3fs and fsspec #916

@b23g5r42i

Problem Description

When copying or moving large objects (e.g., 5 GB) within S3 using the cp or mv operations of s3fs (via fsspec), the operation fails, either silently or with an error. This behavior is inconsistent and is a problem for workflows that handle large files in AWS S3.

Steps to Reproduce

Below is a minimal reproducible example to demonstrate the issue:

```python
import os
from upath import UPath

def create_large_file(file_path, size_gb=5):
    """Creates a large file of specified size locally."""
    with open(file_path, "wb") as f:
        chunk_size = 1024 * 1024  # 1MB
        total_chunks = size_gb * 1024  # Total number of chunks to create the file
        for _ in range(total_chunks):
            f.write(os.urandom(chunk_size))  # Write random bytes to the file
    print(f"Created large file of size {size_gb}GB at {file_path}")

# Step 1: Create a large local file (e.g., 5GB)
local_file = "large_file_5gb.dat"
create_large_file(local_file, size_gb=5)

# Step 2: Define source and target paths using UPath with s3fs
source = UPath("s3://MY_BUCKET/source/large_file_5gb.arrow")
target = UPath("s3://MY_BUCKET/target/large_file_5gb.arrow")

# Step 3: Upload the local file to S3
# Note: this loads the whole file into memory; the upload succeeds for both small and large files.
source.write_bytes(UPath(local_file).read_bytes())

# Step 4: Attempt to copy the large file within S3
try:
    source.fs.cp(str(source), str(target), recursive=True)
    print("Copy operation succeeded.")
except Exception as e:
    print(f"Copy operation failed: {e}")
```
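
Step 3 reads the entire 5 GB file into memory before writing it to S3. If memory is a concern when reproducing the issue, the upload can be streamed instead. Below is a minimal sketch. `stream_upload` is a hypothetical helper (the standard library's `shutil.copyfileobj` does essentially the same thing), and the sketch assumes that `source.open("wb")` returns a writable binary file object, as UPath paths backed by fsspec normally do.

```python
import io


def stream_upload(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy a binary stream into another in fixed-size chunks (hypothetical helper).

    At most `chunk_size` bytes are held in memory at once, unlike
    `dst_path.write_bytes(src_path.read_bytes())`, which loads the whole file.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Usage with the paths from the example above (assumed, not executed here):
#     with open(local_file, "rb") as src, source.open("wb") as dst:
#         stream_upload(src, dst)

if __name__ == "__main__":
    payload = b"0123456789" * 300_000  # 3,000,000 bytes, not a multiple of the chunk size
    out = io.BytesIO()
    copied = stream_upload(io.BytesIO(payload), out, chunk_size=1024 * 1024)
    assert copied == len(payload) and out.getvalue() == payload
```

The upload is only a setup step. The failing operation in this report is still the server-side `cp` in Step 4.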

Expected Behavior

The cp operation should successfully copy the file from the source path to the target path in S3, regardless of the file size.

Observed Behavior

For large files (e.g., 5 GB or more), the cp or mv operation fails with the following error: Read Timeout on endpoint URL: https://THE_OBJ_URL_S3
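
The error itself is a read timeout, but object size also matters for how S3 performs copies. According to the AWS documentation, a single CopyObject request can copy an object of up to 5 GB. Larger objects must be copied with a multipart upload, in which each part is copied server-side by an UploadPartCopy request that specifies a byte range (`CopySourceRange`). This report does not establish whether s3fs uses a single or a multipart copy for the file in this example, or whether that choice causes the timeout. The sketch below only shows how such byte ranges could be planned under the documented limits (parts of 5 MiB to 5 GiB, at most 10,000 parts). `plan_copy_ranges`, `copy_source_range`, and the 512 MiB default part size are hypothetical choices.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB   # S3 minimum part size (all parts except the last)
MAX_PART = 5 * GIB   # S3 maximum part size
MAX_PARTS = 10_000   # S3 maximum number of parts per multipart upload


def plan_copy_ranges(size, part_size=512 * MIB):
    """Split an object of `size` bytes into inclusive (start, end) byte ranges,
    one per UploadPartCopy request (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    # Enlarge parts if the object would otherwise need more than MAX_PARTS of them.
    part_size = max(part_size, -(-size // MAX_PARTS))  # ceiling division
    if part_size > MAX_PART:
        raise ValueError("object too large to copy within the part limits")
    return [(start, min(start + part_size, size) - 1) for start in range(0, size, part_size)]


def copy_source_range(start, end):
    """Format an inclusive byte range as an S3 CopySourceRange value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    ranges = plan_copy_ranges(5 * GIB)  # size of the file created in the example
    print(len(ranges), copy_source_range(*ranges[0]), copy_source_range(*ranges[-1]))
    # prints: 10 bytes=0-536870911 bytes=4831838208-5368709119
```

Each `(start, end)` pair would be sent as the `CopySourceRange` of one boto3 `upload_part_copy` call, issued between `create_multipart_upload` and `complete_multipart_upload`. Smaller parts mean more requests but less server-side work per request, which may matter if individual requests are what time out.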
