
Bash crashes on execution of command.run script when a large number of files need to be staged [AWS Batch] #1364

@fstrozzi

Description

Bug report

Expected behavior and actual behavior

When using the AWS Batch executor and running a single Nextflow step that needs to stage thousands of files from S3, the execution of the .command.run bash script fails with a segmentation fault.

This is most likely because the names and paths of all the files to be staged into the AWS Batch job container are collected in a single Bash array that is expanded when the parallel download functions are invoked. Once executed, the process exceeds the Bash stack size limit.
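
For illustration, the snippet below is a minimal, hypothetical sketch of this pattern (the bucket path, file names and the stage_inputs function are made up for the example and are not the actual .command.run contents): one download command per input file is collected in a single array, and expanding that array with tens of thousands of entries is where the default Bash stack limit can be hit.

#!/bin/bash
# Hypothetical sketch of the staging pattern, not the real .command.run
downloads=()
for ((n=0;n<100000;n++)); do
  downloads+=("aws s3 cp --only-show-errors s3://bucket/tmp-dir/dummy_file_${n}.txt dummy_file_${n}.txt")
done

stage_inputs() {
  # Expanding "$@" passes every command as a separate word; with ~100k
  # entries this expansion can exceed the default Bash stack size limit.
  # (The real wrapper runs the downloads in parallel; a sequential loop
  # is enough for the sketch.)
  local cmds=("$@")
  for cmd in "${cmds[@]}"; do
    eval "$cmd"
  done
}

stage_inputs "${downloads[@]}"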

Steps to reproduce the problem

mkdir many_files_test && cd many_files_test
for ((n=0;n<100000;n++)); do touch dummy_file_${n}.txt; done
aws s3 sync ./ s3://bucket/tmp-dir/

Then simply run a Nextflow process that retrieves all the files using a

Channel.fromPath("s3://bucket/tmp-dir/dummy_file_*").collect()

Program output

nxf-scratch-dir ip-10-0-2-243:/tmp/nxf.naKegDjemO
bash: line 1:    Done  aws --region eu-west-1 s3 cp --only-show-errors s3://bucket/nextflow-tmp/63/00e73447d2269f625d32e1b1d8a081/.command.run -
Segmentation fault      | bash 2>&1
| tee .command.log

Environment

  • Nextflow version: 19.07
  • Java version: 11
  • Operating system: macOS Mojave
  • Executor: AWS Batch

Additional context

Setting ulimit -s unlimited in the .command.run script has proven to be an effective workaround for this problem.
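
As a sketch, assuming the limit is raised at the top of the generated wrapper before any inputs are staged (the default stack limit on most Linux systems is 8192 KB and can be checked with ulimit -s), the workaround amounts to:

# Lift the Bash stack size limit so the large staging array can be expanded
ulimit -s unlimited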
