Bug report
Expected behavior and actual behavior
When using the AWS Batch executor to run a single Nextflow step that needs to stage thousands of files from S3, the execution of the bash wrapper script `.command.run` fails with a segmentation fault.
This is most likely because the names and paths of all the files to be staged into the AWS Batch job container are placed in a single bash array, which is then expanded to invoke the download functions. With this many entries, the expansion exceeds the bash stack size limit and the shell crashes.
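For illustration, the suspected failure mode can be sketched in isolation. This is not Nextflow's actual wrapper code: `nxf_stage` and the bucket path are placeholder names, and whether this snippet actually segfaults depends on the bash build, platform, and default limits.

```bash
# Illustrative only: expand one very large bash array in a single call
# while the stack limit is small. nxf_stage is a stand-in name, not
# Nextflow's real download function.
bash <<'EOF'
ulimit -s 512                                   # shrink the stack so the limit is easy to hit
nxf_stage() { echo "would stage $# files"; }    # stand-in for the real download loop
files=()
for ((n=0; n<100000; n++)); do
    files+=("s3://bucket/tmp-dir/dummy_file_${n}.txt")
done
nxf_stage "${files[@]}"                         # one expansion of ~100k words
EOF
```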
Steps to reproduce the problem
```bash
mkdir many_files_test && cd many_files_test
for ((n=0;n<100000;n++)); do touch dummy_file_${n}.txt; done
aws s3 sync ./ s3://bucket/tmp-dir/
```
Then simply run a Nextflow process that retrieves all the files using:

```groovy
Channel.fromPath("s3://bucket/tmp-dir/dummy_file_*").collect()
```
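A minimal pipeline for this could look like the sketch below. The bucket name, queue, and process name are placeholders, not taken from the original setup, and the process uses DSL1 syntax to match Nextflow 19.07.

```bash
# Hypothetical minimal reproducer (bucket, queue, and container image
# are placeholders and must exist in your AWS account).
cat > main.nf <<'EOF'
all_files = Channel.fromPath("s3://bucket/tmp-dir/dummy_file_*").collect()

process stage_all {
    input:
    file files from all_files

    script:
    "ls | wc -l"   // never reached: the crash happens while staging the inputs
}
EOF

# Assumes an AWS Batch queue and a process container are already configured.
nextflow run main.nf -bucket-dir s3://bucket/work \
    -process.executor awsbatch -process.queue my-queue
```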
Program output
```
nxf-scratch-dir ip-10-0-2-243:/tmp/nxf.naKegDjemO
bash: line 1: Done                 aws --region eu-west-1 s3 cp --only-show-errors s3://bucket/nextflow-tmp/63/00e73447d2269f625d32e1b1d8a081/.command.run -
              Segmentation fault   | bash 2>&1
                                   | tee .command.log
```
Environment
- Nextflow version: 19.07
- Java version: 11
- Operating system: macOS Mojave
- Executor: AWS Batch
Additional context
Setting `ulimit -s unlimited` in the `.command.run` script has proven to be an effective workaround for this problem.
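Concretely, the workaround amounts to raising the stack limit near the top of the generated wrapper, before the staging array is expanded. A sketch, with the rest of the wrapper elided:

```bash
#!/bin/bash
# Sketch of .command.run with the workaround applied; everything else
# in the generated wrapper (staging functions, task launch) is elided.
ulimit -s unlimited   # lift the per-process stack limit before the large array is expanded
# ... rest of the generated wrapper ...
```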