Inconsistent behaviour for glob process outputs #2425

@cjw85

Bug report

Expected behavior and actual behavior

According to https://www.nextflow.io/docs/latest/process.html#multiple-output-files, a glob pattern can be used to emit a list of multiple items from a process into an output channel. What is not clear from the documentation is that if only a single file matches the glob, the plain value is emitted rather than a length-1 list.

Returning different types from a function (a process in this instance) is typically considered bad practice and burdens the caller with having to perform introspection. In the context of Nextflow this means any operator on a channel first needs to check the returned type, possibly wrap the item that is assumed to be a list, and only then can it safely perform operations such as mapping over the list.

Steps to reproduce the problem

nextflow.enable.dsl=2

process touchOne {
    input:
        val name
    output:
        path "item*"
    script:
        """
        echo "Hello $name"
        touch item1
        """
}

process touchTwo {
    input:
        val name
    output:
        path "item*"
    script:
        """
        echo "Hello $name"
        touch item1
        touch item2
        """
}

workflow {
    touchOne(1).view()
    touchTwo(2).view()
}
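
For illustration, the introspection burden described above could look something like the following when applied to this reproducer. This is only a sketch of a caller-side workaround, not a proposed fix: `asList` is a hypothetical helper name introduced here, not part of Nextflow, and the block is meant to replace the `workflow` block above while reusing the `touchOne` and `touchTwo` processes as defined.

```groovy
// Hypothetical helper (not a Nextflow built-in): wrap a lone value in a
// list so that downstream operators can always assume a list.
def asList(item) {
    return item instanceof List ? item : [item]
}

workflow {
    // touchOne matches a single file, so the raw emission is a plain path;
    // after normalisation it is a length-1 list.
    touchOne(1).map { asList(it) }.view()

    // touchTwo matches two files, so the emission is already a list and
    // passes through unchanged.
    touchTwo(2).map { asList(it) }.view()
}
```

With this in place both channels should emit a list, but every consumer of a glob output has to remember to add the same normalisation step, which is the inconvenience this report is about.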

Program output

(Copy and paste here output produced by the failing execution. Please highlight it as a code block. Whenever possible upload the .nextflow.log file.)

Environment

  • Nextflow version: 20.10.0 build 5430
  • Java version: OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~16.04-b08)
  • Operating system: Linux
  • Bash version: GNU bash, version 4.3.48(1)-release

Additional context

I can see there being an argument about making a breaking change to Nextflow by enforcing that all globs return a list. Despite Nextflow's eschewing of heavy pattern-matching of file artifacts, I suspect globbing is abused a lot to emit single files, so a change to always emitting lists would break much existing code.

However, I do think it should at least be possible to optionally force globbed outputs to be lists. I suspect there are some corner cases around what a length-0 list might mean and how that is handled. Minimally, the documentation needs updating to highlight that the returned type is not always a list.
