
Feature request: split fasta cluster output into separate files #406

@joelb123


Again, this is an easy script, but one might as well ask for what one wants and have it done centrally.

A typical downstream use of the clusters is to run multiple-sequence-alignment calculations and then look at some statistics on them, such as the fraction of missing data and the fraction of parsimony-informative sites. This means making a directory to hold a FASTA file for each cluster, running one's favorite MSA/treebuilder (MUSCLE, in my case) on each file, and then computing descriptive statistics on the results.
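For reference, the splitting step can be sketched in a few lines of Python. This is a minimal sketch, not an existing mmseqs feature: it assumes a cluster table such as the one written by `mmseqs createtsv` (representative ID in column 1, member ID in column 2) plus the original input FASTA. The function names `read_fasta` and `split_clusters` are hypothetical.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping sequence ID (first word of the header) to sequence."""
    seqs = {}
    current_id, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current_id is not None:
                    seqs[current_id] = "".join(chunks)
                current_id, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if current_id is not None:
        seqs[current_id] = "".join(chunks)
    return seqs


def split_clusters(cluster_tsv, fasta_path, out_dir, min_size=2):
    """Write one FASTA per cluster; return {representative ID: output path}.

    Assumes a two-column TSV (representative, member), one row per member.
    """
    members = defaultdict(list)
    with open(cluster_tsv, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) >= 2:
                members[row[0]].append(row[1])
    seqs = read_fasta(fasta_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for rep, ids in members.items():
        if len(ids) < min_size:
            continue  # skip clusters too small to align (singletons by default)
        # Replace characters that are unsafe in file names.
        safe = re.sub(r"[^A-Za-z0-9._-]", "_", rep)
        path = out_dir / f"{safe}.fasta"
        with open(path, "w") as out:
            for seq_id in ids:
                out.write(f">{seq_id}\n{seqs[seq_id]}\n")
        written[rep] = path
    return written
```

Singletons are skipped by default because an alignment of one sequence carries no information for the statistics above; pass `min_size=1` to keep them.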

It would be nice if mmseqs could optionally do this splitting on its own. Nicer still would be if it could spawn the MSA/treebuilder with a user-specified set of arguments and compute the summary statistics, writing them to a TSV file. I note that MUSCLE is public-domain and pretty fast.
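The align-and-summarize step could look roughly like the sketch below. It rests on several assumptions: the aligner command is a user-supplied argument list in which hypothetical `{input}`/`{output}` placeholders are substituted, missing data is taken to be `-` and `?`, and a site counts as parsimony-informative when at least two character states each occur in at least two sequences. All function names are hypothetical.

```python
import csv
import subprocess
from collections import Counter
from pathlib import Path

# Characters counted as missing data (an assumption; e.g. add "N" for
# nucleotides or "X" for proteins).
MISSING = frozenset("-?")


def read_alignment(path):
    """Read an aligned FASTA file into a list of uppercase sequences."""
    seqs, chunks = [], None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if chunks is not None:
                    seqs.append("".join(chunks))
                chunks = []
            elif line and chunks is not None:
                chunks.append(line.upper())
    if chunks is not None:
        seqs.append("".join(chunks))
    return seqs


def alignment_stats(seqs, missing=MISSING):
    """Return (n_seqs, n_columns, fraction_missing, fraction_parsimony_informative)."""
    n_cols = len(seqs[0]) if seqs else 0
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")
    if n_cols == 0:
        return len(seqs), 0, 0.0, 0.0
    n_missing = sum(ch in missing for s in seqs for ch in s)
    informative = 0
    for column in zip(*seqs):
        counts = Counter(ch for ch in column if ch not in missing)
        # Informative: at least two states, each seen in at least two sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return len(seqs), n_cols, n_missing / (len(seqs) * n_cols), informative / n_cols


def run_aligner(cmd_template, in_path, out_path):
    """Run a user-specified aligner, substituting the {input}/{output} placeholders."""
    cmd = [arg.replace("{input}", str(in_path)).replace("{output}", str(out_path))
           for arg in cmd_template]
    subprocess.run(cmd, check=True)


def summarize(fasta_paths, tsv_path, cmd_template=None, aligned_dir="aligned"):
    """Optionally align each FASTA, then write one TSV row of statistics per cluster."""
    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["cluster", "n_seqs", "n_columns", "frac_missing", "frac_pars_inf"])
        for fasta in map(Path, fasta_paths):
            if cmd_template is None:
                aligned = fasta  # inputs are assumed to be aligned already
            else:
                Path(aligned_dir).mkdir(parents=True, exist_ok=True)
                aligned = Path(aligned_dir) / f"{fasta.stem}.afa"
                run_aligner(cmd_template, fasta, aligned)
            n, cols, miss, pinf = alignment_stats(read_alignment(aligned))
            writer.writerow([fasta.stem, n, cols, f"{miss:.4f}", f"{pinf:.4f}"])
```

With MUSCLE 5 the template would be something like `["muscle", "-align", "{input}", "-output", "{output}"]`; MUSCLE 3.8 used `-in`/`-out` instead, so check the help output of your installed version. Passing the command as a list (rather than a shell string) avoids quoting problems with odd sequence IDs in file names.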
