nanoporeReads_dataTransfer

A pipeline to process Nanopore reads and transfer the results to the end users.

Installation

git clone git@github.com:maxplanck-ie/nanoporeReads_dataTransfer.git
cd nanoporeReads_dataTransfer
pip install .

Note that the workflow requires conda to function, as some rules run in their own conda environments.

Implementation

The key functionality is implemented as snakemake workflows. Since version 2.0.0, two snakemake rule sets are supported, each centered on a different basecaller:

rules_dorado: a dorado-based workflow.

A wrapper Python script (ont.py) implements

the continuous screening of the source directory,
the generation of a flowcell-specific configuration file, and
the communication with the end user (emails etc.).
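
The screening step can be pictured with the following minimal sketch. It is not the code in ont.py: the function name find_unprocessed_flowcells is hypothetical, and it assumes that the work directory mirrors the flowcell directory names and that a finished flowcell is marked by the analysis.done flag described under "Directory structures".

```python
from pathlib import Path


def find_unprocessed_flowcells(offload_dir, output_dir, flag_name="analysis.done"):
    """Return flowcell directories in offload_dir that have no done-flag in output_dir."""
    offload_dir, output_dir = Path(offload_dir), Path(output_dir)
    pending = []
    for flowcell in sorted(p for p in offload_dir.iterdir() if p.is_dir()):
        # Assumption: outputDir contains one sub-directory per flowcell,
        # named like the flowcell directory in offloadDir.
        if not (output_dir / flowcell.name / flag_name).exists():
            pending.append(flowcell)
    return pending
```

A wrapper could call such a function in a loop with a sleep interval and start one snakemake run per returned directory.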

Configurations

The main configuration file (config.yaml) specifies:

the path of the rule set to be used (rulesPath: rules or rules_dorado),
the overall directory structure (see below),
organism-specific paths (e.g. genome and transcriptome locations),
communication settings (email, Parkour LIMS, sambahost), and
generic parameters (basecalling, mapping).

Note that the generic configuration defined in this file is extended with project-specific entries for each incoming flowcell.
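
How a generic configuration might be extended per flowcell is sketched below. This is an illustration only: expand_config is a hypothetical helper, and all keys except rulesPath (basecaller, model, barcoding, flowcell) are invented for the example and are not taken from config.yaml.

```python
import copy


def expand_config(generic, flowcell_entries):
    """Return a copy of the generic config with flowcell-specific entries merged in."""
    merged = copy.deepcopy(generic)
    for key, value in flowcell_entries.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = expand_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical generic settings (as they might be read from config.yaml).
generic = {"rulesPath": "rules_dorado", "basecaller": {"model": "hypothetical-model"}}
# Hypothetical entries derived from one incoming flowcell.
flowcell_specific = {"flowcell": "PAS33554", "basecaller": {"barcoding": False}}

config = expand_config(generic, flowcell_specific)
print(config)
```

The generic dictionary is left untouched, so the same base configuration can be reused for the next flowcell.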

Additional configuration files are:

env.yaml (for conda installation of all dependencies)
multiqc_config.yaml (to customize multiqc output)

Usage

ont -c config.yaml

Directory structures

The workflow connects and relies on three main data locations:

A source directory (offloadDir) is screened for the arrival of new and unprocessed flowcells.
A work directory (outputDir) is used for the processing steps (merging, basecalling, demultiplexing, alignment, quality controls).
A target directory (groupDir) receives the analysis results, organized by project.

The details depend on the rule set. Annotated examples for rules_dorado are given below.

Example input path (`offloadDir`)

This directory is generated by the sequencing machine and may change in response to technological developments.

../path/to/flowcell/
.
├── bam_pass            # from fast basecalling
├── barcode_alignment_PAS33554_6b0029ab_a0fbcf5b.tsv
├── fastq_pass          # from fast basecalling
├── final_summary_PAS33554_6b0029ab_a0fbcf5b.txt
├── other_reports
├── pod5_pass           # pod5 format
├── pore_activity_PAS33554_6b0029ab_a0fbcf5b.csv
├── report_PAS33554_20230928_1016_6b0029ab.html
├── report_PAS33554_20230928_1016_6b0029ab.json
├── report_PAS33554_20230928_1016_6b0029ab.md
├── SampleSheet.csv     # sample sheet information
├── sample_sheet_PAS33554_20230928_1016_6b0029ab.csv
├── sequencing_summary_PAS33554_6b0029ab_a0fbcf5b.txt
└── throughput_PAS33554_6b0029ab_a0fbcf5b.csv
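
A simple completeness check on an incoming flowcell directory could look like the sketch below. The set of required entries (pod5_pass, SampleSheet.csv and a final_summary_*.txt file) is an assumption based on the listing above, not the pipeline's actual criterion, and the name missing_inputs is hypothetical.

```python
from pathlib import Path

# Assumption: these entries must exist before processing can start.
REQUIRED_PATTERNS = ("pod5_pass", "SampleSheet.csv", "final_summary_*.txt")


def missing_inputs(flowcell_dir, patterns=REQUIRED_PATTERNS):
    """Return the patterns that match nothing inside flowcell_dir."""
    flowcell_dir = Path(flowcell_dir)
    return [pattern for pattern in patterns if not any(flowcell_dir.glob(pattern))]
```

An empty return value means that every expected entry was found. Because the layout is produced by the sequencing machine and may change, the patterns are kept in one place.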

Example output path during processing (`outputDir`)

../path/to/flowcell
.
├── analysis.done            # flag to signal that this flowcell has been fully processed
├── bam                      # output from basecalling in bam format (including modification calls)
├── bam_demux                # demultiplexed samples (empty if no barcoding)
├── benchmarks               # benchmarks for each rule
├── benchmarks_combined.tsv  # combined benchmark file
├── flags                    # directory with flags from snakemake rules
├── log                      # log files (rule-specific)
├── pipeline_config.yaml     # configfile (snakemake & more)
├── pod5                     # directory with merged pod5 file (from offloadDir)
├── reports                  # directory with reports and SampleSheet.csv (from offloadDir)
├── summary                  # summary files (DAG, disk status)
└── transfer                 # analysis output that will be transferred

transfer/
└── Project_projectID_User_Group
    ├── Analysis_mouse_dna                    # analysis directory (exists only if genome is known)
    │   ├── 23L000329_WT_rep1.align.bam       # alignment
    │   ├── 23L000329_WT_rep1.align.bam.bai   # index
    │   └── 23L000329_WT_rep1.align.bed.gz    # modification calls
    ├── Data
    │   ├── 23L000329_WT_rep1.bam             # basecalled sequences
    │   ├── 23L000329_WT_rep1.fastq.gz        # basecalled sequences (fastq - deprecated)
    │   ├── 23L000329_WT_rep1_porechop.fastq.gz # adaptors, barcodes trimmed
    │   └── 23L000329_WT_rep1.seqsum            # sequencing summaries (for pycoQC etc.)
    └── QC
        ├── multiqc
        │   ├── multiqc_data
        │   └── multiqc_report.html            # multiqc report
        ├── sample_names.tsv                   # dictionary sampleID-sampleName
        └── Samples                            # sample-wise quality controls
            ├── 23L000329_WT_rep1.align.flagstat
            ├── 23L000329_WT_rep1.align_pycoqc.html
            ├── 23L000329_WT_rep1.align_pycoqc.json
            ├── 23L000329_WT_rep1_fastqc.html
            ├── 23L000329_WT_rep1_fastqc.zip
            ├── 23L000329_WT_rep1_kraken.report
            ├── 23L000329_WT_rep1_porechop.info
            ├── 23L000329_WT_rep1_pycoqc.html
            ├── 23L000329_WT_rep1_pycoqc.json
            ├── all_porechop.best_end
            ├── all_porechop.best_start
            └── all_porechop.trimmed
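
The sample_names.tsv dictionary could be read as in the sketch below. The file layout is an assumption: two tab-separated columns (sampleID, sampleName) without a header line. The split of 23L000329_WT_rep1 into an ID and a name in the usage note is likewise a guess from the example file names, and read_sample_names is a hypothetical helper.

```python
import csv


def read_sample_names(tsv_path):
    """Map sampleID to sampleName from a two-column, tab-separated file."""
    mapping = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip empty or incomplete lines rather than failing on them.
            if len(row) >= 2:
                mapping[row[0]] = row[1]
    return mapping
```

Under these assumptions, a line holding 23L000329 and WT_rep1 would map the sample ID 23L000329 to the sample name WT_rep1.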

Example output path for an end user (`groupDir`)

../user_path/to/flowcell/  (identical to outputDir/transfer)
.
├── metadata.yaml
└── Project_projectID_User_Group
    ├── Analysis_mouse_dna
    ├── Data
    └── QC
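
The hand-over from outputDir/transfer to groupDir can be sketched as a plain directory copy. This is only an illustration under stated assumptions: transfer_projects is a hypothetical function, the pipeline may use a different mechanism, and the creation of metadata.yaml is not covered.

```python
import shutil
from pathlib import Path


def transfer_projects(transfer_dir, group_dir):
    """Copy every Project_* directory from transfer_dir into group_dir."""
    copied = []
    for project in sorted(Path(transfer_dir).glob("Project_*")):
        if project.is_dir():
            target = Path(group_dir) / project.name
            # dirs_exist_ok (Python 3.8+) allows repeating an interrupted transfer.
            shutil.copytree(project, target, dirs_exist_ok=True)
            copied.append(target)
    return copied
```

The function returns the target paths, which could then be listed in the notification sent to the end user.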
