generated from sourmash-bio/sourmash_plugin_template
Open
Description
When a genome's row in the taxonomy file has an identifier and taxid but every lineage column is empty, we get the output from Line 241.
I solved this problem by updating the taxdump names and nodes from NCBI, but should we allow skipping unclassified genomes or group them in some other way?
code:
== This is sourmash version 4.9.0. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
loading taxonomies from ['/group/ctbrowngrp4/2024-ccbaumler-genbank/genbank-20250806/lineages.protozoa.csv']
found 2424 identifiers in taxdb.
selecting sketches: k=21 scaled=1000 moltype=DNA
loading sketches from file /group/ctbrowngrp4/2024-ccbaumler-genbank/genbank-20250806/genbank-20250806-protozoa-k21.zip
cannot find ident GCA_051400955 in the provided taxonomy ifle.
The three closest matches to GCA_051400955 are:
* 'GCA_015146095.1'
* 'GCA_002140095.1'
* 'GCA_964014055.1'
No taxonomy information in the lineage file.
/group/ctbrowngrp4/2024-ccbaumler-genbank/genbank-20250806$ grep 051400955 lineages.protozoa.csv
GCA_051400955.1,3042617,,,,,,,,
/group/ctbrowngrp4/2024-ccbaumler-genbank/genbank-20250806$ grep 051624525 lineages.fungi.csv
GCA_051624525.1,3075319,,,,,,,,
/group/ctbrowngrp4/2024-ccbaumler-genbank/genbank-20250806$ grep 050924785 lineages.viral.csv
GCA_050924785.1,2851401,,,,,,,,,,,,,,,,
/group/ctbrowngrp4/2024-ccbaumler-genbank/genbank-20250806$ grep 050886295 lineages.archaea.csv
GCA_050886295.1,3025951,,,,,,,,
They all contain taxonomy information from NCBI...
https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_051400955.1/
https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_051624525.1/
https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_050924785.1/
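One way to act on the skip-or-group question is to split the lineage file up front into rows that carry at least one rank and rows that carry none. The following is only a minimal sketch, not the plugin's code: `split_lineages` is a hypothetical helper, the column layout (ident, taxid, then rank columns) is inferred from the grep output above, the header check assumes a first column named `ident`, and the second sample row is made up for illustration.

```python
import csv
import io


def split_lineages(csv_text):
    """Split lineage rows into classified (any rank filled) and unclassified (all ranks empty).

    Assumes each row is: ident, taxid, then one column per rank.
    """
    classified = {}
    unclassified = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or malformed rows, and a header row if one is present (assumed name).
        if len(row) < 2 or row[0] == "ident":
            continue
        ident, _taxid, *ranks = row
        if any(rank.strip() for rank in ranks):
            classified[ident] = ranks
        else:
            # Every rank column is empty: candidate for skipping or an "unclassified" group.
            unclassified.append(ident)
    return classified, unclassified


sample = (
    "GCA_051400955.1,3042617,,,,,,,,\n"      # row from the grep output above
    "GCA_000000000.1,1,Eukaryota,,,,,,,\n"   # hypothetical row with one rank filled
)
classified, unclassified = split_lineages(sample)
print("classified:", sorted(classified))
print("unclassified:", unclassified)
```

With a split like this, the unclassified identifiers could either be dropped with a warning or collected under a single placeholder lineage; which behaviour is preferable is the open question here.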
Using sourmash_plugin_pangenomics version 0.3.1:
$ sourmash info -v
== This is sourmash version 4.9.0. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
sourmash version 4.9.0
- loaded from path: /home/baumlerc/miniforge3/envs/pangenomes/lib/python3.13/site-packages/sourmash/cli
khmer version: None (internal Nodegraph)
screed version 1.1.3
- loaded from path: /home/baumlerc/miniforge3/envs/pangenomes/lib/python3.13/site-packages/screed
the following plugins are installed:
plugin type          from python module             v     entry point name
-------------------- ------------------------------ ----- --------------------
sourmash.cli_script  sourmash_plugin_pangenomics    0.3.1 classify_command
sourmash.cli_script  sourmash_plugin_pangenomics    0.3.1 createdb_command
sourmash.cli_script  sourmash_plugin_pangenomics    0.3.1 merge_command
sourmash.cli_script  sourmash_plugin_pangenomics    0.3.1 ranktable_command