Description
Expected Behavior
I want to update my clusters after updating my database. In this update I add new sequences, but I also delete some sequences that were present in the old database.
The clusterupdate command works, but when I try to convert the cluster database to a TSV file, I get an error message related to the index (see below).
I tried the same thing on a new database to which I only added sequences, and it worked perfectly. So I assume the problem comes from removing sequences from the old database?
Current Behavior
Error when trying to generate the tsv file.
In the cluster database obtained after clusterupdate ('CLU_updated'), the removed sequences still appear, but they are absent from the updated sequence database ('DB_updated').
Steps to Reproduce (for bugs)
Creation of the old DB (oldDB.fa: 17 amino acid sequences):
mmseqs createdb oldDB.fa DB_old
Clustering of the old DB:
mmseqs cluster DB_old CLU_old tmp
Creation of the new DB (newDB.fa: 13 sequences are identical to the old DB, 4 were removed, 4 were added):
mmseqs createdb newDB.fa DB_new
Cluster update:
mmseqs clusterupdate DB_old DB_new CLU_old DB_updated CLU_updated tmp
No error here. However, even though the sequences with numeric identifiers 12, 11, 16 and 15 in the old DB were removed, they still appear in the CLU_updated file. They do not appear in the DB_updated files.
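One way to confirm which identifiers are left dangling is to compare the keys referenced by CLU_updated with the keys listed in DB_updated.index. The following Python sketch assumes the usual MMseqs2 database layout: each `.index` line is tab-separated `key offset length`, and the cluster data file (CLU_updated) is a single file whose entries are NUL-separated, each listing one member key per line. The helper names are hypothetical and not part of MMseqs2.

```python
from pathlib import Path


def read_index_keys(index_path):
    """Return the set of keys in a .index file.

    Assumption: each line is tab-separated as key, offset, length.
    """
    keys = set()
    with open(index_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if fields and fields[0]:
                keys.add(int(fields[0]))
    return keys


def read_cluster_members(data_path):
    """Return every key mentioned in a cluster DB data file.

    Assumption: entries are separated by NUL bytes and each entry
    lists one member key per line (representative included).
    """
    members = set()
    raw = Path(data_path).read_bytes()
    for entry in raw.split(b"\0"):
        for line in entry.decode().splitlines():
            line = line.strip()
            if line:
                # Keep only the first column in case extra columns exist.
                members.add(int(line.split("\t")[0]))
    return members


def dangling_keys(seq_index, clu_index, clu_data):
    """Keys referenced by the clustering but absent from the sequence DB."""
    seq_keys = read_index_keys(seq_index)
    referenced = read_index_keys(clu_index) | read_cluster_members(clu_data)
    return sorted(referenced - seq_keys)
```

For this report it would be called as `dangling_keys("DB_updated.index", "CLU_updated.index", "CLU_updated")`; if the removed sequences are the culprits, their identifiers should show up in the result. If the cluster DB was written as several parts, each data file would have to be read separately.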
Conversion of the cluster DB to TSV:
mmseqs createtsv DB_updated DB_updated CLU_updated clusters.tsv
=> Error message, and empty files are generated: clusters.tsv.1 ... clusters.tsv.7 and clusters.tsv.index.1 ... clusters.tsv.index.7
MMseqs Output (for bugs)
Program call:
createtsv DB_updated DB_updated CLU_updated clusters.tsv
MMseqs Version: 2f66ae897fc813450fa5ef0c78123bd3c41c4717
first sequence as respresentative false
Target column 1
Add Full Header false
Database Output false
Threads 8
Compressed 0
Verbosity 3
Query database: DB_updated
Touch data file DB_updated_h ... Done.
Result database: CLU_updated
Start writing to clusters.tsv
Invalid database read for database data file=DB_updated_h, database index=DB_updated_h.index
getData: local id (4294967295) >= db size (17)
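The value 4294967295 in the last line equals 2^32 - 1, the maximum of a 32-bit unsigned integer (UINT_MAX), which is typical of a "key not found" sentinel. The following is a hypothetical model of that lookup path, not MMseqs2's actual reader code: it assumes keys are binary-searched in a sorted list and that a missing key yields this sentinel. Under that assumption, a key referenced by CLU_updated but absent from DB_updated_h fails the bounds check with exactly this message.

```python
import bisect

# UINT_MAX for a 32-bit unsigned int, assumed here to mean "key not found".
NOT_FOUND = 2**32 - 1


def get_id(sorted_keys, key):
    """Binary-search a sorted key list; return the local id or NOT_FOUND."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return NOT_FOUND


def get_data(sorted_keys, key):
    """Return the local id used to fetch an entry, after a bounds check."""
    local_id = get_id(sorted_keys, key)
    if local_id >= len(sorted_keys):
        raise IndexError(
            f"getData: local id ({local_id}) >= db size ({len(sorted_keys)})"
        )
    return local_id
```

For example, with a 17-entry key list from which key 12 is missing, `get_data(keys, 12)` raises `IndexError` with the same text as the log line above.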
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
- Git commit used: 2f66ae8
- Which MMseqs version was used: Compilation from source
- Cmake versions used: cmake version 3.5.1
- Operating system and version: Ubuntu 16.04 LTS
Thank you in advance for your help :)