Centrifuge custom index

Posted in Upload your private data by Rosario Brancaccio Thu Jan 02 2020 13:55:18 GMT+0000 (Coordinated Universal Time)

I've used Centrifuge locally to build an index from the GenBank database. The output of this operation is 4 files with the *.cf extension. When I try to use them for a Centrifuge run on the cloud, it asks for the index in tar.gz format. What does that mean? I then tried to run the indexing on the cloud, but I only found a pipeline that uses RefSeq, not GenBank, and in general I would prefer to do it locally to have more freedom. Right now I'm running the "Reference Index Creation" workflow on the cloud to create an index based on RefSeq and get the output in tar format, but that is not exactly what I want. Is there a way to get the index in tar format with Centrifuge locally, or to use the GenBank database in this cloud app to generate the index?
Jan 13, 2020

Hi Rosario,

Here's the suggested solution by our Bioinformatics Team:

The database from which sequences are downloaded in the Reference Index Creation workflow is fixed to RefSeq. However, you can use the Centrifuge Download tool to download sequences from the GenBank database, as you already did locally, and then use the Centrifuge Build tool to build the index.

The problem you are facing occurs because you passed the output of the Centrifuge Build tool, which you ran locally, directly to the Centrifuge tool on the cloud. The Centrifuge tool on the CGC expects the Centrifuge index as a single archive in tar.gz or tar format, not as separate *.cf files.

So, there are two options:

  1. You can run the Centrifuge Download and Centrifuge Build tools locally, pack the output files into a tar archive (tar -cf basename-string-value.tar basename-string-value.1.cf basename-string-value.2.cf basename-string-value.3.cf; if your build produced a fourth *.cf file, as you describe, add it to the command as well), upload basename-string-value.tar to the CGC, and then run the Centrifuge Classifier tool with that file.
  2. You can do all the work on the CGC platform. First use the Centrifuge Download and Centrifuge Build tools to build the index. In this case you don't have to worry about the reference index format, since the output of the Centrifuge Build tool on the CGC is already in tar format (the tool is wrapped that way). After that, you can provide that index file directly to the Centrifuge Classifier tool.
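
If you would rather script the packing step from option 1 than type the file names by hand, here is a minimal Python sketch using only the standard library. The function name pack_centrifuge_index and the example basename are hypothetical, not part of Centrifuge or the CGC. It assumes all index parts follow the basename.N.cf naming pattern and stores them at the top level of the archive, which mirrors running the tar command above from inside the index directory.

```python
import glob
import os
import tarfile


def pack_centrifuge_index(basename, compress=False):
    """Bundle every <basename>.N.cf file into <basename>.tar (or .tar.gz).

    `basename` is the index path without the ".1.cf" style suffix.
    This helper is an illustration, not an official Centrifuge utility.
    """
    # Collect all index parts (.1.cf, .2.cf, .3.cf and any further ones).
    parts = sorted(glob.glob(glob.escape(basename) + ".*.cf"))
    if not parts:
        raise FileNotFoundError(f"no index files match {basename}.*.cf")

    archive = basename + (".tar.gz" if compress else ".tar")
    mode = "w:gz" if compress else "w"
    with tarfile.open(archive, mode) as tar:
        for path in parts:
            # Store each file under its bare name, without local directories.
            tar.add(path, arcname=os.path.basename(path))
    return archive
```

For example, pack_centrifuge_index("basename-string-value") would write basename-string-value.tar next to the *.cf files, and passing compress=True would write basename-string-value.tar.gz instead. Either file can then be uploaded to the CGC.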

Best,
Marko

  