How to install and use super-focus on deepthought
super-FOCUS
Super-FOCUS is a tool to identify the functions that the sequences in your metagenome are doing. As a benefit, it also identifies the taxonomy associated with those functions.
Super-FOCUS uses a reduced, optimized, database and fast aligners to run.
To run Super-FOCUS you need two things:
- Your data, probably in fasta or fastq format
- A database of things we know about. Note that this is a databse in super-FOCUS format, not in another format!
Installing Super-FOCUS on deepthought
You only need to do this step once! When you come back to deepthought you can do conda activate bioinformatics
and you have super-FOCUS ready to go!
To install Super-FOCUS on deepthought we are going to use conda. Once you have conda installed, [here are more install instructions].
We start by activating our conda environment:
conda activate bioinformatics
(Note, if you great the warning: Could not find conda environment: bioinformatics
, then you need to use conda create -n bioinformatics
first to create the environment.)
and now we can install new software:
conda install -y -c bioconda super-focus
This will figure out all the things that need to be installed, and then install them for you. It should not take too long for the installer to complete.
Next, we need to get the databases. You can use the superfocus_downloadDB
command to download them, you can download them indvidually, or if you are on deepthought you can use Rob’s version of the databases. Just set up a variable to access them like this:
If you are taking Rob’s class you can do this:
conda env config vars set SUPERFOCUS_DB=/home/edwa0468/superfocus_db/version1
NOTE: Run this command to test your diamond version:
diamond --version
If you have version v0.9.0 to v0.9.18 you need to use version 1 of the database If you have version v0.9.19 to v0.9.24 you need to use version 2 of the database If you have v0.9.25 onwards you need to use version 3 of the database
You can change the config setting above like this:
conda env config vars set SUPERFOCUS_DB=/home/edwa0468/superfocus_db/version1
conda env config vars set SUPERFOCUS_DB=/home/edwa0468/superfocus_db/version2
conda env config vars set SUPERFOCUS_DB=/home/edwa0468/superfocus_db/version3
Or you can download the databases here: diamond v1 diamond v2 diamond v3
If you use
mmseqs
there is only one version of the database: mmseqs
The installation is complete and now you can use it to explore your metagenomes. Next time, you can skip this section.
Using super-FOCUS on deepthought
Using Super-FOCUS in class
You can set a variable with the name of the directory:
export FQDIR=BTEC_Water
and then copy Rob’s slurm file:
cp ~edwa0468/superfocus.slurm .
and then submit that to the cluster:
sbatch superfocus.slurm
This will create a new directory called BTEC_Water.superfocus_results
Before you begin
Super-FOCUS requires that your input fastq files be in a directory. We’re going to make a directory called fastq
and copy the data into there. Change barcode_01.fastq
to the name of your fastq file!!
mkdir fastq
cp barcode_01.fastq fastq
(remember, the ls
command will show you the files and directories that you have)
Also note:
super-focus
takes quite a lot of computing resources so we are just going to run this straight on the cluster. It is bad practice to run it on the login node!
Running super-FOCUS
To run superfocus directly
Use this command
superfocus -q fastq -b ~/superfocus_db -a diamond -dir superfocus_results
To run on the cluster during class
cp ~edwa0468/superfocus.slurm ~/
sbatch ~/superfocus.slurm
We are going to make a batch file to submit the command to the cluster. Lets call our file superfocus.slurm
:
nano superfocus.slurm
And then put these lines in that file
#!/bin/bash
#SBATCH --ntasks=8
superfocus --threads 8 -q fastq -b ~/superfocus_db -a diamond -dir superfocus_results
In this command:
- The
fastq
option is the name of our input directory and is the same name we used above underBefore you begin
- The
~/superfocus_db
option is the name of our database. If you have downloaded your own databases you should omit that information. This is the database that currently works with the version of super-FOCUS and diamond in conda - The
diamond
option is the aligner that we are going to run. - The
superfocus_results
option is where we are going to find the output
Then, we are going to submit that to run on the cluster
sbatch superfocus.slurm
and you can monitor the progress with
squeue
or
squeue -u <FAN>
where <FAN>
is your FAN!
Notice that super-FOCUS
will also output a lot of information in a file that will be called something like slurm-1709843.out
(but the number will be totally different). That tells you whether the command has worked or if there was some kind of error.