Part 2: Using conda on deepthought
Series: An Introduction To Using DeepThought For Bioinformatics
There are two ways to set up conda on deepthought: the easy way or the hard way. You choose!
1. The easy way: Set up conda using modules
The conda module is available on deepthought, so we can load it and initialise conda:
module load Miniconda3
conda init bash
conda create -n bioinformatics
Now log out of the computer. When you log back in, conda will be activated, but you cannot write to the base install, so before you use it you need to activate the environment you just created:
conda activate bioinformatics
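If you are not sure which environment you are in after logging back in, a quick check along these lines may help. It is only a sketch: it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated, and the function name active_env is just a label made up for this example.

```shell
# Print the name of the active conda environment, or "none" if nothing is active.
# conda sets CONDA_DEFAULT_ENV whenever an environment is activated.
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

active_env
```

After conda activate bioinformatics this should print bioinformatics, which matches the name shown in brackets at the start of your prompt.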
2. The hard way: Set up conda by installing it
Activate Python
We are going to use conda to install and update software. Before we begin, we need to load the python module, which is not loaded by default.
We are going to use vim again to add the python module activation to ~/.bashrc so that we don’t need to do it every time. Copy this line into your terminal and press Enter:
vim ~/.bashrc
Use the arrow keys to move to the line after the line that says module load slurm, and press i to switch to --INSERT-- mode.
Paste this text:
module load python
Press the Escape key, and then type :wq to save the changes and exit. (Note that here we combine the :w and :q commands!)
Now run this command to activate Python:
source ~/.bashrc
You should be able to type this command and get a response now:
python3.8 --version
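If you would rather not edit the file by hand, the same change can be sketched as a small shell function. This is an alternative to the vim steps above, not a replacement for them: the function name add_line_once is made up for this example, and it simply appends to the end of the file, which assumes the module load slurm line is already somewhere above.

```shell
# Append a line to a file only if that exact line is not already there.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the text as a fixed string
    grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
}

# Example (uncomment to use): add_line_once "module load python" ~/.bashrc
```

Because the function checks first, running it twice does not add the line twice. You still need to run source ~/.bashrc afterwards.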
Download conda
Head to the miniconda download page and download the appropriate installer.
You want the one from Latest Miniconda Installer Links. Get:
- Linux Installer
- Python with the biggest number
- Miniconda3 Linux 64-bit
Here is a quick way to download that script!
Right click on the appropriate link, and choose Copy link address. Go back to your terminal window (probably Putty) and type wget and a space, and then paste the URL that you just copied. Press return and it should download the file for you!
The file should be called Miniconda3-latest-Linux-x86_64.sh but in case it is not, just substitute the appropriate file name below. Remember that you can check with ls -ltr to see the newest file downloaded.
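The download step can also be sketched as a couple of shell functions. The URL below is an assumption (it is the usual address of the latest 64-bit Linux installer, but check it against the link you copied from the download page), and the names download_installer and newest_file are made up for this example.

```shell
# Assumed URL of the latest 64-bit Linux installer; check it against the download page.
INSTALLER_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
# The file name is the last part of the URL.
INSTALLER_FILE="$(basename "$INSTALLER_URL")"

# Download the installer into the current directory.
download_installer() {
    wget -O "$INSTALLER_FILE" "$INSTALLER_URL"
}

# Show the most recently modified entry in the current directory.
newest_file() {
    ls -t | head -n 1
}

echo "Run download_installer to fetch $INSTALLER_FILE"
```

Nothing is downloaded until you type download_installer. Afterwards, newest_file should print the name of the installer you just fetched.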
Run the miniconda installer:
bash Miniconda3-latest-Linux-x86_64.sh
This will ask you some questions, and you can pretty much accept the default answer to all the questions. If the installer asks whether to initialise conda (conda init), answer yes; otherwise conda will not be activated when you log back in.
Once the installer has finished, the best way to continue is to log out, and then log back in. This will reset your account and you will have conda activated. At the bottom left of your screen you should see it say (base), which means that you are in the base conda installation.
Conda is great, but Mamba is better!
Mamba is a drop-in replacement for conda, and it is the first thing we will install.
conda install -c conda-forge mamba
This will take a moment to set everything up and figure out what needs to be installed, and then ask you if you are sure. Type yes and press return and mamba will be installed. From here on out, forget conda and think mamba!
Install your first bioinformatics package
We are going to install prinseq++ and test to see if it works. This will demonstrate how to install a conda package. We use the conda-forge and bioconda channels to install prinseq++ into the currently active environment:
mamba install -c conda-forge -c bioconda prinseq-plus-plus
Again, mamba will resolve the package dependencies for you and ask if you are sure. Once it is complete, you should be able to issue the command:
prinseq++ -v
to see the version of prinseq++ that has been installed.
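If the command is not found, it helps to check whether the program is actually on your PATH. Here is a minimal sketch of that check; the function name check_tools is made up for this example, and you can pass it any command names you like.

```shell
# Report whether each named command can be found on the PATH.
check_tools() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: not found"
        fi
    done
}

check_tools prinseq++
```

A "not found" answer usually means either the install did not finish or you are in a different environment from the one you installed into.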
Install snakemake
For the next steps of this tutorial, we are going to use snakemake to run some things on the cluster. So we are going to use mamba to install that:
mamba install -c conda-forge -c bioconda snakemake=5.22.0
NOTE: At the time of writing, snakemake was introducing a new package called PuLP, which makes the pipelining a lot faster but is a bit glitchy. So we install version 5.22.0, which predates those changes. For more information see this issue, this issue, or this suggestion.
Once that has completed, snakemake -v should show you the installed version.
Install other bioinformatics packages
You can install pretty much any bioinformatics package using conda. The anaconda website has a complete list, and you can visit the bioconda page for more information about bioconda.
To get started, you might want to install:
mamba install -c conda-forge -c bioconda blast focus diamond
Adding channels
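It is a pain to keep typing -c conda-forge -c bioconda, so we can just add those two channels to our configuration. Each --add puts the channel at the top of the list, so we add conda-forge last to give it the highest priority, matching the order of the -c options above:
conda config --add channels bioconda
conda config --add channels conda-forge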
Updating a package
With conda (or mamba) you can easily update a package if there is a newer version. For example, to update snakemake you would use:
mamba update snakemake
What this means is that you are responsible for ensuring your software is up-to-date. Or not. If you are working on a set of data analyses you may want to keep all the software at the same version so that each time you do an analysis you get comparable answers. With conda, you have control over the update cycles, but don’t forget that from time to time you might want to update the software!
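If you do want to hold a package at one version, one option is conda's pinned file, which lives in the conda-meta directory of an environment. The sketch below assumes that mechanism; the function name pin_package is made up for this example, and whether mamba honours the pin in the same way as conda is something you should check for your version.

```shell
# Add a version pin to an environment's conda-meta/pinned file.
# $1: environment prefix (for the active environment this is $CONDA_PREFIX)
# $2: pin specification, e.g. "snakemake ==5.22.0"
pin_package() {
    local prefix="$1" spec="$2"
    mkdir -p "$prefix/conda-meta"
    echo "$spec" >> "$prefix/conda-meta/pinned"
}

# Example (uncomment to use): pin_package "$CONDA_PREFIX" "snakemake ==5.22.0"
```

To release the pin later, delete that line from the pinned file and update as usual.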
Environments
If you want to keep different versions of software or run different pipelines, you can do that with conda, in what are called environments. Each one can have different software. conda is clever: if you have the same software in two different environments, it does not need to keep an entire second copy. At this stage, you don’t need to worry about that, and you can just install everything in the base environment. But if you start to run into installation issues, then remember you can separate things into different environments.
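When you do get to that point, making a separate environment might look something like the sketch below. The environment name qc is hypothetical, and the run helper is made up for this example: by default it only prints each command, so you can see what would happen before setting DRY_RUN=0 to run them for real.

```shell
# Hypothetical environment name and package list, used only for illustration.
ENV_NAME="qc"
PACKAGES="prinseq-plus-plus"

# Print the commands by default; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the new environment with its own software, then list all environments.
run mamba create -y -n "$ENV_NAME" -c conda-forge -c bioconda $PACKAGES
run conda env list
```

Once the environment exists, conda activate qc switches to it, and conda activate base takes you back.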