Part 4: Snakemake profiles for DeepThought
Series: An Introduction To Using DeepThought For Bioinformatics
Please note: Mike posted a much better and more detailed profile explanation. You should use that!
Please note: much of this information was distilled from this great blog post about snakemake profiles
That post has more information and details, and you should take some time to read it and the associated post about running snakemake on the cluster. However, this will get you started on DeepThought.
Why make a snakemake profile? A profile allows us to define a default set of resources (e.g. memory requirements, number of CPUs, and total time to run), and then we can override those defaults on a rule-by-rule basis in our snakefile.
Making a profile
We start by making a directory for our profile:
mkdir -p ~/.config/snakemake/slurm
and then create a file in there with some simple rules:
vi ~/.config/snakemake/slurm/config.yaml
The file contents are:
jobs: 100
cluster: "sbatch -t {resources.time_min} --mem={resources.mem_mb} -c {resources.cpus} -o logs_slurm/{rule}_{jobid}.out -e logs_slurm/{rule}_{jobid}.err "
default-resources: [cpus=1, mem_mb=2000, time_min=60]
latency-wait: 60
local-cores: 32
This will submit at most 100 jobs at a time, using the sbatch slurm command, and will use resources.time_min, resources.mem_mb, and resources.cpus as appropriate for time (in minutes), memory (in MB), and number of CPUs. It needs a directory called logs_slurm to write the output files.
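To make the placeholder substitution concrete, here is a minimal Python sketch of how the cluster string could be filled in from the default resources. It is only an illustration of the idea, not snakemake's actual implementation: render_sbatch is a hypothetical helper, and the rule name and job id in the example are made up.

```python
from types import SimpleNamespace

# The sbatch template from the profile's config.yaml.
CLUSTER_TEMPLATE = (
    "sbatch -t {resources.time_min} --mem={resources.mem_mb} "
    "-c {resources.cpus} "
    "-o logs_slurm/{rule}_{jobid}.out -e logs_slurm/{rule}_{jobid}.err"
)

# The default-resources from the profile's config.yaml.
DEFAULT_RESOURCES = {"cpus": 1, "mem_mb": 2000, "time_min": 60}


def render_sbatch(rule, jobid):
    """Fill the template with the defaults (hypothetical helper)."""
    # SimpleNamespace lets str.format resolve {resources.cpus} and friends
    # through ordinary attribute access.
    resources = SimpleNamespace(**DEFAULT_RESOURCES)
    return CLUSTER_TEMPLATE.format(resources=resources, rule=rule, jobid=jobid)


if __name__ == "__main__":
    # "example_rule" and 12 are made-up values for illustration.
    print(render_sbatch("example_rule", 12))
```

A rule that declares no resources of its own would therefore be submitted with one CPU, 2000 MB of memory, and a 60 minute time limit.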
Now in our rules, we can add a resources directive that will override these default parameters.
This example is taken from the aforementioned blog post:
rule mapFASTQ:
    input:
        f1 = "RawData/{sample}_1.fastq.gz",
        f2 = "RawData/{sample}_2.fastq.gz",
        ref = "Ref/ref.fa"
    output: temp("Alignment/{sample}.sam")
    resources: time_min=300, mem_mb=8000, cpus=8
    # bwa mem takes its thread count via -t
    shell:
        """
        bwa mem -t {resources.cpus} -R "@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}" {input.ref} {input.f1} {input.f2} > {output}
        """
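The override behaviour can be pictured as a dictionary merge in which the values from the rule win over the profile defaults. The sketch below is a simplified model under that assumption, not snakemake's own code; effective_resources is a hypothetical helper name.

```python
# The default-resources from the profile's config.yaml.
DEFAULT_RESOURCES = {"cpus": 1, "mem_mb": 2000, "time_min": 60}


def effective_resources(rule_resources):
    """Merge per-rule resources over the defaults (hypothetical helper)."""
    # Keys given in the rule replace the defaults; missing keys fall back.
    return {**DEFAULT_RESOURCES, **rule_resources}


# The resources declared by the mapFASTQ rule.
map_fastq = effective_resources({"time_min": 300, "mem_mb": 8000, "cpus": 8})

# A made-up rule that only asks for more memory keeps the other defaults.
memory_only = effective_resources({"mem_mb": 16000})

if __name__ == "__main__":
    print(map_fastq)
    print(memory_only)
```

In this model, a rule only needs to list the resources that differ from the defaults.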
Now, to run the snakefile we no longer use the --cluster option. Instead we use:
snakemake --profile slurm -s snakefile
Important Note
In the above profile, we write the slurm output and error files to a directory called logs_slurm. The slurm submission may hang unless you make this directory before you run snakemake. If you run into trouble, make sure you mkdir!
mkdir -p logs_slurm
snakemake --profile slurm -s snakefile
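Because a snakefile is Python, one alternative to remembering the mkdir is to create the directory near the top of the snakefile itself, so that it exists before any job is submitted. This is a minimal sketch of that idea, assuming the directory name matches the one used in the profile; SLURM_LOG_DIR is just an illustrative variable name.

```python
import os

# Must match the directory used in the -o/-e options of the profile's
# cluster command.
SLURM_LOG_DIR = "logs_slurm"

# exist_ok=True makes this a no-op when the directory already exists.
os.makedirs(SLURM_LOG_DIR, exist_ok=True)
```

With these lines in place, running snakemake --profile slurm -s snakefile from a fresh working directory should no longer depend on a separate mkdir step.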