Snakemake/DahakFlot
From charlesreid1
Main repository:
Start
This tutorial assumes you start out on a machine with a conda-based Python distribution (Anaconda, Miniconda, or similar).
Installing Conda with Pyenv
If you don't have a version of conda, we recommend using pyenv to install it and to manage your Python versions.
A very simple pyenv installation script:
install_pyenv.py
#!/usr/bin/python3
import getpass
import subprocess

def install_pyenv():
    user = getpass.getuser()
    if user == "root":
        raise Exception("You are root - you should run this script as a normal user.")
    # Download the pyenv installer and pipe it to bash.
    # With shell=True the command must be a single string:
    # a list would run only "curl", without its arguments or the pipe.
    pyenvcmd = "curl -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | /bin/bash"
    subprocess.call(pyenvcmd, shell=True)
    # The installer does not add ~/.pyenv/bin to $PATH;
    # you need to do that yourself (see below).

if __name__=="__main__":
    install_pyenv()
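After running the script, you will need to add ~/.pyenv/bin to your $PATH yourself, since that is where the pyenv command lives. The Python versions that pyenv installs (including Miniconda) live under ~/.pyenv/versions and are reached through the shims in ~/.pyenv/shims, which the pyenv init command sets up.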
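To check whether your shell is already set up, here is a small sketch that inspects a PATH string and returns the lines you would still need to add to your shell startup file. The helper name missing_pyenv_setup is hypothetical (it is not part of pyenv), and it assumes pyenv lives in the default location ~/.pyenv and uses the classic eval "$(pyenv init -)" setup:

```python
import os
from typing import List


def missing_pyenv_setup(path_env: str, home: str) -> List[str]:
    """Return shell lines still needed so that pyenv and its shims are on PATH.

    Assumes pyenv is installed in the default location, home/.pyenv.
    """
    entries = path_env.split(os.pathsep)
    pyenv_bin = os.path.join(home, ".pyenv", "bin")      # the pyenv command itself
    pyenv_shims = os.path.join(home, ".pyenv", "shims")  # wrappers for python, pip, conda, ...
    lines = []
    if pyenv_bin not in entries:
        lines.append('export PATH="$HOME/.pyenv/bin:$PATH"')
    if pyenv_shims not in entries:
        # In most pyenv versions, `pyenv init -` emits shell code that
        # (among other things) puts the shims directory on PATH.
        lines.append('eval "$(pyenv init -)"')
    return lines


if __name__ == "__main__":
    todo = missing_pyenv_setup(os.environ.get("PATH", ""), os.path.expanduser("~"))
    if todo:
        print("Add these lines to ~/.bashrc (or your shell's startup file):")
        print("\n".join(todo))
    else:
        print("pyenv and its shims are already on PATH.")
```

The function only reports what is missing rather than editing ~/.bashrc, so it is safe to run repeatedly.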
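Next, a Python script that uses pyenv to install Miniconda, makes it the global default, and then installs Snakemake (from the bioconda channel) and the OSF command-line client: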
#!/usr/bin/python3
import getpass
import os
import subprocess

def install_snakemake():
    user = getpass.getuser()
    if user == "root":
        raise Exception("You are root - you should run this script as a normal user.")

    # ---------------------------
    # Install Miniconda with pyenv and make it the global Python
    conda_version = "miniconda3-4.3.30"
    subprocess.call(["pyenv", "install", conda_version])
    subprocess.call(["pyenv", "global", conda_version])

    # ---------------------------
    # Install snakemake
    home = os.environ['HOME']
    condabin = home + "/.pyenv/shims/conda"
    subprocess.call([condabin, "update", "--yes", "conda"])
    # Each --add puts the channel at the top of the list,
    # so the last one added (bioconda) gets the highest priority.
    subprocess.call([condabin, "config", "--add", "channels", "r"])
    subprocess.call([condabin, "config", "--add", "channels", "defaults"])
    subprocess.call([condabin, "config", "--add", "channels", "conda-forge"])
    subprocess.call([condabin, "config", "--add", "channels", "bioconda"])
    subprocess.call([condabin, "install", "--yes", "-c", "bioconda", "snakemake"])

    # ---------------------------
    # Install osf cli client
    pipbin = home + "/.pyenv/shims/pip"
    subprocess.call([pipbin, "install", "--upgrade", "pip"])
    subprocess.call([pipbin, "install", "--user", "osfclient"])

if __name__=="__main__":
    install_snakemake()
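The order in which the script adds conda channels matters. With conda config --add channels, each call puts the channel at the top of the list (highest priority), and adding a channel that is already listed moves it back to the top. The following sketch mimics that behaviour to show the priority order the script ends up with; apply_add_channels is a hypothetical helper for illustration, not part of conda:

```python
from typing import List


def apply_add_channels(existing: List[str], added: List[str]) -> List[str]:
    """Mimic repeated `conda config --add channels X`: each X moves to the front."""
    channels = list(existing)
    for name in added:
        if name in channels:
            channels.remove(name)  # an existing entry is moved, not duplicated
        channels.insert(0, name)   # --add prepends, so later calls get higher priority
    return channels


order = apply_add_channels([], ["r", "defaults", "conda-forge", "bioconda"])
print(order)  # highest priority first
# ['bioconda', 'conda-forge', 'defaults', 'r']
```

Swapping the order of the calls would change which channel wins when the same package is available from several channels.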
Get Tutorial Files
Start by getting the files needed for the tutorial:
wget https://bitbucket.org/snakemake/snakemake-tutorial/get/v3.11.0.tar.bz2
tar -xf v3.11.0.tar.bz2 --strip 1
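The --strip 1 option drops the first path component of every file in the archive. Presumably this is because the Bitbucket download wraps everything in a single top-level folder, so stripping it puts data/ and environment.yaml directly in the current directory. Here is a sketch of the same extraction using Python's standard tarfile module; the function name extract_stripped is made up for illustration:

```python
import tarfile
from pathlib import PurePosixPath
from typing import List


def extract_stripped(archive: str, dest: str, strip: int = 1) -> List[str]:
    """Extract `archive` into `dest`, dropping the first `strip` path components
    of each member (like `tar -xf archive --strip-components 1`).
    Returns the relative paths that were written."""
    # Use the safer "data" extraction filter where this Python provides it (PEP 706).
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    written = []
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects bz2/gz/xz
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts[strip:]
            if not parts:
                continue  # the stripped top-level directory itself
            if ".." in parts or member.name.startswith("/"):
                continue  # refuse paths that could escape `dest`
            member.name = str(PurePosixPath(*parts))
            tar.extract(member, dest, **extra)
            written.append(member.name)
    return written
```

After downloading the tarball (for example with urllib.request.urlretrieve), extract_stripped("v3.11.0.tar.bz2", ".") should leave the same files in place as the tar command above.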
Create the conda environment:
conda env create --name snakemake-tutorial --file environment.yaml
Now activate the conda environment:
source activate snakemake-tutorial
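Once the environment is active, a quick sanity check is to confirm that the programs the workflow calls can be found on PATH. The sketch below uses shutil.which; the helper name missing_tools is hypothetical, and the tool list is an assumption based on the commands used later in this tutorial:

```python
import shutil
from typing import Iterable, List

# Assumed tool list: the commands the bwa_map rule and the dry run rely on.
REQUIRED_TOOLS = ["snakemake", "bwa", "samtools"]


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        print("Is the snakemake-tutorial environment activated?")
    else:
        print("All required tools found.")
```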
First Snakefile
Create a file named Snakefile and add the first rule:
rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "mapped_reads/{sample}.bam"
    shell:
        "bwa mem {input} | samtools view -Sb - > {output}"
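The data/ folder this rule reads from (data/genome.fa and data/samples/) comes from the tutorial tarball extracted earlier, which also provides the environment.yaml file used to create the conda environment.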
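In the shell command, {input} and {output} are placeholders. When a job runs, Snakemake substitutes the job's files, joining a list of files with spaces. A simplified sketch of that substitution using Python's str.format (FileList and render_shell are hypothetical names; real Snakemake also supports forms such as named inputs):

```python
from typing import List


class FileList(list):
    """A list of paths that renders as a space-separated string in str.format."""

    def __str__(self) -> str:
        return " ".join(self)


def render_shell(template: str, inputs: List[str], outputs: List[str]) -> str:
    """Fill {input} and {output} the way the bwa_map shell command expects."""
    return template.format(input=FileList(inputs), output=FileList(outputs))


cmd = render_shell(
    "bwa mem {input} | samtools view -Sb - > {output}",
    ["data/genome.fa", "data/samples/A.fastq"],
    ["mapped_reads/A.bam"],
)
print(cmd)
# bwa mem data/genome.fa data/samples/A.fastq | samtools view -Sb - > mapped_reads/A.bam
```

Because FileList is still a list, indexed placeholders such as {input[0]} also work with this sketch.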
Executing the Rule
Snakefile rules are executed based on the output files you ask for, not by rule name. So, if we want to execute the rule bwa_map, we look at its output file:
rule bwa_map:
    ...
    output:
        "mapped_reads/{sample}.bam"
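That means we run snakemake and ask for a concrete output file, such as mapped_reads/A.bam. Snakemake matches it against the pattern mapped_reads/{sample}.bam, sets the wildcard sample=A, and then requires the corresponding inputs, data/genome.fa and data/samples/A.fastq, to be in place.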
The shell directive is the command that turns the input files into the output file. So you say "I want this output file", and Snakemake works backward from it to build the graph of jobs (a DAG) needed to produce it.
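The back-calculation starts by matching the requested file name against a rule's output pattern. A simplified sketch of that step for the bwa_map rule: the helper names are hypothetical, and the wildcard regex .+ used here is Snakemake's default wildcard pattern:

```python
import re
from typing import Dict, List, Optional, Tuple

# Patterns copied from the bwa_map rule above.
OUTPUT_PATTERN = "mapped_reads/{sample}.bam"
INPUT_PATTERNS = ["data/genome.fa", "data/samples/{sample}.fastq"]


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turn 'mapped_reads/{sample}.bam' into a regex with one named group per wildcard."""
    # re.split with a capturing group alternates: literal, wildcard name, literal, ...
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)      # literal text, e.g. "mapped_reads/"
        else:
            regex += f"(?P<{piece}>.+)"    # wildcard, using the default pattern .+
    return re.compile(regex)


def infer_job(target: str) -> Optional[Tuple[Dict[str, str], List[str]]]:
    """Return (wildcards, input files) if target matches the rule's output, else None."""
    match = pattern_to_regex(OUTPUT_PATTERN).fullmatch(target)
    if match is None:
        return None  # this rule cannot produce the requested file
    wildcards = match.groupdict()
    inputs = [p.format(**wildcards) for p in INPUT_PATTERNS]
    return wildcards, inputs


print(infer_job("mapped_reads/A.bam"))
# ({'sample': 'A'}, ['data/genome.fa', 'data/samples/A.fastq'])
```

Snakemake then repeats this for each input that does not exist yet, looking for another rule whose output matches it, which is how the chain of jobs is built.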
To do a dry run of the workflow (-n executes nothing, -p prints the shell commands that would be run):
$ snakemake -np mapped_reads/A.bam mapped_reads/B.bam
rule bwa_map:
    input: data/genome.fa, data/samples/B.fastq
    output: mapped_reads/B.bam
    jobid: 0
    wildcards: sample=B

bwa mem data/genome.fa data/samples/B.fastq | samtools view -Sb - > mapped_reads/B.bam

rule bwa_map:
    input: data/genome.fa, data/samples/A.fastq
    output: mapped_reads/A.bam
    jobid: 1
    wildcards: sample=A

bwa mem data/genome.fa data/samples/A.fastq | samtools view -Sb - > mapped_reads/A.bam
Job counts:
        count   jobs
        2       bwa_map
        2
This explains the tasks that will be executed: requesting two output files produces two separate bwa_map jobs, one per value of the sample wildcard. Below each job's summary is the shell command Snakemake would run for it (printed because of -p).
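Typing every target by hand gets tedious as the number of samples grows. Inside a Snakefile, Snakemake provides an expand() helper for this; outside one, the same idea can be sketched in plain Python. The snippet below builds the target list, the dry-run command used above, and a tally like the "Job counts" summary. The sample list and the helper name expand_targets are assumptions for illustration, as is the simplification that each target maps to exactly one bwa_map job:

```python
from collections import Counter
from typing import List

SAMPLES = ["A", "B"]  # assumed sample names, matching data/samples/A.fastq and B.fastq


def expand_targets(pattern: str, samples: List[str]) -> List[str]:
    """Fill the {sample} wildcard once per sample (similar in spirit to expand())."""
    return [pattern.format(sample=s) for s in samples]


targets = expand_targets("mapped_reads/{sample}.bam", SAMPLES)
dry_run_cmd = ["snakemake", "-np"] + targets  # -n: dry run, -p: print shell commands

# One bwa_map job per requested BAM file, as in the dry-run output above.
job_counts = Counter("bwa_map" for _ in targets)

print(" ".join(dry_run_cmd))
print(dict(job_counts), "total:", sum(job_counts.values()))
```

With Snakemake installed, the dry_run_cmd list can be passed directly to subprocess.run to perform the same dry run.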