From charlesreid1

Revision as of 10:09, 23 February 2018

Snakemake patterns for Snakefiles and complex workflows.

Also, there is a repo of awesome example rules here: https://percyfal.github.io/snakemake-rules/docs/configuration.html

Creating a master rule

If Snakemake is not given a target (a rule name or an output file) to build, it executes the first rule in the Snakefile.

To create a master rule that triggers the other rules in the Snakefile (the equivalent of make's "all" or "default" target), define it first. You can name it whatever you like:

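# sample names, plus the raw input and final output file for each
mynames   = ['A','B','C']
myinputs  = [j+".txt" for j in mynames]
myoutputs = [j+".dodat" for j in mynames]

# master rule: defined first, so it runs when no target is given;
# its inputs are the final outputs of the whole workflow
rule default:
    input:
        myoutputs
    shell:
        'echo "i just run subrules!"'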

This rule takes as its inputs the list myoutputs, which contains all of the final output files the workflow should generate. This means some other rule in the Snakefile must be able to produce A.dodat, B.dodat, and C.dodat. A single rule with a {name} wildcard covers all three:

# {name} is a wildcard, so this one rule can build A.dodat, B.dodat, and C.dodat
rule cascade:
    output:
        '{name}.dodat'
    input:
        '{name}.file_from_previous_step'
    shell:
        'cp {input} {output}'

and so on...
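The {name} wildcard is what lets one rule serve all three targets: Snakemake matches each requested file against the output pattern, reads off the wildcard value, and fills that value into the input pattern. The Python below is a simplified sketch of that idea, not Snakemake's actual implementation. It assumes each wildcard appears at most once per pattern and uses .+ as the wildcard regex (Snakemake's default when no constraint is given).

```python
import re

# Patterns copied from the "cascade" rule above.
OUTPUT_PATTERN = "{name}.dodat"
INPUT_PATTERN = "{name}.file_from_previous_step"


def pattern_to_regex(pattern):
    """Turn '{name}.dodat' into a regex with one named group per wildcard.

    Assumes each wildcard name appears at most once in the pattern.
    """
    # re.split with a capture group alternates: literal, wildcard, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text, matched exactly
        else:
            regex += f"(?P<{part}>.+)"      # wildcard: any non-empty text
    return re.compile(regex + r"\Z")


def infer_input(target):
    """Return the input file needed to build target, or None if no match."""
    match = pattern_to_regex(OUTPUT_PATTERN).match(target)
    if match is None:
        return None
    # Reuse the same wildcard values to fill in the input pattern.
    return INPUT_PATTERN.format(**match.groupdict())


for target in ["A.dodat", "B.dodat", "C.dodat"]:
    print(target, "<-", infer_input(target))
```

A file that does not fit the output pattern, such as A.txt, gets None, which is why a separate rule is needed for each kind of output.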

Now, to run this default rule, just run snakemake with no arguments:

$ snakemake

This runs the first rule in the Snakefile, which is default. The default rule does no real work itself (it only echoes a message), but it requires as inputs the final outputs of the entire process, so Snakemake works backward from those files to find and run the rules they depend on.
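Working backward from the default rule's inputs can be sketched in plain Python. This is a toy model, not how Snakemake builds its job graph internally: the rule table is hypothetical, the prepare rule is an assumed stand-in for the earlier steps hidden behind "and so on...", and the .txt files (the names in myinputs) are assumed to already exist.

```python
import re

# Hypothetical rule table: (rule name, output pattern, input pattern).
# "cascade" is the rule from the text; "prepare" is an ASSUMED earlier step
# that turns NAME.txt into the intermediate file.
RULES = [
    ("cascade", "{name}.dodat", "{name}.file_from_previous_step"),
    ("prepare", "{name}.file_from_previous_step", "{name}.txt"),
]


def match_pattern(pattern, filename):
    """Return the wildcard values if filename fits pattern, else None."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+)"
        for i, p in enumerate(parts)
    )
    m = re.fullmatch(regex, filename)
    return m.groupdict() if m else None


def plan(target, existing, order):
    """Append (rule, output) jobs to order so every input is built first."""
    if target in existing or any(job[1] == target for job in order):
        return  # already on disk, or already scheduled
    for rule, out_pat, in_pat in RULES:
        wildcards = match_pattern(out_pat, target)
        if wildcards is not None:
            plan(in_pat.format(**wildcards), existing, order)  # inputs first
            order.append((rule, target))
            return
    raise ValueError(f"no rule produces {target!r} and it does not exist")


# Assumed starting state: only the raw .txt inputs exist.
existing_files = {"A.txt", "B.txt", "C.txt"}
jobs = []
for target in ["A.dodat", "B.dodat", "C.dodat"]:  # the inputs of rule default
    plan(target, existing_files, jobs)

for rule, output in jobs:
    print(rule, "->", output)
```

Each prepare job lands before the cascade job that consumes its output. Snakemake does this bookkeeping itself, so the Snakefile only has to declare inputs and outputs.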

Creating rules without target files

Normally you run a rule by asking Snakemake for one of the files it outputs. If you create a rule that has no output files, like clean below:

rule clean:
    shell:
        'rm -f *.dodat *.int*'

you can run it by passing the rule name to snakemake:

$ snakemake clean
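How a command-line target is interpreted can be modeled with a small function. This is a hypothetical sketch covering only the cases discussed on this page (no target, a rule name, a file name), not Snakemake's argument parser; the rule list simply mirrors the rules defined above.

```python
# Hypothetical list of the rules defined above, in file order.
RULE_NAMES = ["default", "cascade", "clean"]


def resolve_targets(args):
    """Classify command-line targets (simplified model).

    - no targets: run the first rule in the file
    - a known rule name: run that rule, even if it has no outputs (e.g. clean)
    - anything else: treat it as a file that some rule must produce
    Note: real Snakemake refuses a rule name as a target when that rule's
    outputs contain wildcards (as cascade's do); this sketch ignores that.
    """
    if not args:
        return [("rule", RULE_NAMES[0])]
    return [("rule", t) if t in RULE_NAMES else ("file", t) for t in args]


print(resolve_targets([]))           # like: snakemake
print(resolve_targets(["clean"]))    # like: snakemake clean
print(resolve_targets(["A.dodat"]))  # like: snakemake A.dodat
```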

Philosophy of Snakemake

As we have developed it so far, here is our philosophy of Snakemake:

One Subtask, One Subdirectory: subtasks are organized into their own directories.

  • In this application, the subtasks all look the same and use the same program.
  • In real applications, each subtask would correspond to a different program or workflow step.

Atomic Tasks: Each subtask consists of a few atomic tasks (simple algebra operations).

  • The atomic tasks are carried out.
  • The subtask aggregates these into a final result.
  • This makes workflows more flexible and modular.

Aggregation:

  • Each subtask aggregates the results of the atomic tasks (e.g., a sum or product).
  • Likewise, the final master task aggregates the results of each subtask.
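Stripped of the Snakemake machinery, the atomic-task and aggregation pattern looks like the sketch below. In a real workflow each atomic task and each aggregation would be a rule that reads and writes files; here they are plain Python functions, and the subtask names, operand values, and the choice of multiplication as the atomic operation are all made up for illustration.

```python
from math import prod


def atomic_task(a, b):
    """One atomic task: a single simple algebra operation (here, a product)."""
    return a * b


def run_subtask(operand_pairs, aggregate):
    """Carry out each atomic task, then aggregate the results into one value."""
    return aggregate(atomic_task(a, b) for a, b in operand_pairs)


# Hypothetical subtasks with made-up operands; one sums, one multiplies.
subtask_results = {
    "subtask1": run_subtask([(1, 2), (3, 4)], sum),   # 2 + 12
    "subtask2": run_subtask([(2, 2), (1, 5)], prod),  # 4 * 5
}

# The master task aggregates the subtask results in the same way.
final = sum(subtask_results.values())
print(subtask_results, final)
```

Because each level only consumes the results of the level below, a subtask can swap in a different atomic operation or aggregation without touching the master task.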

In terms of how this translates for Snakemake files, here's how the organization looks:

Subtask Snakemake Rules:

  • Each subtask directory contains a Snakemake rule file that defines a master rule for that subtask and specifies what that subtask does.

Snakemake Configuration Files:

  • Each subtask directory contains a Snakemake config file that defines defaults (which the user can change).
  • If there are multiple rules, settings specific to a particular rule can also be defined in that file.
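"Defaults the user can change" amounts to merging user-supplied values over the defaults from a subtask's config file. In Snakemake the file would typically be loaded with the configfile: directive and individual values overridden with --config on the command line; the Python below only models the merge itself, with a hypothetical config layout (an output_dir key and a per-rule rules section).

```python
import copy

# Hypothetical defaults, standing in for one subtask's config file.
SUBTASK1_DEFAULTS = {
    "output_dir": "subtask1/output",
    "rules": {"multiply": {"factor": 2}},
}


def merged_config(defaults, overrides):
    """Return defaults with overrides applied; nested dicts merge key by key."""
    result = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merged_config(result[key], value)
        else:
            result[key] = value
    return result


# The user changes one rule-specific setting and keeps everything else.
user_overrides = {"rules": {"multiply": {"factor": 10}}}
cfg = merged_config(SUBTASK1_DEFAULTS, user_overrides)
print(cfg)
```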

Flags