I've got a directory with 10 sub-directories (dir01 to dir10) and a number of files in each of those (new files are added every day to the sub-directories).
I'm trying to write a snakemake file that will go through all of the sub-directories and all the files and process them (run my convert.exe executable to convert my .Stp files to .Xml). The processed files will be moved to a new directory but into sub-directories with the same names as before and the same file name.
As an example, the final job flow should look something like this:
/data01/dir01/Sample1.Stp --> processed by convert.exe --> /data01/temp/dir01/Sample1.xml
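To make the intended mapping concrete, here is a minimal plain-Python sketch (not Snakemake) of how a source path should turn into its target path. The function name target_path and its parameters are made up for illustration, and it assumes the output extension is lowercase .xml, as in the example above.

```python
from pathlib import PurePosixPath


def target_path(source, root="/data01", out_subdir="temp"):
    """Map <root>/<dir>/<name>.Stp to <root>/<out_subdir>/<dir>/<name>.xml."""
    src = PurePosixPath(source)
    # Path relative to the data root, e.g. dir01/Sample1.Stp
    rel = src.relative_to(root)
    # Keep the sub-directory and file name, swap only the extension
    return str(PurePosixPath(root) / out_subdir / rel.with_suffix(".xml"))


print(target_path("/data01/dir01/Sample1.Stp"))
# → /data01/temp/dir01/Sample1.xml
```

So the sub-directory name and the file stem carry over unchanged; only the temp component and the extension are new.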
I'd also like to divide this over the 12 CPUs I've got access to, running it in parallel.
I've just started using snakemake and have gone through a couple of tutorials, but I'm getting a little lost.
Here is what I have so far. It's not working, and I'm not even sure this is the right way to go about it. This is also only the first part: just trying to loop through the directories and files (not trying to convert or run in parallel yet).
directories = glob_wildcards("/data01/{dir}")
files = glob_wildcards("/data01/{dir}/{file}")

rule all:
    input:
        expand("/data01/temp/{dir}/{file}.moved.Stp", dir=directories, file=files)

rule sort:
    input:
        "/data01/{dir}/{file}.Stp"
    output:
        "/data01/temp/{dir}/{file}.moved.Stp"
    shell:
        "..."
Any help about how to go about this would be greatly appreciated!
Thanks!