Hi, I have an R script that reads files from different locations listed in a text file.

The text file, named scenarios.txt, looks like this:

scenario    name    directory
2017    Dijn    2017-v0
2022    Pikn    2030-v1
2040    Enn    2040

A sample R script looks like this:

library(tidyverse)
library(sf)
# Read the tab-delimited scenario sheet
ss = read.delim("Simulations/scenarios.txt")
for(i in seq_len(nrow(ss))){
  # Read each scenario's GPS trips and write them to a file named after the scenario
  gps = st_read(paste0("/usd/clove/simulations/",ss$directory[i],"/trips_gps.gpkg"))
  st_write(gps, paste0("/usd/clove/simulations/",ss$directory[i],"/trips_gps_",ss$name[i],".gpkg"))
}

How would I write a Snakemake file for such a scenario, where the file paths are taken from a text file?

  • To me it is not entirely clear what you are trying to accomplish. You are (most) probably best off creating a list using the syntax: `lapply(files.to.read, function(x) {...all operations on x...})` – Wimpel Feb 11 '23 at 09:52
  • Actually I'm using a .txt file to read the directory names in R. This is what I want to implement in Snakemake – Thaatha_Paati Feb 11 '23 at 09:54
  • @Wimpel, sorry, I had made a sample file. I have now edited it. Can you check my question now? – Thaatha_Paati Feb 11 '23 at 10:22
  • So you just want to copy the files to a new name? If so, why not use `file.rename()`? – Wimpel Feb 11 '23 at 10:35
  • No, actually I have a different file. This R script is just an example. What I want is not an improvement to my R script, but how to implement the same thing in Snakemake. @Wimpel – Thaatha_Paati Feb 11 '23 at 11:35
  • You still aren't quite providing enough information to tell what is best here. How do you envision what `gps` and `st_write` produce fitting into a Snakemake workflow, where input and output are file names? (Especially for what `st_write` does.) For example, if each of those is going to be a list of files to act on, then they'd be written as [input functions](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#input-functions). If you just want to make a Python list as the workflow starts up, you can directly put in Python code that runs at that time and makes lists. Or ... – Wayne Feb 11 '23 at 18:09
  • There are ways to use R in conjunction with Snakemake. See [here](https://stackoverflow.com/a/68620783/8508004). Or maybe [here](https://stackoverflow.com/a/57871674/8508004). Snakemake itself is a superset of Python, so it is easier to use Python in these cases, but it looks like there are ways to use R. Keep in mind that Snakemake is centered around files as input and output. If you can use R to do that with your code and get it into a form Snakemake accepts in the `input` and `output` directives, then you should be all set to stick with Snakemake. Otherwise those are fairly ... – Wayne Feb 11 '23 at 18:17
  • straight-forward equivalents to implement in Python. At least the `gps = st_read(..` line I think. I assume that is a vector of file paths? I'm not sure what you are doing with `st_write` there though. – Wayne Feb 11 '23 at 18:17

1 Answer

I'll give answering a try, but I'm not sure I understand the question, and the code below is not tested at all.

import pandas

ss = pandas.read_csv('Simulations/scenarios.txt', sep='\t')

rule all:
    input:
        expand('/usd/clove/simulations/{directory}/trips_gps_{name}.gpkg', zip, 
            directory=ss['directory'], name=ss['name'])

rule writer:
    input:
        fin='/usd/clove/simulations/{directory}/trips_gps.gpkg',
    output:
        fout='/usd/clove/simulations/{directory}/trips_gps_{name}.gpkg',
    script:
        'io.R'

where io.R may be something like this:

library(sf)

# Read the input layer and write it unchanged to the output path
gps = st_read(snakemake@input[['fin']])
st_write(gps, snakemake@output[['fout']])

First you read the sample sheet and use the information there to build the list of output files (rule all). Then, for each row (i.e. each output file), Snakemake runs rule writer, filling in the {directory} and {name} wildcards from the requested output path.
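To make the pairing in rule all concrete, here is a minimal plain-Python sketch of the list that `expand(..., zip, ...)` should produce for the sample sheet in the question. It does not import Snakemake; the function name `target_paths` and the inlined `SCENARIOS` string are only for illustration, and the sheet is assumed to be tab-separated.

```python
import csv
import io

# Contents of Simulations/scenarios.txt from the question (assumed tab-separated).
SCENARIOS = (
    "scenario\tname\tdirectory\n"
    "2017\tDijn\t2017-v0\n"
    "2022\tPikn\t2030-v1\n"
    "2040\tEnn\t2040\n"
)


def target_paths(sheet_text, root="/usd/clove/simulations"):
    """Pair each row's directory with its name, row by row (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(sheet_text), delimiter="\t"))
    directories = [row["directory"] for row in rows]
    names = [row["name"] for row in rows]
    # zip pairs the i-th directory with the i-th name; without zip,
    # expand would build every directory/name combination instead.
    return [f"{root}/{d}/trips_gps_{n}.gpkg" for d, n in zip(directories, names)]


targets = target_paths(SCENARIOS)
for path in targets:
    print(path)
```

Each printed path is one target of rule all, so three rows in the sheet should mean three runs of rule writer.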

I wouldn't hardcode file names and paths in the Snakefile; it is better to pass them as configuration parameters, but that is a separate issue.
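As a rough sketch of that idea: inside a Snakefile, the global `config` dict is filled from a `configfile:` directive or from `--config key=value` on the command line. The plain-Python stand-in below builds the two path patterns from such a dict; the keys `simulations_root` and `sample_sheet` and the helper `gpkg_patterns` are assumptions made for this example, not names Snakemake defines.

```python
# Stand-in for Snakemake's global `config` dict; in a real Snakefile it would
# be populated by `configfile:` or `--config`, not written out by hand.
config = {
    "simulations_root": "/usd/clove/simulations",  # hypothetical key
    "sample_sheet": "Simulations/scenarios.txt",   # hypothetical key
}


def gpkg_patterns(cfg):
    """Build the input/output patterns for rule writer from config values."""
    root = cfg["simulations_root"].rstrip("/")
    # Doubled braces keep {directory} and {name} as literal Snakemake wildcards.
    return {
        "fin": f"{root}/{{directory}}/trips_gps.gpkg",
        "fout": f"{root}/{{directory}}/trips_gps_{{name}}.gpkg",
    }


patterns = gpkg_patterns(config)
print(patterns["fin"])
print(patterns["fout"])
```

The resulting strings could then replace the hardcoded paths in the `input` and `output` directives, so moving the simulations folder only requires a config change.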

dariober