
I want to download files from a password-protected FTP server in a Snakemake rule. I have seen Maarten-vd-Sande's answer on specifying it using wildcards. Is it also possible to do this with inputs without running into a MissingInputException?

FILES = ['file1.txt',
         'file2.txt']

#remote file retrieval

rule download_file:
    # replacing input by output would download all files in one job?
    input:
        file = expand("{file}", file=FILES)
    shell:
        # this assumes your runtime has the SSHPASS env variable set
        "sshpass -e sftp -B 258048 server<< get {input.file} data/{input.file}; exit"

I have seen the hint about the SFTP class in Snakemake, but I am unsure how to use it in this context.

Thanks in advance!

enryh
  • I don't understand why you'd use the URL as input. What would the output be? I am currently on holiday, so it might be difficult for me to find the time to answer your question. – Maarten-vd-Sande Jul 21 '20 at 18:22
  • Yes, it should be output. When I first wrote it, I had not set up my wildcard expansion properly. – enryh Jul 29 '20 at 14:34

2 Answers


I haven't tested this, but I expect something like the following should work. Rule `all` lists every output file we want, and rule `download_file` then downloads each of them. I have no experience with `snakemake.remote`, though, so I might be completely wrong here.

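import os
from snakemake.remote.SFTP import RemoteProvider

# Extra keyword arguments are passed on to pysftp.Connection. The username is a
# placeholder; the password is read from the SSHPASS environment variable
# instead of being written into the Snakefile.
SFTP = RemoteProvider(username="myusername", password=os.environ["SSHPASS"])

FILES = ['file1.txt',
         'file2.txt']

rule all:
    input:
        FILES

rule download_file:
    input:
        # Remote paths start with the host name; "server/path/to" is a placeholder.
        SFTP.remote("server/path/to/{filename}.txt")
    output:
        "{filename}.txt"
    shell:
        # Snakemake downloads the remote input to a local copy under server/path/to/,
        # so move it to the requested output location.
        "mv {input} {output}"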
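This avoids the MissingInputException because Snakemake works backwards from the targets in `rule all`: it matches each requested file against a rule's output pattern, derives the wildcard values, and only then fills in that rule's input. Below is a much-simplified Python model of that matching step, not Snakemake's actual code. It assumes Snakemake's default wildcard regex `.+`, treats a repeated wildcard as a backreference, and uses `server/path/to/` as a hypothetical remote prefix.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a Snakemake-style pattern such as "{filename}.txt" into a regex.

    Simplified model: the first {name} becomes a named group matching ".+",
    a repeated {name} must match the same text again, and everything else
    is matched literally.
    """
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Even indices are literal text between wildcards.
            regex += re.escape(part)
        elif part in seen:
            # Repeated wildcard: backreference to the first occurrence.
            regex += f"(?P={part})"
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+)"
    return re.compile(regex + "$")


def resolve_wildcards(pattern: str, target: str):
    """Return the wildcard values that make `pattern` produce `target`, or None."""
    match = pattern_to_regex(pattern).match(target)
    return match.groupdict() if match else None


FILES = ["file1.txt", "file2.txt"]
for target in FILES:
    # rule all requests `target`; the download rule's output pattern is matched
    # against it, and the resulting wildcards fill in the (hypothetical) remote input.
    wildcards = resolve_wildcards("{filename}.txt", target)
    remote = "server/path/to/{filename}.txt".format(**wildcards)
    print(target, "<-", remote)  # e.g. file1.txt <- server/path/to/file1.txt
```

In the question's attempt, by contrast, the bare file names are listed as input of a rule without output, so no rule produces them and Snakemake reports them as missing.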
Maarten-vd-Sande
  • I would have to find out how to provide the password properly. In the end, I want to avoid the password being logged explicitly anywhere; it should just be passed as an environment variable from one execution shell to the next, available to the program being executed. If that were not a requirement, [pysftp](https://pypi.org/project/pysftp/) (on which `snakemake.remote.SFTP` is based) seems to work. – enryh Jul 29 '20 at 14:38
  • @enryh, you could take a look at [envvar](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html?highlight=envvars#environment-variables) – Maarten-vd-Sande Jul 29 '20 at 14:41
  • [`snakemake.remote.SFTP`](https://github.com/snakemake/snakemake/blob/master/snakemake/remote/SFTP.py) just passes the arguments through, so I will check that out. I am trying to get [`envvars`](https://stackoverflow.com/questions/63153511/snakemake-envvars-are-not-passed-to-cluster-execution-snakemake-throws-an-error) running for me :) – enryh Jul 29 '20 at 14:45
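The check that `envvars:` adds essentially means failing early when a required variable is unset. A minimal Python sketch of the same idea follows; the helper name `require_envvars` is hypothetical, and Snakemake's own error type and message differ.

```python
import os


def require_envvars(*names: str) -> dict:
    """Fail early if any required environment variable is missing.

    Rough stand-in for the check behind Snakemake's `envvars:` directive.
    Only the variable names are reported in the error, never their values.
    """
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    # Return the values so the caller can hand them to a library or subprocess.
    return {name: os.environ[name] for name in names}
```

Reporting only the names keeps the password itself out of error messages and logs, which is the requirement stated above.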

So I ended up using the following. The trick was passing the command to `sftp` with a here-string, `<<< "command"`. The `envvars` directive makes Snakemake check that `SSHPASS` is set, so that `sshpass -e` can pick it up.

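import os

envvars:
    "SSHPASS"

# remote file retrieval
# Idea: replace this with the SFTP remote provider
rule download_file:
    output:
        # assumes config['DATADIR'] is set (e.g. via configfile: or --config)
        raw = temp(os.path.join(config['DATADIR'], "{file}", "{file}.txt"))
    params:
        file="{file}.txt"
    resources:
        walltime="300", nodes=1, mem_mb=2048
    threads:
        1
    shell:
        # <<< is a bash here-string: its text is fed to sftp's stdin as a batch command
        "sshpass -e sftp -B 258048 server <<< \"get {params.file} {output.raw}\""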
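The same download can also be expressed without shell quoting, for example from a `run:` block or a standalone script. This sketch assumes `sshpass` and `sftp` are on the `PATH` and reuses the `server` host and `-B 258048` buffer size from the rule above; the function names are hypothetical. It builds the same command and feeds the `get` line on stdin, just as the here-string does.

```python
import os
import subprocess


def sftp_get_command(host: str, remote_path: str, local_path: str):
    """Build the argv and stdin text equivalent to
    `sshpass -e sftp -B 258048 host <<< "get remote local"`."""
    # sshpass -e reads the password from the SSHPASS environment variable.
    argv = ["sshpass", "-e", "sftp", "-B", "258048", host]
    # A bash here-string passes its text plus a trailing newline on stdin;
    # sftp executes each stdin line as a command.
    batch = f"get {remote_path} {local_path}\n"
    return argv, batch


def sftp_get(host: str, remote_path: str, local_path: str) -> None:
    """Download one file; raises if SSHPASS is unset or the command fails."""
    if "SSHPASS" not in os.environ:
        raise RuntimeError("SSHPASS must be set for sshpass -e")
    argv, batch = sftp_get_command(host, remote_path, local_path)
    # The password never appears in argv, so it stays out of logs and `ps` output.
    subprocess.run(argv, input=batch, text=True, check=True)
```

Inside a Snakemake `run:` block this could be called as `sftp_get("server", params.file, output.raw)`. Because the password travels only through `SSHPASS`, it never appears on the command line.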
enryh