I am building a workflow in Snakemake 5.8.2 that takes huge files from S3 as input (4 files, ~280 GB each). The first rule just concatenates the files. When I run the workflow, Snakemake seems to download only a 5 GB chunk of some split version of each file, then deletes it, and the concatenation fails. I know AWS transfers files in 5 GB parts, but I expected Snakemake to handle this in the background. Am I missing something? Is this a bug? Thanks, Ilya.
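For reference, the setup looks roughly like this minimal sketch (the bucket, sample names, and concatenation command are placeholders, since the real data is private):

```python
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

# Credentials are picked up from the environment / ~/.aws/credentials
S3 = S3RemoteProvider()

SAMPLES = ["s1", "s2", "s3", "s4"]

rule all:
    input:
        "merged/all.fastq.gz"

# First rule: concatenate the four ~280 GB objects pulled from S3.
# The file suffix is left as a wildcard here.
rule concat:
    input:
        S3.remote(expand("my-bucket/raw/{sample}.{{ext}}", sample=SAMPLES))
    output:
        "merged/all.{ext}"
    shell:
        "cat {input} > {output}"
```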
- Please share some code; we cannot help you without a better description. – Maarten-vd-Sande Dec 18 '19 at 10:49
- Since the data is not public, any code I would share would be useless. A minimal example would be to try to use a large file from S3 as input for a Snakemake rule. – ilyavs Dec 18 '19 at 14:06
- Did downloading a large (public) file work? – Maarten-vd-Sande Dec 18 '19 at 16:50
- I didn't try because I don't know of public data sets stored on S3. Do you know about such data sets? – ilyavs Dec 18 '19 at 18:12
- [Tabula Muris Senis](https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis). Here's a 22 GB file you could test: `"czb-tabula-muris-senis/10x/24_month/MACA_24m_M_PANCREASE_EXO_60/possorted_genome_bam.bam"` – merv Dec 20 '19 at 23:02
- I got it to work with a different file of mine. It turned out the issue was caused by my using a wildcard for the file suffix. That confused Snakemake because an additional suffix is appended to the file while it is being downloaded from S3. – ilyavs Jan 05 '20 at 09:54
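In other words, the fix amounts to something like spelling the suffix out instead of leaving it as a wildcard; a minimal sketch, with the same placeholder bucket and file names as above:

```python
# Fix: pin the suffix explicitly so the input pattern only matches the
# final S3 object key, not the temporarily suffixed copy created while
# the download is in progress.
rule concat:
    input:
        S3.remote(expand("my-bucket/raw/{sample}.fastq.gz",
                         sample=["s1", "s2", "s3", "s4"]))
    output:
        "merged/all.fastq.gz"
    shell:
        "cat {input} > {output}"
```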