I am building a workflow in Snakemake 5.8.2 that takes huge files from S3 as input (4 files, ~280 GB each). The first rule just concatenates the files. When I run the workflow, Snakemake seems to download only a 5 GB chunk of some split version of each file, then deletes it, and the concatenation fails. I know AWS transfers files in 5 GB parts, but I expected Snakemake to handle this in the background. Am I missing something? Is this a bug? Thanks, Ilya.
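For reference, the setup looks roughly like this minimal sketch (the bucket, sample names, and concatenation command are placeholders, since the real data is private):

```python
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

# Credentials are picked up from the environment / ~/.aws/credentials
S3 = S3RemoteProvider()

SAMPLES = ["s1", "s2", "s3", "s4"]

rule all:
    input:
        "merged/all.fastq.gz"

# First rule: concatenate the four ~280 GB objects pulled from S3.
# The file suffix is left as a wildcard here.
rule concat:
    input:
        S3.remote(expand("my-bucket/raw/{sample}.{{ext}}", sample=SAMPLES))
    output:
        "merged/all.{ext}"
    shell:
        "cat {input} > {output}"
```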
- Please share some code; we cannot help you without a better description. – Maarten-vd-Sande Dec 18 '19 at 10:49
- Since the data is not public, any code I would share would be useless. A minimal example would be to try to use a large file from S3 as input for a Snakemake rule. – ilyavs Dec 18 '19 at 14:06
- Did downloading a large (public) file work? – Maarten-vd-Sande Dec 18 '19 at 16:50
- I didn't try because I don't know of public data sets stored on S3. Do you know about such data sets? – ilyavs Dec 18 '19 at 18:12
- [Tabula Muris Senis](https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis). Here's a 22 GB file you could test: `"czb-tabula-muris-senis/10x/24_month/MACA_24m_M_PANCREASE_EXO_60/possorted_genome_bam.bam"` – merv Dec 20 '19 at 23:02
- I got it to work with a different file of mine. It turned out the issue was caused by my using a wildcard for the file suffix. That confused Snakemake because an additional suffix is appended to the file while it is being downloaded from S3. – ilyavs Jan 05 '20 at 09:54
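In other words, the fix amounts to something like spelling the suffix out instead of leaving it as a wildcard; a minimal sketch, with the same placeholder bucket and file names as above:

```python
# Fix: pin the suffix explicitly so the input pattern only matches the
# final S3 object key, not the temporarily suffixed copy created while
# the download is in progress.
rule concat:
    input:
        S3.remote(expand("my-bucket/raw/{sample}.fastq.gz",
                         sample=["s1", "s2", "s3", "s4"]))
    output:
        "merged/all.fastq.gz"
    shell:
        "cat {input} > {output}"
```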