
I am making a workflow in Snakemake 5.8.2 that takes huge files from S3 as input (four files of about 280 GB each). The first rule just concatenates the files. When I run the workflow, it seems to download only 5 GB of some split version of each file, deletes the file, and fails to concatenate. I know AWS transfers files in 5 GB chunks, but I expected Snakemake to handle this in the background. Am I missing something? Is this a bug? Thanks, Ilya.
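For context, here is a minimal sketch of what the workflow looks like (bucket, key prefix and file names are placeholders, not the real data; it uses Snakemake's built-in S3 remote provider from the 5.x API):

```python
# Minimal sketch with placeholder names, not the actual workflow.
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

# Credentials are picked up from the usual AWS environment/config.
S3 = S3RemoteProvider()

PARTS = ["part1", "part2", "part3", "part4"]

rule concat:
    input:
        # Each remote object is ~280 GB; Snakemake downloads it locally
        # before the shell command runs.
        S3.remote(expand("my-bucket/data/{part}.fastq.gz", part=PARTS))
    output:
        "merged/all_parts.fastq.gz"
    shell:
        "cat {input} > {output}"
```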

ilyavs
  • Please share some code; we cannot help you without a better description. – Maarten-vd-Sande Dec 18 '19 at 10:49
  • Since the data is not public, any code I would share would be useless. A minimal example would be to try and use a large file from S3 as input for a snakemake rule. – ilyavs Dec 18 '19 at 14:06
  • Did downloading a large (public) file work? – Maarten-vd-Sande Dec 18 '19 at 16:50
  • I didn't try because I don't know of public data sets stored on S3. Do you know about such data sets? – ilyavs Dec 18 '19 at 18:12
  • [Tabula Muris Senis](https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis). Here's a 22 GB file you could test: `"czb-tabula-muris-senis/10x/24_month/MACA_24m_M_PANCREASE_EXO_60/possorted_genome_bam.bam"` – merv Dec 20 '19 at 23:02
  • I got it to work with a different file of mine. It seems that the issue was caused by my using a wildcard for the file suffix. That confused Snakemake because an additional suffix is appended to the file while it is being downloaded from S3 (a sketch of the pattern that worked is below). – ilyavs Jan 05 '20 at 09:54
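For anyone hitting the same problem, a hedged sketch of the change (file and bucket names here are hypothetical): spell the suffix out literally instead of capturing it with a wildcard, so the temporary name with the extra suffix created during the S3 download can no longer match the input pattern. Constraining the wildcard with `wildcard_constraints` should also work, though that is not what was tested in this thread.

```python
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

S3 = S3RemoteProvider()

# Problematic pattern (hypothetical): a wildcard on the suffix, e.g.
#     S3.remote("my-bucket/data/{sample}.{suffix}")
# also matches the extra suffix added to the file while it is being downloaded.

# Pattern that worked: the suffix is written out, so only the real S3 key matches.
rule concat:
    input:
        S3.remote(expand("my-bucket/data/{sample}.fastq.gz",
                         sample=["s1", "s2", "s3", "s4"]))
    output:
        "merged.fastq.gz"
    shell:
        "cat {input} > {output}"
```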

0 Answers