
I'm trying to download a dataset in BAM format from GEO/SRA that I can analyse in RStudio.

I tried this method, where I downloaded the .sra file and converted it to .bam:

prefetch GSM2692389
sam-dump C:\Users\Desktop\sratoolkit.2.10.8-win64\bin\ncbi\SRA\sra\GSM2692389.sra --output-file GSM2692389.bam

However, this didn't work in RStudio: it returned an error saying it couldn't read the BAM file. This is my R code; I'm using Rsamtools:

> bamfiles <- list.files("directory redacted due to privacy", ".bam")
> file.exists(bamfiles)
[1] TRUE
> 
> 
> #---> Define bam files for count step on Rsamtools
> 
> library("Rsamtools")
> bamfiles <- BamFileList(bamfiles, yieldSize=2000000)
> seqinfo(bamfiles)
Error in value[[3L]](cond) : 
  failed to open BamFile: SAM/BAM header missing or empty
  file: 'GSM2692389.bam'

Does anyone know how to help me download the SRA data into readable .bam files? Any help or guidance would be much appreciated as I'm really trying to meet a deadline with this.

Phil
ww22an
  • You could first check with samtools that the bam file is at least readable with samtools: `samtools view GSM2692389.bam` – bli Aug 07 '20 at 09:03
  • You should also specify in your question how you tried to open the bam file in RStudio. What R functions did you use, what was the exact error message? – bli Aug 07 '20 at 09:05

1 Answer


I'd say your problem is that you don't actually have BAM files. Right now, your command is writing SAM output (hence the name sam-dump) and you're just saving it with a .bam extension. A simple test is to run head on your "bam files": if you can read the output, the files are not binary, which means they are not BAM. Alternatively, you can use samtools view, as bli suggested.
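
For what it's worth, here is a minimal sketch of that check as a shell function, assuming a POSIX shell (for example under WSL on Windows). The function name sniff_format is hypothetical, not part of samtools or the SRA Toolkit. It relies on the fact that BAM files are BGZF-compressed and therefore start with the gzip magic bytes 1f 8b, whereas SAM is plain text. It only looks at the first two bytes, so it cannot prove a file is a valid BAM, only that it is not plain text.

```shell
# Hypothetical helper (not part of any toolkit): guess whether a file is
# gzip/BGZF-compressed (as a real BAM is) or plain text (as SAM is).
sniff_format() {
    # Dump the first two bytes of the file as hex, e.g. "1f8b".
    magic=$(od -An -tx1 -N 2 "$1" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "compressed"
    else
        echo "text"
    fi
}
```

Running sniff_format GSM2692389.bam on a SAM file that was merely saved with a .bam extension would print "text".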

That being said, can you try this (make sure samtools is installed first):

sam-dump C:\Users\Desktop\sratoolkit.2.10.8-win64\bin\ncbi\SRA\sra\GSM2692389.sra | samtools view -bS - > GSM2692389.bam

Also, if you're not particularly interested in keeping the .sra files, you might as well use this, which is shorter and easier (and maybe faster as well):

sam-dump SRR5799988 | samtools view -bS - > GSM2692389.bam

I took the liberty of replacing your GSM number with the associated SRR number (see https://www.ncbi.nlm.nih.gov/sra?term=SRX2979455), but don't hesitate to double-check the SRR!

More information on sam-dump: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=sam-dump

athiebaut
  • Oh thanks so much, this looks to be the most helpful of all the responses. I can't seem to download samtools, which is odd... I'm running Windows; is it possible to run it there? – ww22an Aug 07 '20 at 17:10
  • I'm going to try again, maybe I didn't do the installation correctly. Again, thanks for your help – ww22an Aug 07 '20 at 17:17
  • You're welcome. I don't think there is a Windows version of samtools. But you can install the Windows Subsystem for Linux (WSL) on your computer, which lets you run Linux commands (even ssh) and software on Windows, so this might work for samtools. Here are 2 tutorials for installing WSL: https://www.windowscentral.com/install-windows-subsystem-linux-windows-10#install_linux_subsystem_settings_windows10 and https://learn.microsoft.com/en-gb/windows/wsl/install-win10. – athiebaut Aug 07 '20 at 18:50
  • If none of that works, you can also try the online Galaxy server to convert SAM to BAM: https://usegalaxy.org/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fsamtools_view%2Fsamtools_view%2F1.9%2Bgalaxy1 – athiebaut Aug 07 '20 at 18:51
  • Have you managed to read your bam files? If so, could you please accept the answer? Otherwise, tell me if you need more help! – athiebaut Aug 12 '20 at 10:48