I want to download the raw FASTQ files from an RNA-seq experiment to get gene expression values, but GEO only provides .bed.gz and .wig.gz files for it. What can I do to get the RPKM values? Thank you very much!
1 Answer
To calculate RPKM, you need the mapped reads as stored in BAM/SAM (or CRAM) files. Wiggle, BED and their derivatives such as bigWig are summaries of those alignments that contain only the coverage (mainly used for plotting). In other words, they have lost the per-read information needed for counting, and therefore for calculating RPKM (or FPKM/TPM, for that matter).
The standard approach is to start from a BAM file, extract the read counts for the regions of interest (e.g. genes) and calculate RPKM etc. from those counts. There are many pipelines out there, such as this.
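Once you have per-gene read counts, the RPKM step itself is simple arithmetic: reads for the gene, divided by the gene length in kilobases and by the library size in millions of reads. Here is a minimal sketch in Python. The function name `rpkm`, the gene names and the numbers are made up for illustration, and it assumes the library size is the sum of the counted reads (some pipelines use the total number of mapped reads instead).

```python
def rpkm(counts, lengths_bp):
    """Compute RPKM per gene.

    counts:     dict mapping gene -> number of reads counted for that gene
    lengths_bp: dict mapping gene -> gene (or exon model) length in base pairs

    Assumption: library size = sum of all counted reads.
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no reads counted, cannot compute RPKM")
    # RPKM = reads * 1e9 / (length in bp * library size)
    # (1e9 = 1e3 for "per kilobase" times 1e6 for "per million reads")
    return {
        gene: n * 1e9 / (lengths_bp[gene] * total_reads)
        for gene, n in counts.items()
    }


# Hypothetical toy data, not taken from any real experiment
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

values = rpkm(counts, lengths)
for gene, value in values.items():
    print(gene, round(value, 1))
```

In this toy example geneA and geneB end up with the same RPKM even though geneB has three times as many reads, because geneB is also three times as long.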
If BAM files are not available, GEO usually has at least the raw FASTQ files (or SRA files that can be converted to FASTQ), which you can map yourself to obtain a BAM file. Also have a look at ArrayExpress; they may have the raw files for that project, since it mirrors GEO.
One word of warning: if you intend to do differential expression analysis, you need to start from the raw counts, not from the RPKM values.
