I am trying to calculate the gene coverage of specific genes in a BAM file. I have a list of genes with thier start and end positions in a BED file. I would essentially like to know the number of overlaps for each gene and how well it is covered.
My bed file looks like this:
track name="tb_ncbiRefSeqCurated" description="table browser query on ncbiRefSeqCurated" visibility=3 url= chr10 100042192 100081869 NM_001308.3 CPN1 chr10 100150093 100186029 NM_006459.4 ERLIN1 chr10 100188299 100229596 NM_001278.5 CHUK chr10 100232297 100267638 NM_001303405.2 CWF19L1 chr10 100523728 100529923 NM_001284368.1 NDUFB8 chr10 100523739 100529871 NM_001284367.2 NDUFB8 chr10 100735395 100747437 NM_001374303.1 PAX2 chr10 100735395 100829944 NM_001304569.2 PAX2
I have tried to covert the bam file to a bed file and then used bedmap to obtain this, but I dont get the fraction of coverage. This is the code I used:
bam2bed <../../bam_files/NA12878-200304612B-v15-450-Rep10_35x.bam | bedmap --delim '\t' --echo --count --bases-uniq --bases-uniq-f --echo-ref-size --mean ../../../reference_genes/5_selectedCols_mendel_genes - > test_bed
I get an output that looks like this:
chr10 100150093 100186029 NM_006459.4 ERLIN1 0 0 0.000000 35936 NAN chr10 100188299 100229596 NM_001278.5 CHUK 0 0 0.000000 41297 NAN chr10 100232297 100267638 NM_001303405.2 CWF19L1 0 0 0.000000 35341 NAN chr10 100523728 100529923 NM_001284368.1 NDUFB8 0 0 0.000000 6195 NAN chr10 100523739 100529871 NM_001284367.2 NDUFB8 0 0 0.000000 6132 NAN chr10 100735395 100747437 NM_001374303.1 PAX2 0 0 0.000000 12042 NAN chr10 100735395 100829944 NM_001304569.2 PAX2 0 0 0.000000 94549 NAN chr10 100745581 100829944 NM_003989.5 PAX2 0 0 0.000000 84363 NAN
The counts and the fraction of bases are not calculated.
Could you help me to find a better solution. I am sure this has been done before but I just cant find it.