
I have extracted information about genes and chromosomes from a set of texts in order to classify a database of texts.

My results are missing some information: some texts contain only the gene name and the location, but I want the OMIM number, the gene symbol, the gene name, and the chromosome location for every entry.

This is part of my results (produced with R code):

       OMIM   GENES_SYMBOL         GENES        CHROMOSOME
1      (NA)       (arlts1)         (NA)              (NA)
2      (NA)          (mtr)          (NA)              (NA)
3      (NA)        (hla.g)          (NA)              (NA)
4      (NA)  (nat2, t341c)          (NA)              (NA)
5  (222300)         (wfs1)          (NA)            (X4p16)

I want to get rid of the NAs by replacing each one with the corresponding name or code: for example, something that takes arlts1 and finds the matching OMIM number, gene name, and chromosome location.

I searched a lot, but I couldn't find an exhaustive database that contains all of this information.

Maybe I can do that with biomart? I don't even know what it is. Could someone help me with some solutions to my problem?

  • You can try out the online version of biomart here http://www.ensembl.org/biomart/martview/a32addd1d2fc8418ccef540cec3f2b71 Simply paste the GENES_SYMBOL values into the filter `Input external references ID list`, with gene symbols as the ID type. The same can be done with the R package, so if you are happy with the online results you can switch back to R. There are also some Bioconductor packages that include gene-specific information. – Roman Aug 27 '18 at 13:08
  • I've answered a very similar question today. Take a look and adjust your script accordingly based on your attributes of interest. Let me know if you'll need more help [Related answer](https://stackoverflow.com/a/52222360/7856717) – Steve Sep 07 '18 at 13:23
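
The biomart suggestion in the comments could be sketched in R roughly as follows. This is only a sketch under some assumptions: the helper names `clean_symbols` and `annotate_genes` are made up for illustration, the dot-to-hyphen fix assumes that `hla.g` was originally `HLA-G`, and the attribute names passed to `getBM()` should be checked against `biomaRt::listAttributes(mart)` for your Ensembl release.

```r
# Hypothetical helper: turn entries like "(nat2, t341c)" into clean symbols.
clean_symbols <- function(x) {
  x <- gsub("[()]", "", x)                          # drop the parentheses
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  parts <- parts[!is.na(parts) & nzchar(parts) & parts != "NA"]
  # Assumption: a "." stands for a "-" lost earlier (hla.g -> HLA-G).
  parts <- gsub(".", "-", parts, fixed = TRUE)
  unique(toupper(parts))
}

# Hypothetical helper: look the symbols up in Ensembl via biomaRt.
# Needs network access and the Bioconductor package biomaRt.
annotate_genes <- function(symbols) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Install biomaRt first, e.g. BiocManager::install('biomaRt')")
  }
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    # Assumed attribute names; verify them with biomaRt::listAttributes(mart).
    attributes = c("hgnc_symbol", "description", "chromosome_name",
                   "band", "mim_gene_accession"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = mart
  )
}

symbols <- clean_symbols(c("(arlts1)", "(mtr)", "(hla.g)",
                           "(nat2, t341c)", "(wfs1)"))
# annotation <- annotate_genes(symbols)   # run this line when online
```

The result of `annotate_genes()` can then be joined back to the original table with `merge()` on the symbol column. Entries that are aliases or variants rather than official symbols (possibly `arlts1` and `t341c` here) may return no rows with the `hgnc_symbol` filter; `biomaRt::listFilters(mart)` shows which other filters are available.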
