I am new to R programming, and I am having problems getting the gene names and symbols for Affymetrix probe IDs in R. For every probe, both columns come back as NA:

    probe        Symbol  Name
    215535_s_at  NA      NA
    32836_at     NA      NA
    210678_s_at  NA      NA
    32837_at     NA      NA
    219723_x_at  NA      NA
    223182_s_at  NA      NA

I am not able to pull the details by merging the HGNC and DAVID flat files.

Please let me know how best this can be solved.

I used the following code:

probe <- read.delim("super.txt",stringsAsFactors=F, header = T, sep="\t")
probe$probeid<-tolower(probe$probeid)
names<-read.delim("GSE42568_probeid.txt", as.is=T, stringsAsFactors=F, header=T)
## keep only the probeid column, as a character vector instead of a data frame
names<-names$probeid

NoMatchID = NULL
vec<-NULL
system.time({
for (i in 1:11390){
  index<-grep(names[i],probe$probeid,fixed=T)
  #index<-grep(paste("^",names[i],"$"),probe$probeid,fixed=T)
  if (length(index)!=0) {
    cat("Index of", names[i],"is", index, "\n")
  } else {
    cat("Index of", names[i], "Found No Match \n")
    NoMatchID = c(NoMatchID,i)
  }
  NoMatchID<-c(NoMatchID,index)
  vec_NA <- data.frame(probe[-NoMatchID,])
}
})
NoMatchID <- data.frame(probe[NoMatchID,]) 

NoMatchID_probe = setdiff(1:nrow(probe), unique(vec))
write.table(vec_NA, file = "probeids_matched_1.txt", row.names = FALSE, append = FALSE, col.names = TRUE, sep = "\t")
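A note on the matching step in the loop above, sketched with two made-up IDs (`232836_at` is hypothetical): `grep(names[i], probe$probeid, fixed = T)` is a substring search, so one ID also matches any longer ID that contains it. The commented-out `paste("^", names[i], "$")` variant cannot fix this, because `paste()` separates its arguments with spaces and `fixed = TRUE` treats `^` and `$` as literal characters. Dropping `fixed` and using `paste0()`, or comparing with `==`, gives an exact match.

```r
# Made-up probe IDs; the second one merely contains the first as a substring.
ids <- c("32836_at", "232836_at")

# Substring search: both elements contain "32836_at".
substring_hits <- grep("32836_at", ids, fixed = TRUE)
print(substring_hits)      # [1] 1 2

# paste() inserts spaces and fixed = TRUE matches "^" and "$" literally,
# so this pattern matches nothing.
anchored_pattern <- paste("^", "32836_at", "$")
anchored_hits <- grep(anchored_pattern, ids, fixed = TRUE)

# A real anchored regular expression (no fixed = TRUE) matches exactly.
regex_hits <- grep(paste0("^", "32836_at", "$"), ids)

# Plain equality is the simplest exact match.
exact_hits <- which(ids == "32836_at")
print(exact_hits)          # [1] 1
```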

Please let me know if you have any other way to solve this issue; it would be a great help to me!
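If the gene symbols and names already sit in a flat file keyed by probe ID, a base R left join with `merge(..., all.x = TRUE)` keeps every probe and leaves NA only where the annotation has no entry. One common reason for an all-NA result is that the key columns differ in case or carry stray whitespace (the code above lowercases `probe$probeid` but not the IDs in `names`), so this minimal sketch normalizes both sides first. It assumes an annotation table with columns `probeid`, `Symbol` and `Name`; both data frames and the placeholder gene values are hypothetical.

```r
# Hypothetical inputs; in practice these would come from read.delim().
probes <- data.frame(probeid = c("215535_s_at", "32836_at ", "210678_S_AT"),
                     stringsAsFactors = FALSE)
annot <- data.frame(probeid = c("215535_s_at", "210678_s_at"),
                    Symbol  = c("GENE_A", "GENE_B"),   # placeholder values
                    Name    = c("gene A", "gene B"),   # placeholder values
                    stringsAsFactors = FALSE)

# Normalize the join key on BOTH sides: a case difference or a trailing
# space is enough to make a lookup come back NA.
normalize_id <- function(x) tolower(trimws(x))
probes$probeid <- normalize_id(probes$probeid)
annot$probeid  <- normalize_id(annot$probeid)

# Left join: every probe is kept; Symbol/Name are filled where a match exists.
result <- merge(probes, annot, by = "probeid", all.x = TRUE, sort = FALSE)
result
```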

David
  • Are your genes very specific? One typical way is to use the `annotate` and `hgu133plus2.db` packages from Bioconductor. Try those two packages and run `require("annotate"); require("hgu133plus2.db"); gene.Symbols <- getSYMBOL(names, "hgu133plus2")` You could then omit those which are NA if you wish. – cdeterman Sep 15 '14 at 13:36
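Along the lines of that comment, the sketch below queries the Bioconductor chip package directly instead of merging flat files. It assumes the samples were run on the Affymetrix HG-U133 Plus 2.0 array (the package the comment names), and it swaps in `AnnotationDbi::mapIds()` for `annotate::getSYMBOL()` so that the gene name (`GENENAME`) is fetched the same way as the symbol. The `annotate_probes()` helper and its all-NA fallback when the packages are missing are illustrative additions, not part of either package.

```r
# Sketch: gene symbols and names for Affymetrix probe IDs via hgu133plus2.db
# (assumes the HG-U133 Plus 2.0 array).
# Install once with: BiocManager::install("hgu133plus2.db")
annotate_probes <- function(probe_ids) {
  have_db <- requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("hgu133plus2.db", quietly = TRUE)
  if (!have_db) {
    # Fallback so the sketch still runs without Bioconductor: all NA.
    warning("hgu133plus2.db is not installed; returning NA annotations")
    na <- rep(NA_character_, length(probe_ids))
    return(data.frame(probe = probe_ids, Symbol = na, Name = na,
                      stringsAsFactors = FALSE))
  }
  db <- hgu133plus2.db::hgu133plus2.db
  # mapIds() returns one value per key (the first one if a probe maps to several).
  symbol <- AnnotationDbi::mapIds(db, keys = probe_ids, column = "SYMBOL",
                                  keytype = "PROBEID", multiVals = "first")
  name <- AnnotationDbi::mapIds(db, keys = probe_ids, column = "GENENAME",
                                keytype = "PROBEID", multiVals = "first")
  data.frame(probe = probe_ids, Symbol = unname(symbol), Name = unname(name),
             stringsAsFactors = FALSE)
}
```

Usage, with the IDs from the question: `annotate_probes(c("215535_s_at", "32836_at", "210678_s_at"))` returns a data frame with `probe`, `Symbol` and `Name` columns, one row per probe.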

1 Answer

I am not sure I understand what you are asking. If the gene names and symbols are in your probe data frame

probe <- read.delim("super.txt",stringsAsFactors=F, header = T, sep="\t")
probe$probeid<-tolower(probe$probeid)
names<-read.delim("GSE42568_probeid.txt", as.is=T, stringsAsFactors=F, header=T)
## keep only the probeid column, as a character vector instead of a data frame
names<-names$probeid

and you want to pull out the rows of probe that do not match your names vector, then you should modify your code as follows:

#  NoMatchID = NULL
MatchID_probe <- NULL

for (i in seq_along(names)) {
  # exact match: grep(names[i], probe$probeid, fixed = T) would also hit
  # longer IDs that merely contain names[i]
  index <- which(probe$probeid == names[i])
  if (length(index) != 0) {
    cat("Index of", names[i], "is", index, "\n")
    MatchID_probe <- c(MatchID_probe, index)
  } else {
    cat("Index of", names[i], "Found No Match \n")
    # NoMatchID = c(NoMatchID,i)
  }
}

# rows of probe whose ID matched none of the entries in names
NoMatchID_probe <- setdiff(seq_len(nrow(probe)), unique(MatchID_probe))
DF_NoMatch <- probe[NoMatchID_probe, ]

DF_NoMatch
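The same result can be obtained without a loop. A minimal sketch on toy data (the IDs, including the made-up `232836_at`, are hypothetical stand-ins for `probe` and `names`): `%in%` is an exact, vectorized membership test, and `match()` reports for each entry of `names` the row where it was found, or NA, which covers the "Found No Match" messages as well.

```r
# Toy stand-ins for the objects read in above; the IDs are hypothetical.
probe <- data.frame(probeid = c("215535_s_at", "32836_at", "232836_at", "32837_at"),
                    stringsAsFactors = FALSE)
names <- c("32836_at", "32837_at", "219723_x_at")

# Exact, vectorized membership test: TRUE where a row's ID is in `names`.
matched <- probe$probeid %in% names
DF_Match   <- probe[matched, , drop = FALSE]
DF_NoMatch <- probe[!matched, , drop = FALSE]

# For each entry of `names`, the row it was found in (NA if none).
row_of_name <- match(names, probe$probeid)
names_without_match <- names[is.na(row_of_name)]

DF_NoMatch
names_without_match
```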
MasterJedi