Extract row numbers of intersecting elements

Question

I have two tables. I want to extract a single column from the second table and paste it into the first table. The problem is that not every row of that column should be copied: only the rows whose first column matches a value in the first column of the first table.

read.table("table1")->c
read.table("table2")->d
d[,1] %in% c[,1] ->f

This only gives a vector of TRUE and FALSE, but I need the row numbers. With a vector holding the row numbers of the matching elements, I would then extract exactly those rows from the fourth column of table d:

d[,4]->g
g[vector with numbers,]->g1

Is there an easy way to do this?

The vector with TRUE/FALSE values for each position can be used for subsetting in exactly the same way as a vector of row numbers. — James, Mar 13 '13 at 11:25
@Tim I am just curious: why are you using `->` and not `<-` to assign variables? — agstudy, Mar 13 '13 at 11:30
@SimonO101 I don't get your point here? you haven't seen what? — agstudy, Mar 13 '13 at 11:39
I have *NEVER* seen someone use -> after the expression for the assignment, that's all. I'm not saying it's better or worse, just unusual — Simon O'Hanlon, Mar 13 '13 at 11:40
Tim, did either solution work for you? I notice you have asked nine questions and have not accepted a single answer yet. If the solutions that people are kind enough to provide work for you, please press the green tick arrow next to your preferred answer, that way these questions can be removed from the unanswered stack. If they do not answer your question please ask for further clarification. Thanks. — Simon O'Hanlon, Mar 14 '13 at 07:35
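
James's comment about logical subsetting can be illustrated with a minimal sketch. The data frames `tab1` and `tab2` and their contents are hypothetical stand-ins for the asker's two tables (named this way to avoid masking `c`); `which()` is one way to get the row numbers the question asks for, although the logical vector alone is enough for subsetting.

```r
# Hypothetical toy data standing in for table1 and table2
tab1 <- data.frame(id = c("a", "b", "c"), x = 1:3)
tab2 <- data.frame(id = c("c", "z", "a", "y"),
                   v2 = 11:14, v3 = 21:24,
                   v4 = c(0.1, 0.2, 0.3, 0.4))

keep <- tab2[, 1] %in% tab1[, 1]  # logical: TRUE where tab2's id occurs in tab1
rows <- which(keep)               # the same information as row numbers

by_logical <- tab2[keep, 4]       # fourth column, subset with the logical vector
by_number  <- tab2[rows, 4]       # fourth column, subset with row numbers

identical(by_logical, by_number)  # both routes select the same values
```

Note that both versions return the values in the row order of `tab2`, not in the order of `tab1`.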

score 3 · Answer 1 · answered Mar 13 '13 at 11:17

Or with match

f <- d[ match(c[,1] , d[,1]) , ]
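
As a rough sketch of how the `match` approach behaves, assuming small made-up tables (`tab1`, `tab2`, and the column name `v4` are hypothetical): `match` returns, for each key in the first table, its position in the second table, or `NA` when there is no match.

```r
# Hypothetical toy data; "b" has no partner in tab2
tab1 <- data.frame(id = c("a", "b", "c"), x = 1:3)
tab2 <- data.frame(id = c("c", "z", "a"), v4 = c(0.1, 0.2, 0.3))

# Position in tab2 of each tab1 id, NA where the id is absent
idx <- match(tab1[, 1], tab2[, 1])

# Paste the column into tab1; rows are aligned with tab1, unmatched ids get NA
tab1$v4 <- tab2[idx, "v4"]

# Dropping the NAs keeps only the rows of tab2 that have a partner in tab1
matched_only <- tab2[na.omit(idx), ]
```

Because `idx` follows the row order of the first table, the result can be assigned straight to a new column there, which is not the case for the `%in%` version.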

Simon O'Hanlon

score 3 · Answer 2 · answered Mar 13 '13 at 11:20

This is a classic merge:

merge(c,d[,c(1,4)],by=1)

If you have names in your data tables, the matching may be performed without specifying the by parameter. As a side note, since c is a very common base function (which I've used here), it is not a great choice for a variable name.
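
A small sketch of the `merge` approach with named columns, using hypothetical tables `tab1` and `tab2` (all names and values here are assumptions for illustration). The `all.x = TRUE` variant is shown in case unmatched rows of the first table should be kept.

```r
# Hypothetical toy data; "b" has no partner in tab2
tab1 <- data.frame(id = c("a", "b", "c"), x = 1:3)
tab2 <- data.frame(id = c("c", "z", "a"),
                   v2 = 1:3, v3 = 4:6,
                   v4 = c(0.1, 0.2, 0.3))

# Inner join: only ids present in both tables, with just the wanted column
inner <- merge(tab1, tab2[, c("id", "v4")], by = "id")

# Left join: every row of tab1 is kept, v4 is NA where there is no match
left <- merge(tab1, tab2[, c("id", "v4")], by = "id", all.x = TRUE)
```

By default `merge` sorts the result by the key column, so the row order may differ from the original `tab1`.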

James

@Arun indeed it will be faster. – agstudy Mar 13 '13 at 11:26
@Arun True, but the extra time that `merge` spends is on safety and convenience. Unless speed is proving an issue, I would err on the side of caution. – James Mar 13 '13 at 11:27
Can you explain the safety issue? Is this because of NA returns in the case of nomatch? I suppose you could just do `f <- d[ na.omit( match(c[,1] , d[,1]) ) , ]` in that case – Simon O'Hanlon Mar 13 '13 at 11:31
@SimonO101 Yes, and if you name your variables appropriately, it can help that you are matching on the correct variables. – James Mar 13 '13 at 11:45
Fairly sure. If I understand correctly, he wants elements from `d` that have matches in `c`. So `c <- 1:10; d <- seq(3,15,3); f <- d[ match( c , d ) ]` gives `NA NA 3 NA NA 6 NA NA 9 NA`. An NA means that the value in the first column was not found in the second column. I *think* that's what he wants, no? – Simon O'Hanlon Mar 13 '13 at 11:53
