Intersection of row names in dataframe (subset the data)?

Question

Since intersect doesn't work with dataframes, I'm trying to use subset to create a subset of dfA with only data for which dfA's row names match dfB's row names. I should end up with 3000 rows because dfA has 5000 rows and dfB has 3000, and all of dfB’s row names exist in dfA’s row names.

The following just returns dfA's column names without any data.

mysubset = subset(dfA, dfA[,0] %in% dfB[,0])

Also, there's no index zero in `R`; indices are 1-based. `dfA[,0]` and `dfB[,0]` select zero columns, not the row names. — Rui Barradas, Jul 19 '17 at 19:22
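To illustrate the point in the comment above, here is a minimal sketch on a toy data frame (the data is hypothetical, not the asker's). It suggests that column index 0 yields a data frame with no columns, while the row names are reached through `rownames`.

```r
# Toy data frame (hypothetical stand-in for the asker's dfA)
dfA <- data.frame(x = 1:5,
                  y = 6:10,
                  row.names = letters[1:5])

# Column index 0 selects no columns at all
zero_cols <- dfA[, 0]
dim(zero_cols)

# The row names live in an attribute, accessed with rownames()
rn <- rownames(dfA)
rn
```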

score 1 · Answer 1 · answered Jul 19 '17 at 19:22

Subset dfA by comparing the row names of the two data.frames.

dfA[which(rownames(dfA) %in% rownames(dfB)),]

`%in%` tests, for each row name of dfA, whether it also appears among the row names of dfB; `which` turns the resulting logical vector into row indices, and `dfA[...]` uses those indices to pull the matching rows from dfA.

If you want to stick to your solution (which costs a bit more, computationally):

subset(dfA, rownames(dfA) %in% rownames(dfB))
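As a rough check that the variants above select the same rows, here is a small sketch on made-up data (the data frames and the names `keep`, `by_which`, `by_logical`, and `by_subset` are illustrative assumptions, not part of the question).

```r
# Toy data (hypothetical; much smaller than the 5000- and 3000-row frames)
dfA <- data.frame(x = 1:5, y = 6:10, row.names = letters[1:5])
dfB <- data.frame(x = 1:3, row.names = c("b", "d", "e"))

# Logical vector with one entry per row of dfA
keep <- rownames(dfA) %in% rownames(dfB)

by_which   <- dfA[which(keep), ]  # integer positions of the TRUE entries
by_logical <- dfA[keep, ]         # logical indexing, no which() needed
by_subset  <- subset(dfA, keep)   # same rows via subset()

# All three should contain the same row names
rownames(by_which)
rownames(by_logical)
rownames(by_subset)
```

When the condition contains no `NA` values, as here, `which` is optional because a logical vector can index rows directly.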

score 1 · Accepted Answer · answered Jul 19 '17 at 19:32

The `rownames` function gives you access to the row names; comparing the two sets of names with `%in%` then does what you expected.

Example, using small data frames with some shared rownames

dfA <- data.frame(x = 1:5,
                  y = 6:10,
                  row.names = letters[1:5])
# Show dfA
dfA
  x  y
a 1  6
b 2  7
c 3  8
d 4  9
e 5 10


dfB <- data.frame(x = 1:5,
                  y = 6:10,
                  row.names = letters[3:7])

# Show dfB
dfB
  x  y
c 1  6
d 2  7
e 3  8
f 4  9
g 5 10

Solution

# Subset rows with matching rownames 

dfA[ rownames(dfA) %in% rownames(dfB), ]
  x  y
c 3  8
d 4  9
e 5 10
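Since the question mentions `intersect`, one possible alternative is to apply it to the row names rather than to the data frames themselves, then index by name. This is only a sketch using the same toy data as the example; the variable names `common` and `result` are illustrative.

```r
# Same toy data as in the example above
dfA <- data.frame(x = 1:5,
                  y = 6:10,
                  row.names = letters[1:5])
dfB <- data.frame(x = 1:5,
                  y = 6:10,
                  row.names = letters[3:7])

# intersect() works on the character vectors of row names
common <- intersect(rownames(dfA), rownames(dfB))

# Index dfA by row name
result <- dfA[common, ]

# Sanity check: one row per shared name
nrow(result) == length(common)
```

With the real data, the same check would be expected to report 3000 rows if every row name of dfB occurs in dfA.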

I chose this answer since it's economical; although, Masoud's answer works as well. I don't know why I forgot all about `rownames`; I must've been so focused on using `mydf[,0]`. Thanks! — user8121557, Jul 20 '17 at 12:14
