I have two CSV files, each of which consists of one column of data.

For instance, vecA.csv looks like this:

id
1
2

vecB.csv looks like this:

id
3
2

I read the data set as follows:

vectorA <- read.table("vecA.csv", sep = ",", header = TRUE)
vectorB <- read.table("vecB.csv", sep = ",", header = TRUE)

I want to generate a vector consisting of the elements that belong to B only (that is, those in B but not in A).

thelatemail
user785099

2 Answers

You are looking for the function setdiff:

setdiff(vectorB$id, vectorA$id)

Note that setdiff reduces the result to unique values. If you do not want that, you can create a "not in" operator (kudos to @joran; see "Match with negation"):

'%nin%' <- Negate('%in%')

vectorB$id[vectorB$id %nin% vectorA$id]
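
To see how the two approaches differ, here is a small sketch with inline data instead of the CSV files. The duplicate 3 in B is an assumption added purely for illustration, and the names only_b_unique and only_b_all are made up for this example.

```r
# Toy data mirroring the question; the extra 3 in B is added for illustration
vectorA <- data.frame(id = c(1, 2))
vectorB <- data.frame(id = c(3, 2, 3))

# "not in" operator built by negating %in%
`%nin%` <- Negate(`%in%`)

# setdiff drops duplicates from the result
only_b_unique <- setdiff(vectorB$id, vectorA$id)

# logical indexing keeps every matching element, duplicates included
only_b_all <- vectorB$id[vectorB$id %nin% vectorA$id]
```

Here only_b_unique holds a single 3, while only_b_all holds both of them, so pick whichever matches what "belonging to B only" should mean for your data.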
mnel

If your vectors are instead data.tables, then all you need are five characters:

B[!A]

library(data.table)

# read in your data, wrap in data.table(..., key="id")
A <- data.table(read.table("vecA.csv", sep = ",", header = TRUE), key = "id")
B <- data.table(read.table("vecB.csv", sep = ",", header = TRUE), key = "id")

# Then this is all you need
B[!A]

[Matthew] And in v1.8.7 it's simpler and faster to read the files as well, using fread:

A <- setkey(fread("vecA.csv"), id)
B <- setkey(fread("vecB.csv"), id)
B[!A]
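
If you would rather not set keys, a minimal sketch of the same anti-join using the on argument follows. It assumes a reasonably recent data.table (on= was added well after v1.8.7, in 1.9.6 as far as I recall) and uses inline toy data in place of the CSV files; only_b is a name made up for this example.

```r
library(data.table)

# Toy data standing in for vecA.csv and vecB.csv (values from the question)
A <- data.table(id = c(1, 2))
B <- data.table(id = c(3, 2))

# Not-join: rows of B whose id has no match in A; no key needed when `on` is given
only_b <- B[!A, on = "id"]

# Pull the result out as a plain vector
only_b$id
```

The result is still a data.table, so take the id column if a plain vector is what you are after.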
swihart
Ricardo Saporta