How to compare two CSV files and write non-shared items to a CSV file in R?

Question

I have two CSV files like these:

CSVfile1.csv

Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand

CSVfile2.csv

Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida

The desired output

Name,Identity,Location
Coconut,87,Wisconsin
Orange,48,Florida

Is there a direct function to do it in R? New to R, any help is appreciated.

Do you also want "Coconut,87,Thailand" in the output? Please be clearer about what exactly you're trying to accomplish, and provide a reproducible example if possible. — so13eit, Feb 28 '14 at 19:26
@so13eit : No, I do not want 'Coconut,87,Thailand' in the output. Thanks for asking. I want the rows of csvfile2 that differ from csvfile1, plus the rows that are missing from csvfile1. — user3188390, Feb 28 '14 at 19:29

score 3 · Accepted Answer · answered Feb 28 '14 at 19:30

You have many options to do this in R. In base R, we usually use merge or match.
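
As a rough sketch of the merge/match route in base R: the data frames xx and yy are rebuilt here from the question's data, and the flag column in_xx and the helper row_key are made-up names for illustration only.

```r
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand", stringsAsFactors = FALSE)

yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida", stringsAsFactors = FALSE)

## merge(): flag every row of xx, then left-join the flag onto yy.
## 'by' defaults to the columns the two data frames share.
xx_flagged <- cbind(xx, in_xx = TRUE)          # in_xx is a hypothetical flag column
merged <- merge(yy, xx_flagged, all.x = TRUE)
## rows of yy without an exact match in xx carry NA in the flag
only_in_yy <- merged[is.na(merged$in_xx), names(yy)]

## match(): build one string key per row and test membership with %in%
## (assumes no field contains the "\r" separator)
row_key <- function(d) do.call(paste, c(d, sep = "\r"))
only_in_yy2 <- yy[!(row_key(yy) %in% row_key(xx)), ]
```

Both only_in_yy and only_in_yy2 should hold the Coconut/Wisconsin and Orange/Florida rows; write.csv(only_in_yy, "some_file.csv", row.names = FALSE) would then save them.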

Another alternative is to use the dplyr package.

library(dplyr)
## wrap the data frames as tibbles (tbl_df() is deprecated in current dplyr)
xx_src <- as_tibble(xx)
yy_src <- as_tibble(yy)
## to get shared items
inner_join(xx_src,yy_src)
    Name Identity    Location
1  Apple       45 Los Angeles
2 Banana       78    Kingston

## to get the rows of xx that have no match in yy
anti_join(xx_src,yy_src)
     Name Identity Location
1 Coconut       87 Thailand

where:

xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')

yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')
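
Note that anti_join() keeps rows of its first argument, so the output requested in the question comes from putting the second file's data first. A minimal sketch, assuming dplyr is installed; out_file is a hypothetical temporary path, and quote = FALSE is only reasonable here because no field contains a comma.

```r
library(dplyr)

xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand", stringsAsFactors = FALSE)

yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida", stringsAsFactors = FALSE)

## rows of yy that have no exact match in xx (all three columns compared)
only_in_yy <- anti_join(yy, xx, by = c("Name", "Identity", "Location"))

## write the result; out_file is a hypothetical path used for illustration
out_file <- tempfile(fileext = ".csv")
write.csv(only_in_yy, out_file, row.names = FALSE, quote = FALSE)
```

With real files, xx and yy would instead come from read.csv("CSVfile1.csv") and read.csv("CSVfile2.csv").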

I want the differences as shown in the desired output: the rows that differ from csvfile1, plus the rows that are missing from csvfile1. — user3188390, Feb 28 '14 at 19:33

score 1 · Answer 2 · answered Feb 28 '14 at 19:52

Try this:

Lines1 <- readLines("CSVfile1.csv")
Lines2 <- readLines("CSVfile2.csv")
LinesDiff <- setdiff(Lines2, Lines1)
writeLines(c(Lines1[1], LinesDiff), "CSVfileDiff.csv")  # Lines1[1] is the header line

This gives:

> readLines("CSVfileDiff.csv")
[1] "Name,Identity,Location" "Coconut,87,Wisconsin"   "Orange,48,Florida"

score 0 · Answer 3 · edited May 23 '17 at 10:28

xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')

yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')


x <- rbind(yy, xx)
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(yy), ]

Output:

         Name Identity  Location
3     Coconut       87 Wisconsin
4      Orange       48   Florida

Credit goes to Matt: R selecting all rows from a data frame that don't appear in another
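
To see why the one-liner works, it may help to split it into its two logical masks. This is only an illustrative breakdown; dup_later, from_yy and only_in_yy are made-up names.

```r
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand", stringsAsFactors = FALSE)

yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida", stringsAsFactors = FALSE)

x <- rbind(yy, xx)                         # yy rows first (1-4), then xx rows (5-7)

## TRUE where an identical row appears further down the stacked data frame
dup_later <- duplicated(x, fromLast = TRUE)

## TRUE for the rows that came from yy
from_yy <- seq_len(nrow(x)) <= nrow(yy)

## keep the yy rows that are not repeated later on
only_in_yy <- x[!dup_later & from_yy, ]
```

One thing to keep in mind: if yy itself contains the same row more than once, the earlier copies also count as "repeated later", so only the last copy would survive.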
