-1

I have two CSV files like these:

CSVfile1.csv

Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand

CSVfile2.csv

Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida

The desired output

Name,Identity,Location
Coconut,87,Wisconsin
Orange,48,Florida

Is there a built-in function to do this in R? I am new to R, so any help is appreciated.

user3188390
  • Do you also want "Coconut,87,Thailand" in the output? Please be clearer about exactly what you're trying to accomplish, and provide a reproducible example if possible. – so13eit Feb 28 '14 at 19:26
  • @so13eit: No, I do not want 'Coconut,87,Thailand' in the output. Thanks for asking. I want the rows of csvfile2 that differ from csvfile1, plus the rows that are missing from csvfile1. – user3188390 Feb 28 '14 at 19:29

3 Answers

3

You have many options for doing this in R. In base R, we usually use merge or match.
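As a minimal sketch of the base R route (not taken from the original answer): pasting the columns into one key per row and testing membership with `%in%` (which is built on `match`) keeps the rows of the second file that have no identical row in the first. The names `key_xx`, `key_yy` and `diff_rows` are illustrative, and the sketch assumes that no field contains the `"\r"` character used to glue the key together.

```r
# Inline copies of the two CSV files from the question
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand")

yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida")

# Build one key per row by pasting all columns together
key_xx <- do.call(paste, c(xx, sep = "\r"))
key_yy <- do.call(paste, c(yy, sep = "\r"))

# Keep the rows of yy whose key does not occur among the rows of xx
diff_rows <- yy[!(key_yy %in% key_xx), ]
print(diff_rows)
```

With the sample data this keeps the Coconut/Wisconsin and Orange/Florida rows. Because whole rows are compared, a change in any single column is enough for a row to count as different.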

Another alternative is to use the dplyr package.

library(dplyr)
## create sources from data frames
xx_src = tbl_df(xx)
yy_src = tbl_df(yy)
## to get shared items
inner_join(xx_src,yy_src)
    Name Identity    Location
1  Apple       45 Los Angeles
2 Banana       78    Kingston

## to get rows of xx that have no match in yy
anti_join(xx_src,yy_src)
     Name Identity Location
1 Coconut       87 Thailand

where:

xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')

yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')
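
To get the output requested in the question, the arguments of `anti_join` can be swapped so that the rows of the second file are the ones being filtered. This is a sketch that assumes the dplyr package is installed; the explicit `by` argument and the name `diff_rows` are additions for illustration, not part of the answer above.

```r
library(dplyr)

# Inline copies of the two CSV files from the question
xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand", stringsAsFactors = FALSE)

yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida", stringsAsFactors = FALSE)

# Rows of yy that have no fully matching row in xx
diff_rows <- anti_join(yy, xx, by = c("Name", "Identity", "Location"))
print(diff_rows)
```

Joining on all three columns means a row is dropped only when it is identical in both files, so the changed Coconut row and the new Orange row are both kept.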
agstudy
  • I want the differences as shown in the desired output: the rows that differ from csvfile1, plus the rows that are missing from csvfile1. – user3188390 Feb 28 '14 at 19:33
1

Try this:

Lines1 <- readLines("CSVfile1.csv")
Lines2 <- readLines("CSVfile2.csv")
LinesDiff <- setdiff(Lines2, Lines1)
writeLines(c(Lines2[1], LinesDiff), "CSVfileDiff.csv")

This gives:

> readLines("CSVfileDiff.csv")
[1] "Name,Identity,Location" "Coconut,87,Wisconsin"   "Orange,48,Florida"
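
To try the line-based approach without the files on disk, here is a self-contained sketch that first writes the two sample files to a temporary directory. The temporary paths and the names `tmp_dir`, `file1`, `file2`, `file_diff` and `result` are illustrative assumptions.

```r
# Create the two example files in a temporary directory
tmp_dir <- tempdir()
file1 <- file.path(tmp_dir, "CSVfile1.csv")
file2 <- file.path(tmp_dir, "CSVfile2.csv")
file_diff <- file.path(tmp_dir, "CSVfileDiff.csv")

writeLines(c("Name,Identity,Location",
             "Apple,45,Los Angeles",
             "Banana,78,Kingston",
             "Coconut,87,Thailand"), file1)
writeLines(c("Name,Identity,Location",
             "Apple,45,Los Angeles",
             "Banana,78,Kingston",
             "Coconut,87,Wisconsin",
             "Orange,48,Florida"), file2)

Lines1 <- readLines(file1)
Lines2 <- readLines(file2)

# Lines of file 2 that do not occur in file 1; the header is identical
# in both files, so setdiff drops it and it is added back when writing
LinesDiff <- setdiff(Lines2, Lines1)
writeLines(c(Lines2[1], LinesDiff), file_diff)

# Read the result back as a data frame to check it
result <- read.csv(file_diff)
print(result)
```

The lines are compared as raw text, so differences in whitespace or quoting between the two files would also be reported as differences.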
G. Grothendieck
0
xx <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand",header=TRUE,sep=',')

yy <- read.table(text="Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida",header=TRUE,sep=',')


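## stack yy on top of xx so that the rows of yy come first
x <- rbind(yy, xx)
## keep the rows from yy that are not repeated further down (i.e. not in xx)
x[!duplicated(x, fromLast=TRUE) & seq_len(nrow(x)) <= nrow(yy), ]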

Output:

         Name Identity  Location
3     Coconut       87 Wisconsin
4      Orange       48   Florida

Credit goes to Matt: R selecting all rows from a data frame that don't appear in another
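
The same idea can be wrapped in a small helper so that the direction of the comparison is explicit. This is only a sketch: the function name `rows_only_in` and its argument names are hypothetical, and it assumes both data frames have the same columns in the same order.

```r
# Hypothetical helper: rows of `a` that have no identical row in `b`
rows_only_in <- function(a, b) {
  stacked <- rbind(a, b)
  # TRUE for a row when an identical row appears further down the stack
  repeated_later <- duplicated(stacked, fromLast = TRUE)
  # TRUE for the rows that came from `a`
  from_a <- seq_len(nrow(stacked)) <= nrow(a)
  stacked[!repeated_later & from_a, ]
}

xx <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Thailand")

yy <- read.csv(text = "Name,Identity,Location
Apple,45,Los Angeles
Banana,78,Kingston
Coconut,87,Wisconsin
Orange,48,Florida")

only_in_yy <- rows_only_in(yy, xx)  # the output asked for in the question
only_in_xx <- rows_only_in(xx, yy)  # the old Coconut row
print(only_in_yy)
print(only_in_xx)
```

One side effect of relying on `duplicated` is that rows repeated within `a` itself are collapsed to a single copy in the result.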

so13eit