I am trying to merge two xdf files after subsetting a long table that has duplicate ids based on a variable.

Assume I have two columns: id and type

I subset the original xdf table on, say, type = 'type1' to get the first xdf file, and on type = 'type2' to get the second xdf file.

The first xdf file looks like this (there are many distinct IDs, but I show only one in the example below):

id type1
__ ____
1    5

The second xdf file looks like this (again, only one of the many distinct IDs is shown):

id type2
__ ____
1    3

Then I merge the two xdf files into a third xdf file:

rxMerge(file1, file2, outFile = final, autoSort = FALSE, matchVars = 'id', type = 'full', overwrite = TRUE)

I get two records for id = 1:

id type1 type2
__ ____ ______
1    5    NA
1    NA    3

I was expecting a single record:

id type1 type2
__ ____ ______
1    5    3

What am I doing wrong?

Alex Brown

1 Answer

Hmm... your example as given works for me, in RRE 7.4.1:

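# RevoScaleR provides rxImport, rxMerge and rxDataStep
# (it is normally attached by default in RRE)
library(RevoScaleR)

# Example data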
x <- data.frame(id = 1, type1 = 5)
y <- data.frame(id = 1, type2 = 3)

# Creating XDFs for the example data
file1 <- tempfile(fileext = ".xdf")
rxImport(inData = x, outFile = file1)

file2 <- tempfile(fileext = ".xdf")
rxImport(inData = y, outFile = file2)

# Merging into a third XDF
final <- tempfile(fileext = ".xdf")

rxMerge(inData1 = file1, 
        inData2 = file2, 
        outFile = final, 
        autoSort = FALSE, 
        matchVars = 'id',
        type = 'full',
        overwrite = TRUE)

# Check the output
rxDataStep(final)

So it's hard to know what might be going on. What happens when you set autoSort = TRUE? What version of RRE are you running? (You can get version numbers by loading RevoScaleR and running sessionInfo())
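One thing that may be worth ruling out is a key mismatch: if the id values in the two files are not exactly equal (for example, one carries trailing whitespace or is stored differently), a full outer join keeps both rows, which gives the two-record output shown in the question. This is only a guess at the cause, not a diagnosis. The sketch below reproduces the effect with made-up data and base R's merge() as a stand-in for rxMerge, so it runs without RevoScaleR; the trailing space in y$id is an assumption for illustration.

```r
# Hypothetical data: the id in y has a trailing space, so the keys differ
x <- data.frame(id = "1",  type1 = 5, stringsAsFactors = FALSE)
y <- data.frame(id = "1 ", type2 = 3, stringsAsFactors = FALSE)

# Full outer join in base R (stand-in for rxMerge(..., type = 'full'))
merged <- merge(x, y, by = "id", all = TRUE)
print(merged)        # the two ids do not match, so both rows are kept

# Normalise the key, then merge again
y$id <- trimws(y$id)
fixed <- merge(x, y, by = "id", all = TRUE)
print(fixed)         # the ids now match, so the rows are combined
```

If the ids in the real files turn out to be identical, this is not the cause, and the autoSort setting and the RRE version are the next things to check.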

Matt Parker