
I am using the RevoScaleR package in MS Visual Studio, and I'm stuck on a step.

I have one XDF file with a column called "Total_Admits_Pred". I have another XDF file with a column called "Total_Admits".

Both XDF files have the same number of rows. I would like to combine the two XDF files into a single XDF file with both of these columns. How could I do that?

Thanks!

Thomas

Thomas Moore

2 Answers


You can add columns to an existing xdf file with rxDataStep:

xdf1 <- RxXdfData("file1.xdf")  # dataset containing Total_Admits_Pred
xdf2 <- RxXdfData("file2.xdf")  # dataset containing Total_Admits

rxDataStep(xdf1, xdf2, varsToKeep="Total_Admits_Pred", append="cols")

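This will result in file2.xdf containing all its pre-existing columns, plus Total_Admits_Pred. Rows are paired by position rather than by a key, so both files need the same rows in the same order.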

Another way is to use the dplyrXdf package:

devtools::install_github("RevolutionAnalytics/dplyrXdf")
library(dplyrXdf)  # load it so that `$` works on RxXdfData objects

df <- data.frame(Total_Admits_Pred=xdf1$Total_Admits_Pred,
                 Total_Admits=xdf2$Total_Admits)

This creates an in-memory data frame with just the two columns you want. The advantage of this, over the other answer, is that it reads only those two columns into memory.

Hong Ooi

You would do something like this:

xdf_df1 <- rxImport("<path/to/xdf1>")
xdf_df2 <- rxImport("<path/to/xdf2>")

xdfOut <- RxXdfData("<path/to/merged/xdf>") # Should not already exist

# This assumes that xdf2 was the one containing "Total_Admits_Pred"
# and that xdf1 contained "Total_Admits", you'll have to adjust this
# based on your data.
xdf_df1[["Total_Admits_Pred"]] <- xdf_df2$Total_Admits_Pred 

# Verify the Data Frame is correct
head(xdf_df1)

# Export it
rxDataStep(inData = xdf_df1, outFile = xdfOut)
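The approach above pairs rows purely by position, as the comment in the code notes. A minimal base-R sketch of an explicit guard, using small hypothetical data frames (`admits`, `preds`) in place of the rxImport() results: `[[<-` already rejects most length mismatches, but a length-1 replacement would be silently recycled down every row, so checking nrow() first makes the assumption visible.

```r
# Hypothetical stand-ins for the data frames returned by rxImport();
# the column names follow the question, the values are made up.
admits <- data.frame(Total_Admits = c(10, 12, 9))
preds  <- data.frame(Total_Admits_Pred = c(11, 11, 10))

# Rows are paired by position only, so fail loudly if the counts differ.
# (`[[<-` would silently recycle a length-1 replacement.)
if (nrow(admits) != nrow(preds)) {
  stop("Row counts differ: ", nrow(admits), " vs ", nrow(preds))
}

# Add the prediction column alongside the actual admits.
admits[["Total_Admits_Pred"]] <- preds$Total_Admits_Pred
print(admits)
```

This only checks counts; it cannot detect two files whose rows are in different orders.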
  • Just a quick question, when you write: xdf_df1[["Total_Admits_Pred"]], could you explain this syntax to me? Are you assigning Total_Admits_Pred to its own column in xdf_df1? I didn't know XDFs worked like that. Further, xdf_df2$Total_Admits_Pred, this just returns this specific column in xdf_df2? So, am I to gather, then, that XDFs work exactly the same way as regular old data frames in R, they are just stored differently? Thanks! – Thomas Moore Jun 05 '17 at 23:36
  • This answer converts the two xdf files to in-memory data frames, then combines them. See my answer for a way to do this without reading all the data into memory. – Hong Ooi Jun 06 '17 at 04:55
  • @ThomasMoore the syntax xdf_df1[["Total_Admits_Pred"]] creates a column named "Total_Admits_Pred" in xdf_df1 and fills it with the Total_Admits_Pred data from xdf_df2. Hong is right: my scriptlet reads the two XDF files into R's memory as data.frames and then does that data manipulation with standard R. – Kirill Glushko - Microsoft Jun 06 '17 at 18:18
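To illustrate the `[[` and `$` syntax asked about in the comments: after rxImport() the objects are ordinary in-memory data frames, so standard R indexing applies. A small base-R sketch, using a made-up data frame `df` in place of the imported data:

```r
# A small data frame standing in for one returned by rxImport();
# it is an ordinary in-memory data.frame, not an XDF file.
df <- data.frame(Total_Admits = c(10, 12, 9))

# Assigning to a column name that does not exist yet adds that column.
df[["Total_Admits_Pred"]] <- c(11, 11, 10)

# `$` and `[[` both extract a single column as a plain vector.
identical(df$Total_Admits_Pred, df[["Total_Admits_Pred"]])  # TRUE

# `[[` also accepts a name stored in a variable; `$` does not.
col <- "Total_Admits_Pred"
df[[col]]        # 11 11 10
is.null(df$col)  # TRUE: `$` looks for a column literally named "col"
```

The `[[` form is convenient when the column name comes from a variable, while `$` is shorter when the name is known in advance.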