0

I'm building a process where I need to import 4 large txt data sets on a recurring basis, the files include multiple variables with codes values that need to be converted into a tangible values, I.e. title_code 01 = Manager. I've created 12 rda files to support conversion, the codes are constant but I need to refresh the import files periodically.

I have not seen any examples of dplyr or sqldf that uses rda files for this purpose, seemed like a good way to store the data like temp tables, is there a better approach?

Input Data

 id <- c(1,2,3,4)
    name <- c("John","Bill","Mary","Cindy")
    title_code <- c(1,2,3,4)
    df1 <- data.frame(id,name,title_code)   
    
    title_code <- c(1,2,3,4)
    title <- c("Manager", "Officer", "Sales", "VP")
    df2 <- data.frame(title_code,title) 

Can I substitute an rda file for df2

library(sqldf)
df3 = sqldf('select df1.id, df1.name, df2.title 
      from df1 left join df2 
            on df1.title_code = df2.title_code')
df3

Expected Outcome

enter image description here

Ricky
  • 123
  • 9
  • You can load the `rda` file as a data frame and use your existing code... – Gregor Thomas Dec 16 '20 at 15:35
  • Thanks Greg, just wasn't sure if there is a more efficient method, since its recurring and there are 12 df I would need to load and join to convert the codes the actual data sets have 121 total variables – Ricky Dec 16 '20 at 15:48
  • Loading rda files is pretty quick. You certainly want to work with in-memory data, not on-disk data, unless your data is so big that it doesn't fit in memory. My hunch is that `dplyr` will be more efficient than sqldf for the joins, but you should verify that. If speed is an issue, switching to `data.table` will almost certainly speed things up. – Gregor Thomas Dec 16 '20 at 15:57

0 Answers0