
I have around 20-30 dbf files, which I imported into R. I cannot combine them into one data frame/table, because the combined size comes to around 2 GB. I want to create several new columns in each file, starting with "avg_spends": the mean of total_spend, grouped by age and ctg.

When I combined the files into one data table, I ran the following dplyr command:

library(dplyr)
file_combo <- dbf_file %>%
  group_by(ctg, age) %>%
  mutate(avg_spends = mean(total_spend))

This is just the first step; I then have to create further columns based on the existing and newly created ones. How do I make this work by splitting the data by the first column (file1, file2, etc.)?
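
If the data is already in one combined table, one way to split it by the first column is base R's split(), which returns a named list with one data frame per file; the same dplyr pipeline can then be applied to each piece with lapply(). This is a minimal sketch: the column name `files` is taken from the sample table, and `spend_ratio` is a hypothetical second column, included only to show that a later argument of mutate() can use a column created earlier in the same call.

```r
library(dplyr)

# Sample data in the shape shown in the question (first column identifies the file)
dbf_file <- data.frame(
  files = rep(c("file1", "file2", "file3"), each = 4),
  age = c(45, 26, 45, 32, 41, 22, 41, 22, 21, 34, 43, 23),
  ctg = c(1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1),
  total_spend = c(1026, 1574, 64, 1610, 884, 530, 451, 520, 727, 562, 452, 851)
)

# split() returns a named list of data frames, one per value of `files`
per_file <- split(dbf_file, dbf_file$files)

result <- lapply(per_file, function(df) {
  df %>%
    group_by(ctg, age) %>%
    mutate(
      avg_spends = mean(total_spend),
      # hypothetical follow-up column computed from the column created just above
      spend_ratio = total_spend / avg_spends
    ) %>%
    ungroup()
})

result$file1
```

Because the list keeps the file names, `result$file2` and so on give the output for each file separately.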

I also need a separate output for each file.
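
Since the combined data is around 2 GB, another option is never to hold all files in memory at once: read one file, add the new columns, write that file's output, and move on to the next. The sketch below assumes CSV input so that it runs on its own; for the original .dbf files, `foreign::read.dbf()` could be used in place of `read.csv()`. The folders, the `practice_` file names and the `_out.csv` suffix are hypothetical.

```r
library(dplyr)

# Hypothetical input/output folders with two small CSV files, created so the sketch runs
in_dir <- file.path(tempdir(), "spend_in")
out_dir <- file.path(tempdir(), "spend_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(age = c(45, 26, 45, 32), ctg = c(1, 2, 1, 1),
                     total_spend = c(1026, 1574, 64, 1610)),
          file.path(in_dir, "practice_1.csv"), row.names = FALSE)
write.csv(data.frame(age = c(41, 22, 41, 22), ctg = c(1, 1, 2, 1),
                     total_spend = c(884, 530, 451, 520)),
          file.path(in_dir, "practice_2.csv"), row.names = FALSE)

# Process the files one at a time, so only one file is in memory at once
paths <- list.files(in_dir, pattern = "^practice_.*\\.csv$", full.names = TRUE)
for (path in paths) {
  df <- read.csv(path)  # for .dbf input: foreign::read.dbf(path)
  df <- df %>%
    group_by(ctg, age) %>%
    mutate(avg_spends = mean(total_spend)) %>%
    ungroup()
  # Write one output file per input file, e.g. practice_1.csv -> practice_1_out.csv
  out_name <- sub("\\.csv$", "_out.csv", basename(path))
  write.csv(df, file.path(out_dir, out_name), row.names = FALSE)
}

list.files(out_dir)
```

Further columns can be added inside the same loop body before write.csv(), so each file is read and written only once.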

Here is an example of my data:

files || age || ctg || total_spend
==================================
file1 ||  45 ||   1 ||        1026
file1 ||  26 ||   2 ||        1574
file1 ||  45 ||   1 ||          64
file1 ||  32 ||   1 ||        1610
file2 ||  41 ||   1 ||         884
file2 ||  22 ||   1 ||         530
file2 ||  41 ||   2 ||         451
file2 ||  22 ||   1 ||         520
file3 ||  21 ||   2 ||         727
file3 ||  34 ||   1 ||         562
file3 ||  43 ||   2 ||         452
file3 ||  23 ||   1 ||         851

Frank
zd06

1 Answer


You can achieve this by storing all of your files in a list and performing the action on the entire list with lapply(), like so:

library(dplyr)

file1 <- data.frame(age = c(45,26,45,32), ctg = c(1,2,1,1), total_spend = c(1026, 1574, 64, 1610))
file2 <- data.frame(age = c(41,22,41,22), ctg = c(1,1,2,1), total_spend = c(884, 530, 451, 520))
file3 <- data.frame(age = c(21,34,43,23), ctg = c(2,1,2,1), total_spend = c(727, 562, 452, 851))

files <- list(file1, file2, file3)

result <- lapply(files, function(x) x %>% group_by(ctg, age) %>% mutate(avg_spends = mean(total_spend)))
93i7hdjb
  • Thank you Erik. I tried the code that you helped with, but it throws this error: `Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "character"`. Is there anything I'm missing before running these lines? – zd06 Mar 09 '18 at 18:07
  • Hmm, I was making an edit; perhaps you grabbed the code while I was adjusting it. Please try again. – 93i7hdjb Mar 09 '18 at 18:09
  • Yes, thank you! That works when I create a data frame by writing a separate line for each file :) What should I do when I have too many files to write down each filename? I tried the same lapply() code like this, but it doesn't give me a result: `filenames <- list.files(path="filepath", pattern="practice_.*csv")` `names <-substr(filenames,1,9)` `result <- lapply(filenames, function(x) x %>% group_by(category, age) %>% mutate(avg_spends = mean(total_spend)))` – zd06 Mar 09 '18 at 18:29
  • Define files like this: `files <- lapply(filenames, function(x) read.csv(x))` then you can run the function on `files`: `result <- lapply(files, function(x) x %>% group_by(category, age) %>% mutate(avg_spends = mean(total_spend)))` – 93i7hdjb Mar 09 '18 at 18:36
  • That helped! Thank you so much :) – zd06 Mar 09 '18 at 18:46
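
One caveat with the approach in the comments: `list.files()` returns bare file names by default, so `read.csv(x)` only finds the files if the working directory is that folder. A minimal sketch that passes `full.names = TRUE` and names each result after its file (instead of `substr(filenames, 1, 9)`) follows; the temporary folder and the `practice_2.csv`/`practice_3.csv` file names are hypothetical, created only so the example runs.

```r
library(dplyr)

# Hypothetical folder with sample files, created so the sketch is self-contained
data_dir <- file.path(tempdir(), "practice_data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(age = c(21, 34, 43, 23), category = c(2, 1, 2, 1),
                     total_spend = c(727, 562, 452, 851)),
          file.path(data_dir, "practice_3.csv"), row.names = FALSE)
write.csv(data.frame(age = c(41, 22, 41, 22), category = c(1, 1, 2, 1),
                     total_spend = c(884, 530, 451, 520)),
          file.path(data_dir, "practice_2.csv"), row.names = FALSE)

# full.names = TRUE keeps the directory, so read.csv() works from any working directory
filenames <- list.files(path = data_dir, pattern = "practice_.*csv", full.names = TRUE)
files <- lapply(filenames, read.csv)
# Name each list element after its file, without directory or extension
names(files) <- tools::file_path_sans_ext(basename(filenames))

result <- lapply(files, function(x) {
  x %>%
    group_by(category, age) %>%
    mutate(avg_spends = mean(total_spend)) %>%
    ungroup()
})

names(result)
```

With named elements, `result$practice_2` holds the output for that file, which makes writing each one back out straightforward.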