Loop to split Large Data Frame and write multiple CSV files in R

Question

I have a file with 25 million rows and need to split it into smaller files based on the factor levels. I created a dataframe to include distinct factor levels and wrote a loop to perform some operations and write out a csv.

The data looks like this:

Country Col2 Code   Year
 A       C     1    2020
 A       D     1    2020
 A       C     1    2020
 A       D     2    2020
 A       C     2    2020
 A       D     2    2020
 A       C     2    2020
 A       D     3    2020

The intention is to write a CSV file for every subset based on Code:

d1 <- data %>%
  distinct(Code)

for(i in 1:nrow(d1))
{
  
  subset <- data %>%
  filter(Code  == Code[i])
  co <- subset$Code[i]
  
  yr<- subset$Year[i]
  

  setwd("C:/Users/...")
  
  write.csv(subset, paste(co,"_",Year, ".csv", sep=""), append = FALSE, row.names = FALSE)
  
  }

The output keeps getting written to the same file instead of creating separate files in the directory.

Is there a better way of doing this? Thank you.

Ronak Shah · Accepted Answer · 2020-07-16T09:40:11.383

Use split to split data based on factor levels.

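# Split into a list of data frames, one per Code value
df_dat <- split(df, df$Code)
# Write each piece; Col2 matches the column name in the sample data (R is case-sensitive)
lapply(df_dat, function(x) write.csv(x, paste0('df_', x$Col2[1], '.csv'), row.names = FALSE))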

This will write a separate CSV file for each Code value to your working directory.
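As a minimal, self-contained sketch of the same idea on the question's sample data: the `<Code>_<Year>.csv` naming pattern is taken from the `paste()` call in the question, while `out_dir` (a temporary directory) is an assumption used here so that nothing is written to a real working directory.

```r
# Sample data mirroring the question (column names as shown there)
data <- data.frame(
  Country = rep("A", 8),
  Col2    = c("C", "D", "C", "D", "C", "D", "C", "D"),
  Code    = c(1, 1, 1, 2, 2, 2, 2, 3),
  Year    = 2020
)

# Assumption: write into a temporary directory instead of calling setwd()
out_dir <- file.path(tempdir(), "split_csv")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One data frame per distinct Code value
pieces <- split(data, data$Code)

# Write each piece; the file name uses the first Code and Year in that piece
written <- vapply(pieces, function(x) {
  path <- file.path(out_dir, paste0(x$Code[1], "_", x$Year[1], ".csv"))
  write.csv(x, path, row.names = FALSE)
  path
}, character(1))

basename(written)
```

With 25 million rows, `data.table::fwrite()` is a commonly used faster replacement for `write.csv()`, if adding that package is an option.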

edited Jul 16 '20 at 09:40

answered Jul 16 '20 at 08:59


Thanks. So it effectively starts writing separate files, but the names are an issue. In my code, co represents a country name, which will be stored in Col1 or Col2. This takes the names of the columns – marine8115 Jul 16 '20 at 09:08
1) How do you know then whether the name should be taken from `Col1` or `Col2`? 2) Which `co` value do you want it to take? For example, for `Code == 1`, `Col1` has A and B; which should be written as the file name? – Ronak Shah Jul 16 '20 at 09:13
Country name will always be stored in col2. Just checked – marine8115 Jul 16 '20 at 09:18
I am creating a subset in such a way that code == 1 will have just 1 country name. Will edit the sample data. Thanks – marine8115 Jul 16 '20 at 09:22
See updated answer. – Ronak Shah Jul 16 '20 at 09:40
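
For reference, the loop from the question can also be made to work. The likely cause of the repeated file name is that, inside `filter()`, `Code[i]` refers to the `Code` column of `data` itself rather than to the i-th distinct code in `d1`, so the same subset can be selected on every pass. In addition, `paste()` uses `Year` although the loop assigns `yr`, `subset$Code[i]` indexes row i of the subset (which may not exist), and `write.csv()` ignores `append` with a warning. The following is a sketch with those points changed, using base R subsetting; `out_dir` is an assumed stand-in for the real output folder.

```r
# Sample data mirroring the question
data <- data.frame(
  Country = rep("A", 8),
  Col2    = c("C", "D", "C", "D", "C", "D", "C", "D"),
  Code    = c(1, 1, 1, 2, 2, 2, 2, 3),
  Year    = 2020
)

# Assumption: temporary directory used in place of the real output folder
out_dir <- file.path(tempdir(), "loop_csv")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

codes <- unique(data$Code)
files <- character(length(codes))

for (i in seq_along(codes)) {
  # Compare against the i-th distinct code, not against data$Code[i]
  piece <- data[data$Code == codes[i], ]

  # Take the name parts from the first row of the piece
  co <- piece$Code[1]
  yr <- piece$Year[1]

  files[i] <- file.path(out_dir, paste0(co, "_", yr, ".csv"))
  write.csv(piece, files[i], row.names = FALSE)
}
```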
