I have a file with 25 million rows and need to split it into smaller files based on the factor levels. I created a dataframe to include distinct factor levels and wrote a loop to perform some operations and write out a csv.

The data looks like this:

Country Col2 Code   Year
 A       C     1    2020
 A       D     1    2020
 A       C     1    2020
 A       D     2    2020
 A       C     2    2020
 A       D     2    2020
 A       C     2    2020
 A       D     3    2020

The intention is to write a separate CSV file for every subset, based on Code:

d1 <- data %>%
  distinct(Code)
for(i in 1:nrow(d1))
{
  
  subset <- data %>%
  filter(Code  == Code[i])
  co <- subset$Code[i]
  
  yr<- subset$Year[i]
  

  setwd("C:/Users/...")
  
  write.csv(subset, paste(co,"_",Year, ".csv", sep=""), append = FALSE, row.names = FALSE)
  
  }

The output keeps getting written to the same file instead of creating separate files in the directory.

Is there a better way of doing this? Thank you.

marine8115

1 Answer

Use split to split data based on factor levels.

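# Split the data frame into a list of data frames, one per Code value
df_dat <- split(df, df$Code)
# Name each file after the first Col2 value of its subset (column names are case-sensitive)
invisible(lapply(df_dat, function(x) write.csv(x, paste0('df_', x$Col2[1], '.csv'), row.names = FALSE)))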

This will write a separate CSV file for each value of Code to your working directory.
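If the file names should follow the Code_Year pattern from the question instead, the same split/apply idea can be sketched as below. This is only an illustration under a few assumptions: the toy data frame stands in for the real 25-million-row data, each Code subset is assumed to contain a single Year (the first row's value is used), and out_dir is a hypothetical output folder to be replaced with your own.

```r
# Toy data mirroring the question's columns (values are illustrative only)
df <- data.frame(
  Country = rep("A", 8),
  Col2    = c("C", "D", "C", "D", "C", "D", "C", "D"),
  Code    = c(1, 1, 1, 2, 2, 2, 2, 3),
  Year    = 2020
)

# Hypothetical output directory; replace with the folder you actually want
out_dir <- tempdir()

# One data frame per distinct Code value
df_dat <- split(df, df$Code)

# Build "<Code>_<Year>.csv" from the first row of each subset, write the
# subset there, and return the path so the result can be inspected afterwards
files <- vapply(df_dat, function(x) {
  path <- file.path(out_dir, paste0(x$Code[1], "_", x$Year[1], ".csv"))
  write.csv(x, path, row.names = FALSE)
  path
}, character(1))
```

Passing a full path to write.csv avoids calling setwd() inside the loop, and files ends up as a named character vector with one path per Code value.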

Ronak Shah
  • Thanks. It does start writing separate files, but the names are an issue. In my code, co represents a country name, which will be stored in Col1 or Col2. This takes the names of the columns – marine8115 Jul 16 '20 at 09:08
  • 1) How do you know then if name should be taken from `Col1` or `Col2` ? 2) Which `co` value do you want it to take? For example for `Code == 1` , `Col1` has A and B which should be written as file name? – Ronak Shah Jul 16 '20 at 09:13
  • Country name will always be stored in col2. Just checked – marine8115 Jul 16 '20 at 09:18
  • I am creating a subset in such a way that code == 1 will have just 1 country name. Will edit the sample data. Thanks – marine8115 Jul 16 '20 at 09:22
  • See updated answer. – Ronak Shah Jul 16 '20 at 09:40