Quickly split a dataframe by year in R

Question

I have a panel that looks like this

country <- c("A","B","C","A","B","C","A","B","C")
industry<- c("X","Y","Z","X","Y","Z","X","Y","Z")
x2006<- sample(1000:100000,9)
x2007<- sample(1000:100000,9)
x2008<- sample(1000:100000,9)
dat <- data.frame (country,industry,x2006,x2007,x2008)

I am doing something very simple like

dat2006 <- dat%>%
    select(country,industry,x2006)

then using write.csv to save it as its own file

What is the best way to do this if I wanted to repeat that and save a separate file for each year (i.e. column) in the data set?

jpsmith · Accepted Answer · 2023-06-19T03:11:48.277

2

You could use sapply:

sapply(grep("x", names(dat)), function(y)
       write.csv(dat[, c(1, 2, y)], 
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE)
)

grep finds the columns with x, sapply loops through them. This will name your csv file the column name selected and save it in the working directory.

Note, you could specify the columns in other ways too. A few alternatives:

# using column locations (numbers) directly
sapply(3:5, function(y)
  write.csv(dat[, c(1, 2, y)], 
            paste0(names(dat[y]), ".csv"),
            row.names = FALSE)
)

# using column names
sapply(c("x2006", "x2007", "x2008"), function(y)
       write.csv(dat[, c("country", "industry", y)], 
                 paste0(names(dat[y]), ".csv"),
                 row.names = FALSE))

edited Jun 19 '23 at 03:11

answered Jun 19 '23 at 03:06

jpsmith

11,023
5
15
36

This is really useful thanks, and thank you especially for including the different ways to specify the columns, not only useful but makes it even easier to understand! Can you please explain to me how to use the "function (y)" ? – Gilrob Jun 19 '23 at 04:27
1

The `function(y)` defines the function to iterate through the first part of the statement (ie, `grep(...)`, `3:5`, etc). You can think of `sapply` as a convenient way of writing a `for` loop, and instead of `for (y in ...)` you define a function. Hope that wasn't too confusing :) – jpsmith Jun 19 '23 at 13:36
haha I think that I get it! Thanks for your help :) – Gilrob Jun 20 '23 at 01:31

score 1 · Answer 2 · answered Jun 19 '23 at 04:19

Here is a tidyverse solution:

library(tidyverse)

dat %>%
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "value") %>%
  split(.$year) %>%
  map(~ select(.x, country, industry, value)) %>%
  map2(names(.), ~ write_csv(.x, file = paste0("dat_", .y, ".csv")))

$x2006
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        99954
2 B       Y        27955
3 C       Z        36009
4 A       X         3061
5 B       Y        25612
6 C       Z        67307
7 A       X        96514
8 B       Y        97864
9 C       Z        43014

$x2007
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        83954
2 B       Y        96141
3 C       Z        62389
4 A       X        28568
5 B       Y        77503
6 C       Z        70458
7 A       X        34978
8 B       Y        35408
9 C       Z        68731

$x2008
# A tibble: 9 × 3
  country industry value
  <chr>   <chr>    <int>
1 A       X        29498
2 B       Y        62203
3 C       Z        98125
4 A       X        99549
5 B       Y        56839
6 C       Z        21621
7 A       X        84214
8 B       Y        85778
9 C       Z        90275

Quickly split a dataframe by year in R

2 Answers2