0

I am working with a set of documents and need to automate a groupby and subset into new files. I can accomplish this manually but have over 200 documents with approx 45,000 observations in each that result in over 1,000 documents. My idea was for each instance of a unique value within my df to save all values with that unique name to a df with that name. In the following example I would now have 3 dataframes called : ferrari, ford, and audi.

value <-  c(1:10)
name <-  c("ferrari","ferrari","ferrari","ford","ford","ford","ford","audi","audi","audi")
data <- data.frame(value,name)
uniques <- unique(data$name)

for(file in uniques){
  file <- subset(data, data$Name == file)
}

This just results in a df called file with 0 observations. I have also tried the normal for i in length df. Any help is greatly appreciated - I am definitely not used to writing for loops.

Michael Cantrall
  • 313
  • 3
  • 15
  • 2
    I think you just want `split(x = data,f = data$name)`. Broadly speaking, creating lots of separate but related objects with different names is bad practice in R. Much better to keep them all in a single list. You can change the names of the list elements with `names()`. – joran Oct 12 '18 at 21:42
  • Could you elaborate what your expected result would be? I don't really follow. – Roman Oct 12 '18 at 21:45
  • @joran I have to save each df as an excel file that then gets sent to different groups of people. In general you are 100% right though. – Michael Cantrall Oct 12 '18 at 21:45
  • 1
    @Roman the result would be three dataframes: 1 named ferrari with only values from ferrari, 1 named for with only values from ford, 1 named audi with only values from audi – Michael Cantrall Oct 12 '18 at 21:47
  • 3
    My advice still holds. It's simple to loop through the single list created by `split` and save each element as an excel files. Even better, you can loop through the names, select each element by name and use the name in the resulting file. Much easier. – joran Oct 12 '18 at 21:47
  • 1
    @MichaelCantrall Thank you for the clarification! Now I understand. Your approach should work as well. Your loop should have iterated through `uniques[i]` to get all instances of `uniques`. – Roman Oct 12 '18 at 22:03

2 Answers2

4

You just need split:

> split(x = data,f = data$name)
$audi
   value name
8      8 audi
9      9 audi
10    10 audi

$ferrari
  value    name
1     1 ferrari
2     2 ferrari
3     3 ferrari

$ford
  value name
4     4 ford
5     5 ford
6     6 ford
7     7 ford

which results in three data frames named audi, ferrari and ford, as you requested. Anything further you wish to do to each subset can be done in a simple for loop across this list (or with lapply or tools from purrr if you'd like to be more fancy).

joran
  • 169,992
  • 32
  • 429
  • 468
1

joran's solution is elegant, but it's also possible with an approach as yours. Easy for-loop one-liner that saves all subsets into Excel files.

> audi.xlsx
   value name
8      8 audi
9      9 audi
10    10 audi

> ferrari.xlsx
  value    name
1     1 ferrari
2     2 ferrari
3     3 ferrari

> ford.xlsx
  value name
4     4 ford
5     5 ford
6     6 ford
7     7 ford

Code

for(i in 1:length(uniques)){
    write.xlsx(subset(data, data$name == uniques[i]), paste0(uniques[i], ".xlsx"))
}

Data

library(xlsx)
data <- data.frame(value = c(1:10),
                   name = c(rep("ferrari",3), rep("ford",4), rep("audi",3)))
uniques <- unique(as.character(data$name))
Roman
  • 4,744
  • 2
  • 16
  • 58