0

What I am trying to do is filter a larger data frame into 78 unique data frames based on the value of the first column in the larger data frame. The only way I can think of doing it properly is by applying the filter() function inside a for() loop:

 for (i in 1:nrow(plantline)) 
            {x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}

The issue is I don't know how to create a new data frame, say x2, x3, x4... every time the loop runs.

Can someone tell me if that is possible or if I should be trying to do this some other way?

RLave
Ifad Noor

6 Answers

2

There must be many duplicates of this question.

split(rawdta.df, rawdta.df$Plant_Line)

will create a list of data.frames.

However, depending on your use case, splitting the large data.frame into pieces might not be necessary as grouping can be used.
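To illustrate the grouping alternative, here is a minimal base R sketch. The toy data and the Height column are assumptions made up for the example; only the Plant_Line column name comes from the question. With dplyr, group_by() followed by summarise() does the same job.

```r
# Toy stand-in for rawdta.df; the Height column is hypothetical
rawdta.df <- data.frame(
  Plant_Line = rep(c("A", "B", "C"), each = 2),
  Height     = c(10, 12, 20, 22, 30, 32)
)

# Mean height per plant line, computed without splitting the data frame
group_means <- aggregate(Height ~ Plant_Line, data = rawdta.df, FUN = mean)
print(group_means)
```

The result is a single data frame with one row per plant line, which is often all that is needed.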

Uwe
  • 41,420
  • 11
  • 90
  • 134
2

You could use split -

# Split the large data frame into a list of data frames,
# one per unique value of the first column (78 in this case)
lst = split(large_data_frame, large_data_frame$first_column)

# Copy the data frames out of the list into the global environment.
# This is not recommended: 78 separate data frames are difficult to
# work with, so keeping them in the list is usually better.
list2env(lst, envir = .GlobalEnv)

Each data frame will be named after the corresponding value in the first column.
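If the data frames stay in the list, they can be accessed by name and processed together. A minimal sketch, assuming a toy large_data_frame whose column names are invented for the example:

```r
# Toy stand-in for the large data frame; column names are hypothetical
large_data_frame <- data.frame(
  first_column = rep(c("A", "B", "C"), each = 2),
  value        = 1:6
)

lst <- split(large_data_frame, large_data_frame$first_column)

# Access one piece by name
print(lst[["B"]])

# Apply the same operation to every piece
row_counts <- sapply(lst, nrow)
value_sums <- lapply(lst, function(df) sum(df$value))
```

No list2env() call is needed for any of this, so the global environment stays clean.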

phil_t
  • Why take the data frames out of the list? Just makes them harder to work with. – Gregor Thomas Jul 25 '18 at 15:06
  • I agree, but that is what OP's `for` loop would have done - added the dataframes to the global environment. I also added it for completeness, in case at a later point someone looks up this question, and the number of unique values in the splitting column is much lower, say 3 or 4. – phil_t Jul 25 '18 at 15:09
  • I'd recommend at least *mentioning* the alternative rather than helping newbies shoot themselves in the foot because that's what they're trying to do. – Gregor Thomas Jul 25 '18 at 15:17
  • @Gregor, understood. Thank you. Does the edit make it better? – phil_t Jul 25 '18 at 15:21
  • Much improved. To anyone reading this - it's easy to use a `for` loop or `lapply` (or `Map` or many other options... see the `purrr` package) to work on each data frame in a `list`. However, it is harder and more bug-prone to use `paste` and `assign` and `get` and other hacks to work with a bunch of nearly-identical data frames in your environment. I'd strongly recommend keeping them in a nice list. – Gregor Thomas Jul 25 '18 at 15:30
0

It would be easier if we could see the dataframes....

I propose something nevertheless. You can create a list of dataframes:

library(dplyr)  # provides filter()

# Preallocate a list with one slot per plant line
dataframes <- vector("list", nrow(plantline))
for (i in seq_len(nrow(plantline))) {
  dataframes[[i]] <- filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
Stéphane Laurent
0

You can use assign:

for (i in seq_len(nrow(plantline))) {
  # paste0("x", i) builds the variable names x1, x2, ...
  assign(paste0("x", i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))
}

Alternatively, you can save your results in a list:

X <- list()
for (i in seq_len(nrow(plantline))) {
  X[[i]] <- filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
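The list version can also be written with lapply(), and naming the elements makes them easier to look up later. This is only a sketch: the toy data and the Height column are assumptions, and base R subsetting is swapped in for dplyr's filter() so the example runs without extra packages.

```r
# Toy stand-ins for the question's rawdta.df and plantline; Height is hypothetical
rawdta.df <- data.frame(
  Plant_Line = c("A", "A", "B", "C", "C", "C"),
  Height     = c(10, 12, 20, 30, 32, 34),
  stringsAsFactors = FALSE
)
plantline <- data.frame(Plant_Line = unique(rawdta.df$Plant_Line),
                        stringsAsFactors = FALSE)

# One data frame per plant line, built with lapply() instead of a for loop
X <- lapply(plantline$Plant_Line, function(pl) {
  rawdta.df[rawdta.df$Plant_Line == pl, , drop = FALSE]
})

# Name each element after its plant line so that X[["B"]] works
X <- setNames(X, plantline$Plant_Line)
```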
Thor6
  • `fortunes::fortune(236)` *The only people who should use the assign function are those who fully understand why you should never use the assign function.* -- Greg Snow R-help (July 2009) – Uwe Jul 25 '18 at 15:02
  • I know it's not best practice, but thanks for introducing me to the fortunes package! – Thor6 Jul 25 '18 at 15:05
0

This would be easier with sample data. by() would be my favorite.

d <- data.frame(plantline = rep(LETTERS[1:3], 4),
                x = 1:12, 
                stringsAsFactors = FALSE)

l <- by(d, d$plantline, data.frame)

print(l$A)
print(l$B)
r.user.05apr
0

Solution using plyr:

ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)

library(plyr)
dlply(ma, "z") # split ma by the column named z; returns a list of data frames
RLave