Creating Subset data frames in R within For loop

Question

What I am trying to do is filter a larger data frame into 78 unique data frames based on the value of the first column in the larger data frame. The only way I can think of doing it properly is by applying the filter() function inside a for() loop:

 for (i in 1:nrow(plantline)) 
            {x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}

The issue is I don't know how to create a new data frame, say x2, x3, x4... every time the loop runs.

Can someone tell me if that is possible or if I should be trying to do this some other way?

Can you show us an example of `plantline` please? – RLave Jul 25 '18 at 14:55

score 2 · Answer 1 · answered Jul 25 '18 at 14:57

There must be many duplicates of this question.

split(rawdta.df, rawdta.df$Plant_Line)

will create a list of data.frames, one for each distinct value of Plant_Line.

However, depending on your use case, splitting the large data.frame into pieces might not be necessary as grouping can be used.
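As a rough illustration of the grouping alternative, the sketch below computes a per-group summary on the intact data frame, without splitting it. The data frame `rawdta_demo` and its `Yield` column are made up for the example; only the `Plant_Line` column name comes from the question. It uses base R's `aggregate()`; `dplyr::group_by()` or data.table's `by =` follow the same idea.

```r
# Hypothetical stand-in for rawdta.df; the Yield column is an assumption.
rawdta_demo <- data.frame(
  Plant_Line = c("A", "A", "B", "B", "C"),
  Yield      = c(1.0, 3.0, 2.0, 4.0, 5.0)
)

# One summary row per Plant_Line, computed without creating sub-data-frames.
mean_yield <- aggregate(Yield ~ Plant_Line, data = rawdta_demo, FUN = mean)
print(mean_yield)
```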

Uwe

phil_t · Answer 2 · 2018-07-25T15:20:55.750

2

You could use `split`:

# Split the large data frame into a list of data frames,
# one per distinct value of the first column (78 in the question).
lst <- split(large_data_frame, large_data_frame$first_column)

# Optionally copy every data frame from the list into the global environment.
# This is not recommended: 78 separate data frames are hard to work with.
list2env(lst, envir = .GlobalEnv)

Each data frame is named after the corresponding value in the first column.
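A small sketch of how the list elements end up named, using a made-up data frame (`demo_df` and its columns are assumptions, not the OP's data). It also shows that a piece can be pulled out of the list by name without copying anything into the global environment.

```r
# Hypothetical example data; the column names mirror the placeholders above.
demo_df <- data.frame(
  first_column = c("line1", "line2", "line1", "line3"),
  value        = 1:4
)

lst <- split(demo_df, demo_df$first_column)

names(lst)        # one name per distinct value of first_column
lst[["line1"]]    # the rows where first_column == "line1"
```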

edited Jul 25 '18 at 15:20

answered Jul 25 '18 at 15:01

phil_t


Why take the data frames out of the list? Just makes them harder to work with. – Gregor Thomas Jul 25 '18 at 15:06
I agree, but that is what OP's `for` loop would have done - added the dataframes to the global environment. I also added it for completeness, in case at a later point someone looks up this question, and the number of unique values in the splitting column is much lower, say 3 or 4. – phil_t Jul 25 '18 at 15:09
I'd recommend at least *mentioning* the alternative rather than helping newbies shoot themselves in the foot because that's what they're trying to do. – Gregor Thomas Jul 25 '18 at 15:17
@Gregor, understood. Thank you. Does the edit make it better? – phil_t Jul 25 '18 at 15:21
Much improved. To anyone reading this - it's easy to use a `for` loop or `lapply` (or `Map` or many other options... see the `purrr` package) to work on each data frame in a `list`. However, it is harder and more bug-prone to use `paste` and `assign` and `get` and other hacks to work with a bunch of nearly-identical data frames in your environment. I'd strongly recommend keeping them in a nice list. – Gregor Thomas Jul 25 '18 at 15:30
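To make the point from the comments concrete, here is a minimal sketch of working on every data frame in a list. The data (`demo_df`, with a `height` column) is invented for illustration; only the general pattern is what the comments describe.

```r
# Hypothetical data; the height column is an assumption.
demo_df <- data.frame(
  Plant_Line = c("A", "A", "B", "C", "C", "C"),
  height     = c(10, 12, 9, 14, 15, 13)
)
pieces <- split(demo_df, demo_df$Plant_Line)

# Apply the same function to every data frame in the list.
row_counts   <- vapply(pieces, nrow, integer(1))
mean_heights <- lapply(pieces, function(d) mean(d$height))

# Or loop over the names when each step needs the group label.
for (line in names(pieces)) {
  cat(line, ":", nrow(pieces[[line]]), "rows\n")
}
```

Because the pieces stay in one list, the same code works whether there are 3 groups or 78.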

score 0 · Answer 3 · answered Jul 25 '18 at 14:53

It would be easier if we could see the dataframes....

I propose something nevertheless. You can create a list of dataframes:

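library(dplyr)  # provides filter()

# Pre-allocate one list slot per plant line, then fill each slot.
dataframes <- vector("list", nrow(plantline))
for (i in seq_len(nrow(plantline))) {
  dataframes[[i]] <- filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}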
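
One possible refinement, sketched with made-up data: naming the list elements after the plant lines so that each piece can be looked up by name rather than by position. The two demo data frames are hypothetical stand-ins for `rawdta.df` and `plantline`, and base R subsetting is used in place of `filter()` so the sketch runs without dplyr.

```r
# Hypothetical stand-ins for rawdta.df and plantline.
rawdta_demo <- data.frame(
  Plant_Line = c("A", "B", "A", "C"),
  x          = 1:4
)
plantline_demo <- data.frame(Plant_Line = c("A", "B", "C"))

# Pre-allocate the list and name each slot after its plant line.
dataframes <- vector("list", nrow(plantline_demo))
names(dataframes) <- plantline_demo$Plant_Line

for (i in seq_len(nrow(plantline_demo))) {
  keep <- rawdta_demo$Plant_Line == plantline_demo$Plant_Line[i]
  dataframes[[i]] <- rawdta_demo[keep, , drop = FALSE]
}

dataframes[["A"]]  # rows belonging to plant line "A"
```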

Stéphane Laurent

Thanks! This is exactly what I needed! – Ifad Noor Jul 25 '18 at 15:56

score 0 · Answer 4 · answered Jul 25 '18 at 14:54

You can use `assign`:

for (i in seq_len(nrow(plantline))) {
  # "x" must be quoted so that paste0() builds the names x1, x2, ...
  assign(paste0("x", i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))
}

Alternatively, you can save your results in a list:

X <- list()
for (i in seq_len(nrow(plantline))) {
  X[[i]] <- filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}

Thor6

`fortunes::fortune(236)` *The only people who should use the assign function are those who fully understand why you should never use the assign function.* -- Greg Snow R-help (July 2009) – Uwe Jul 25 '18 at 15:02
I know it's not the best practice, but thanks for introducing me to the fortunes package! – Thor6 Jul 25 '18 at 15:05

score 0 · Answer 5 · answered Jul 25 '18 at 14:58

This would be easier with sample data. `by` would be my favorite.

d <- data.frame(plantline = rep(LETTERS[1:3], 4),
                x = 1:12,
                stringsAsFactors = FALSE)

# Apply data.frame() to each subset of d, grouped by plantline.
l <- by(d, d$plantline, data.frame)

print(l$A)
print(l$B)

r.user.05apr

score 0 · Answer 6 · answered Jul 25 '18 at 15:00

Solution using plyr:

ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)

library(plyr)
dlply(ma, "z") # split ma by the column named z; returns a list of data frames

RLave
