How to return nrow per group in R?

Question

This is probably a very basic question... I have a simple dataframe with different observations per course. I want R to return the number of rows (in my case equal to the number of observations) per course.

So for example:

DF <- structure(list(age = c(36, 21, 20, 32, 24),
                     course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR")),
                .Names = c("age", "course"), class = "data.frame",
                row.names = c(NA, -5L))

Then I want to have something like

nrow(DF, by=course)

.. to return the number of rows per course. I know that nrow(DF, by=course) does not exist, but is there anything else?

I have used subsets, but then I have to define each subset.

Rich Scriven · Accepted Answer · 2014-11-26T22:13:50.800

9

A simple table will tell you how many rows of each course exist in the data.

c(table(DF$course))
# AERO CREDIT  SOLAR 
#    2      2      1
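
If a data frame is more convenient than a named vector, the same table can be converted with as.data.frame(). A minimal base R sketch; the names counts_df and nrows are only illustrative, and DF is rebuilt with data.frame() so the snippet runs on its own:

```r
# Rebuild the example data so the snippet is self-contained
DF <- data.frame(age = c(36, 21, 20, 32, 24),
                 course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"))

# Count rows per course, then turn the table into a data frame
counts_df <- as.data.frame(table(DF$course), stringsAsFactors = FALSE)

# The default column names are Var1 and Freq; rename them (illustrative names)
names(counts_df) <- c("course", "nrows")
counts_df
```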

edited Nov 26 '14 at 22:13

answered Nov 26 '14 at 22:03

Rich Scriven


score 8 · Answer 2 · answered Nov 27 '14 at 00:25


It is hard not to mention data.table these days, given its speed, memory efficiency, and compact syntax (though it may take some time to get used to).

library(data.table)
setDT(DF)             # convert data.frame to data.table
DF[, .N, by=course]   

#    course N
# 1:   AERO 2
# 2: CREDIT 2
# 3:  SOLAR 1


KFB


It's not really necessary to use `setDT`, though: `as.data.table(DF)[, .N, by = course]` also works. – Rich Scriven Nov 27 '14 at 00:30
@RichardScriven, Right. It's becoming a habit :) – KFB Nov 27 '14 at 00:32

Note that `setDT` converts the data.frame to a data.table by reference (without making copies), which is not true of `as.data.table(DF)`. Copies can be pretty expensive for large data sets. If you want to work with a data.frame afterward, there is an inverse function, `setDF` which will coerce the data.table object to a data.frame by reference. – lmo Jul 05 '17 at 12:23
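
The difference between the two conversions described in these comments can be checked directly. A small sketch, assuming the data.table package is installed; the variable names before, during, and after are only illustrative:

```r
library(data.table)

DF <- data.frame(age = c(36, 21, 20, 32, 24),
                 course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"))

DT <- as.data.table(DF)         # makes a copy; DF stays a plain data.frame
before <- is.data.table(DF)     # FALSE

setDT(DF)                       # converts DF itself, by reference
during <- is.data.table(DF)     # TRUE
counts <- DF[, .N, by = course] # one row per course with its row count in N

setDF(DF)                       # converts back to a data.frame, by reference
after <- is.data.table(DF)      # FALSE
```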

score 5 · Answer 3 · answered Nov 26 '14 at 22:15


As an alternative worth knowing if you are getting into R, here is a solution using the plyr library:

library(plyr)
rows_course = ddply(DF, c("course"), summarise, nrows = length(course))

> rows_course
  course nrows
1   AERO     2
2 CREDIT     2
3  SOLAR     1

The above is worth knowing, but Richard's solution is the fastest.

Or, even faster (using Richard's valuable comment), with plyr's count():

> count(DF$course)
       x freq
1   AERO    2
2 CREDIT    2
3  SOLAR    1


LyzandeR


@RichardScriven Thanks a lot! I mentioned it above. I am so used to using ddply with summarise as above that I forgot about it. – LyzandeR Nov 26 '14 at 22:19

score 0 · Answer 4 · answered Feb 11 '20 at 22:38


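A very easy way to get the number of rows for a single factor level is this simple code (here mydata stands for your data frame, A for the grouping column, and 1 for the level of interest):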

n_1 = nrow(mydata[mydata$A==1,])
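
To avoid writing one subset per level, the same idea can be applied to every group at once. A minimal base R sketch using the DF from the question (n_per_course is just an illustrative name):

```r
DF <- data.frame(age = c(36, 21, 20, 32, 24),
                 course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"))

# Split the data frame into one piece per course, then count the rows of each piece
n_per_course <- sapply(split(DF, DF$course), nrow)
n_per_course
```

The result is a named integer vector with one entry per course, so no level has to be spelled out by hand.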


Jon Lachmann


score 0 · Answer 5 · answered Jul 18 '22 at 08:24

Using the dplyr package, n() gives the current group size.

library(dplyr)
DF %>% group_by(course) %>%
  mutate(N_course = n()) %>%
  ungroup()

# A tibble: 5 x 3
    age course N_course
  <dbl> <chr>     <int>
1    36 AERO          2
2    21 AERO          2
3    20 CREDIT        2
4    32 CREDIT        2
5    24 SOLAR         1
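
The mutate() call above keeps all five rows and repeats the group size on each one. If one row per course is wanted instead, summarise() or the count() shortcut may be a closer fit. A sketch, assuming dplyr is installed; per_course is an illustrative name:

```r
library(dplyr)

DF <- data.frame(age = c(36, 21, 20, 32, 24),
                 course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"))

# One row per course, with the group size in N_course
per_course <- DF %>%
  group_by(course) %>%
  summarise(N_course = n())
per_course

# Shortcut: count() groups and tallies in one step; its count column is named n
DF %>% count(course)
```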
