3

This is probably a very basic question... I have a simple dataframe with different observations per course. I want R to return the number of rows (in my case equal to the number of observations) per course.

So for example:

DF <- structure(list(age = c(36, 21, 20, 32, 24),
                     course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR")),
                .Names = c("age", "course"),
                class = "data.frame",
                row.names = c(NA, -5L))

Then I want to have something like

nrow(DF, by=course)

.. to return the number of rows per course. I know that nrow(DF, by=course) does not exist, but is there anything else?

I have used subsets, but then I have to define each subset.

Thieme Hennis

5 Answers

9

A simple table will tell you how many rows of each course exist in the data.

c(table(DF$course))
# AERO CREDIT  SOLAR 
#    2      2      1 
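
If the counts are needed as a data frame rather than a named vector, the same table can be reshaped. This is a minimal base R sketch, assuming the `DF` from the question; the column name `n` is just an illustrative choice.

```r
# Example data from the question
DF <- data.frame(
  age = c(36, 21, 20, 32, 24),
  course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"),
  stringsAsFactors = FALSE
)

# table() counts rows per distinct value of course;
# as.data.frame() turns the table into one row per course.
counts <- as.data.frame(
  table(course = DF$course),
  responseName = "n",        # name of the count column (default is "Freq")
  stringsAsFactors = FALSE   # keep course as character, not factor
)
counts
```

Naming the argument inside `table()` carries the column name `course` through to the result.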
Rich Scriven
8

It is hard not to mention data.table these days, given its speed, memory efficiency, and compact syntax (though it may take some time to get used to).

library(data.table)
setDT(DF)             # convert data.frame to data.table
DF[, .N, by=course]   

#    course N
# 1:   AERO 2
# 2: CREDIT 2
# 3:  SOLAR 1
KFB
  • 2
    Not really necessary to set to DT though `as.data.table(DF)[,.N, by = course]` – Rich Scriven Nov 27 '14 at 00:30
  • @RichardScriven, Right. It's becoming a habit :) – KFB Nov 27 '14 at 00:32
  • 1
    Note that `setDT` converts the data.frame to a data.table by reference (without making copies), which is not true of `as.data.table(DF)`. Copies can be pretty expensive for large data sets. If you want to work with a data.frame afterward, there is an inverse function, `setDF` which will coerce the data.table object to a data.frame by reference. – lmo Jul 05 '17 at 12:23
5

As an alternative worth knowing if you are getting into R, you can use the plyr library:

library(plyr)
rows_course = ddply(DF, c("course"), summarise, nrows = length(course))

> rows_course
  course nrows
1   AERO     2
2 CREDIT     2
3  SOLAR     1

The above is worth knowing, but Richard's solution is the fastest.

Or even faster (following Richard's helpful comment):

> count(DF$course)
       x freq
1   AERO    2
2 CREDIT    2
3  SOLAR    1
LyzandeR
  • @RichardScriven Thanks a lot! I mentioned it above. I am so used to using ddply with summarise as above that I forgot about it. – LyzandeR Nov 26 '14 at 22:19
0

A very easy way to get the number of rows for a single level of a factor is this simple code:

n_1 = nrow(mydata[mydata$A==1,])
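
In the line above, `mydata` and `A` are placeholders; with the question's data they would be `DF` and `course`. One caveat: if the column contains `NA`, logical subsetting returns extra `NA` rows, so `nrow()` can overcount. A minimal sketch of the same idea, assuming the question's `DF`, that uses `sum()` instead and repeats the count for every course without defining each subset by hand:

```r
# Example data from the question
DF <- data.frame(
  age = c(36, 21, 20, 32, 24),
  course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"),
  stringsAsFactors = FALSE
)

# Count rows for one course: sum() over a logical vector counts the TRUEs,
# and na.rm = TRUE ignores rows where the comparison is NA.
n_aero <- sum(DF$course == "AERO", na.rm = TRUE)

# Apply the same count to each distinct course value.
courses <- unique(DF$course)
n_per_course <- sapply(courses, function(cc) sum(DF$course == cc, na.rm = TRUE))
n_per_course
```

This scans the column once per course, so for many groups `table()` is likely the better fit.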
Jon Lachmann
0

With the dplyr package, n() gives the current group size, which can be added as a column to every row:

library(dplyr)
DF %>% group_by(course) %>%
  mutate(N_course = n()) %>%
  ungroup()

# A tibble: 5 x 3
    age course N_course
  <dbl> <chr>     <int>
1    36 AERO          2
2    21 AERO          2
3    20 CREDIT        2
4    32 CREDIT        2
5    24 SOLAR         1
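
Note that this keeps all five rows and repeats the group size on each one, rather than returning one row per course. The same idea can be sketched in base R with `ave()`, assuming the question's `DF`; the column name `N_course` simply mirrors the one used above.

```r
# Example data from the question
DF <- data.frame(
  age = c(36, 21, 20, 32, 24),
  course = c("AERO", "AERO", "CREDIT", "CREDIT", "SOLAR"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group of DF$course and returns a vector
# aligned with the original rows, so each row receives its group's size.
DF$N_course <- ave(seq_along(DF$course), DF$course, FUN = length)
DF
```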