How do I get a count of the number of items in a selection?

Question

I want to list, in array format, how many observations in each Diet group (there are four) have Time >= 21.

I have tried to solve this in RStudio.

data(ChickWeight)
newdata <- subset(ChickWeight, Time >= 21, select=Diet)

In order to find how many observations are in newdata, I used nrow(newdata), but I would like to find out how many observations meet the criteria just by making it a part of this expression:

newdata <- subset(ChickWeight, Time >= 21, select=Diet)

so that when I display newdata the table will also contain the number of observations that meet the criteria in a new column.

Desired output:

Diet   Number Observations
1      200 (I just created the numbers for this column as examples)
2       75
3      150
4      100

Is there a way to do that?

and the obs count would be a repeating number in a different column of `newdata`? What about `newdata$obs_count <- nrow(newdata)`? — acylam, Jul 12 '19 at 17:13
I would like it displayed this way: Diet Number Observations 1 200 (what # is) 2 300 (what # is) 3 75 (what # is) 4 25 (what # is) avid_useR: When I ran yours, I got NULL. — Metsfan, Jul 12 '19 at 17:19
So basically you want to get the obs count for each `Diet` group? — acylam, Jul 12 '19 at 17:25

score 5 · Answer 1 · answered Jul 12 '19 at 17:29


It can be done in base R:

transform(table(Diet=subset(ChickWeight, Time >= 21, select=Diet)))

#>   Diet Freq
#> 1    1   16
#> 2    2   10
#> 3    3   10
#> 4    4    9


M--


M-M, thanks. It works. What is the purpose of "table"? Why is it needed? – Metsfan Jul 12 '19 at 17:56
@Metsfan You can read about it by running `?table`. In short, `table` gives a cross-tabulation with frequencies. I am just transforming it afterwards to change the orientation of the output (run the code without `transform` to see). You should run `table` on a couple more of your data frames to get a better sense of what it does. – M-- Jul 12 '19 at 18:01
When I ran it without transform I got this error: "Error in subset.data.frame(ChickWeight, select = Diet, weight) : 'subset' must be logical" – Metsfan Jul 12 '19 at 18:09
@Metsfan Are you assigning that to a column or something? For testing, just run that line without anything else before or after: ```table(Diet=subset(ChickWeight, Time >= 21, select=Diet))``` – M-- Jul 12 '19 at 18:12
Okay, now it worked. I noticed that it converted the rows into columns. Interesting. Thanks again. – Metsfan Jul 12 '19 at 18:17
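To illustrate the point from the comments about what `table` returns and why it is reshaped afterwards, here is a minimal base R sketch. The names `day21`, `counts`, and `counts_df` are only illustrative, and `as.data.frame` with `responseName` is used here in place of `transform` so the count column gets a descriptive name.

```r
data(ChickWeight)

# Keep only the rows that meet the condition (illustrative name: day21).
day21 <- ChickWeight[ChickWeight$Time >= 21, ]

# table() tabulates the factor: one count per Diet level,
# printed "wide" with the levels as column labels.
counts <- table(Diet = day21$Diet)

# as.data.frame() converts the table to "long" form: one row per level.
# responseName sets the name of the count column (the default is "Freq").
counts_df <- as.data.frame(counts, responseName = "Num_Obs")

print(counts)
print(counts_df)
```

Printing `counts` and `counts_df` side by side shows the rows-versus-columns difference mentioned above.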

acylam · Answer 2 · 2019-07-12T19:46:34.300

1

We can do this with summarize from dplyr:

library(dplyr)

newdata %>%
  group_by(Diet) %>%
  summarize(Num_Obs = n())

We can even fold the subsetting step into a single dplyr pipeline:

ChickWeight %>%
  filter(Time >= 21) %>%
  group_by(Diet) %>%
  summarize(Num_Obs = n())

Output:

# A tibble: 4 x 2
  Diet  Num_Obs
  <fct>   <int>
1 1          16
2 2          10
3 3          10
4 4           9
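As a possible shorthand, `count()` from dplyr wraps the `group_by()` plus `summarize(n())` pattern in one call. The sketch below assumes a dplyr version recent enough to support the `name` argument of `count()`. The `as.data.frame()` call is an extra precaution added here (it is not part of the answer above) to drop the additional classes that `ChickWeight` carries, and `counts_by_diet` is an illustrative name.

```r
library(dplyr)

data(ChickWeight)

# Drop the extra grouped-data classes on ChickWeight so the result
# is a plain data frame (a precaution, not strictly required).
chicks <- as.data.frame(ChickWeight)

counts_by_diet <- chicks %>%
  filter(Time >= 21) %>%            # keep rows that meet the condition
  count(Diet, name = "Num_Obs")     # one row per Diet with its row count

print(counts_by_diet)
```

This should give the same four counts as the `summarize()` version.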

edited Jul 12 '19 at 19:46

answered Jul 12 '19 at 17:27

acylam


score 1 · Answer 3 · answered Jul 12 '19 at 17:29


Consider a straightforward aggregate after the subset call:

newdata <- subset(ChickWeight, Time >= 21, select=Diet)

aggregate(cbind(Obs=Diet) ~ Diet, newdata, FUN=length)

#   Diet Obs
# 1    1  16
# 2    2  10
# 3    3  10
# 4    4   9


Parfait


score 0 · Answer 4 · answered Jul 12 '19 at 19:37


Here is a data.table approach:

library(data.table)
df <- as.data.table(ChickWeight)

df[Time >= 21, .(Number = .N), by = Diet]
#    Diet Number
# 1:    1     16
# 2:    2     10
# 3:    3     10
# 4:    4      9


IceCreamToucan

