
I want to list, as a table, how many observations in each Diet group (there are four) have Time >= 21.

I have tried to solve this in RStudio.

data(ChickWeight)
newdata <- subset(ChickWeight, Time >= 21, select=Diet)

To count the rows in newdata I used nrow(newdata), but I would like the number of observations that meet the criterion to come out of this expression itself:

newdata <- subset(ChickWeight, Time >= 21, select=Diet) 

so that when I display newdata, the table also shows, in a new column, the number of observations in each Diet group that meet the criterion.

Desired output:

Diet   Number of observations
1      200
2       75
3      150
4      100

(The numbers in the second column are made up, just to show the layout.)

Is there a way to do that?

M--
Metsfan
  • And the obs count would be a repeating number in a different column of `newdata`? What about `newdata$obs_count <- nrow(newdata)`? – acylam Jul 12 '19 at 17:13
  • I would like it displayed this way: Diet | Number of observations, with rows 1: 200, 2: 300, 3: 75, 4: 25 (each number standing in for whatever the actual count is). avid_useR: When I ran yours, I got NULL. – Metsfan Jul 12 '19 at 17:19
  • Please post your desired output in the question body itself – acylam Jul 12 '19 at 17:20
  • So basically you want to get the obs count for each `Diet` group? – acylam Jul 12 '19 at 17:25
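The `nrow(newdata)` suggestion puts the overall total on every row. A per-group version of the "new column" idea can be sketched in base R with `ave()`, assuming the count should be repeated on every row of the matching Diet group. The column name `Number` and the variable `counts` are illustrative choices, not taken from the question.

```r
data(ChickWeight)
newdata <- subset(ChickWeight, Time >= 21, select = Diet)

# ave() applies FUN within each Diet group and recycles the result onto
# every row of that group, so each row carries the size of its group.
# (Number is a hypothetical column name.)
newdata$Number <- ave(seq_len(nrow(newdata)), newdata$Diet, FUN = length)

# Keep one row per Diet, sorted by Diet level, to get the summary table.
counts <- unique(newdata)
counts <- counts[order(counts$Diet), ]
rownames(counts) <- NULL
counts
```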

4 Answers


It can be done in base R:

transform(table(Diet=subset(ChickWeight, Time >= 21, select=Diet)))

#>   Diet Freq
#> 1    1   16
#> 2    2   10
#> 3    3   10
#> 4    4    9
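To see what `transform` contributes, the steps can be separated. `table()` alone gives a one-dimensional table of counts that prints horizontally; converting it to a data frame gives one row per Diet level. This sketch uses `as.data.frame()` directly, whose `responseName` argument renames the default `Freq` column; `tab`, `counts` and the name `Number` are illustrative.

```r
data(ChickWeight)

# Counts per Diet level; the result is a 1-d table printed across the page.
tab <- table(Diet = subset(ChickWeight, Time >= 21, select = Diet))
tab

# One row per level; responseName replaces the default "Freq" column name.
counts <- as.data.frame(tab, responseName = "Number")
counts
```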
M--
  • M-M, thanks. It works. What is the purpose of "table"? Why is it needed? – Metsfan Jul 12 '19 at 17:56
  • @Metsfan You can read about it by running `?table`. In short, `table` builds a cross tab of frequencies. I am just transforming it afterwards to change the direction of the output (run the code without `transform` to see). Try running `table` on a couple more of your own data frames to get a better feel for what it does. – M-- Jul 12 '19 at 18:01
  • When I ran it without transform I got this error: "Error in subset.data.frame(ChickWeight, select = Diet, weight) : 'subset' must be logical" – Metsfan Jul 12 '19 at 18:09
  • @Metsfan Are you assigning that to a column or something? For testing, just run that line without anything else before or after: ```table(Diet=subset(ChickWeight, Time >= 21, select=Diet))``` – M-- Jul 12 '19 at 18:12
  • Okay, now it worked. I noticed that it converted the rows into columns. Interesting. Thanks again. – Metsfan Jul 12 '19 at 18:17

We can do this with `group_by` and `summarize` from dplyr:

library(dplyr)

newdata %>%
  group_by(Diet) %>%
  summarize(Num_Obs = n())

We can even fold the subset step into a single dplyr pipeline:

ChickWeight %>%
  filter(Time >= 21) %>%
  group_by(Diet) %>%
  summarize(Num_Obs = n())

Output:

# A tibble: 4 x 2
  Diet  Num_Obs
  <fct>   <int>
1 1          16
2 2          10
3 3          10
4 4           9
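dplyr also has shortcuts for this pattern. The sketch below assumes dplyr 1.0 or later (for the `name` argument of `count()` and `add_count()`); `per_diet` and `with_counts` are illustrative names. `count()` collapses to one row per Diet, while `add_count()` keeps every filtered row and appends the group size as a new column, which is closer to the "new column" wording in the question.

```r
library(dplyr)

# One row per Diet: count() is shorthand for group_by() + summarize(n()).
per_diet <- ChickWeight %>%
  filter(Time >= 21) %>%
  count(Diet, name = "Num_Obs")
per_diet

# Every filtered row kept, with its group's size in a new Num_Obs column.
with_counts <- ChickWeight %>%
  filter(Time >= 21) %>%
  select(Diet) %>%
  add_count(Diet, name = "Num_Obs")
head(with_counts)
```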
acylam

Consider a straightforward aggregate after the subset call:

newdata <- subset(ChickWeight, Time >= 21, select=Diet)

aggregate(cbind(Obs=Diet) ~ Diet, newdata, FUN=length)

#   Diet Obs
# 1    1  16
# 2    2  10
# 3    3  10
# 4    4   9
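The subset step can also be folded into `aggregate` itself, since its formula method accepts a `subset` argument. This sketch counts the `weight` column instead of `Diet`, which assumes `weight` has no missing values (the default `na.action` would otherwise drop those rows); `counts` is an illustrative name.

```r
data(ChickWeight)

# subset is evaluated inside ChickWeight, so filtering and counting
# happen in one call; length() gives the number of rows per Diet.
counts <- aggregate(weight ~ Diet, data = ChickWeight,
                    subset = Time >= 21, FUN = length)

# Rename the count column from "weight" to "Obs".
names(counts)[2] <- "Obs"
counts
```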
Parfait

Here is a data.table approach:

library(data.table)
df <- as.data.table(ChickWeight)

df[Time >= 21, .(Number = .N), by = Diet]
#    Diet Number
# 1:    1     16
# 2:    2     10
# 3:    3     10
# 4:    4      9
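Two data.table variations, sketched under the assumption that sorted output or an in-place count column is wanted: `keyby` sorts the result by `Diet` (plain `by` keeps groups in order of first appearance), and `:=` adds the count to the table by reference, leaving `NA` on rows that fail the filter. `dt`, `per_diet` and `Number` are illustrative names.

```r
library(data.table)
dt <- as.data.table(ChickWeight)

# Summary table sorted by Diet.
per_diet <- dt[Time >= 21, .(Number = .N), keyby = Diet]
per_diet

# Add the per-Diet count as a new column of dt, modified in place;
# rows with Time < 21 are not in the subset and get NA.
dt[Time >= 21, Number := .N, by = Diet]
```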
IceCreamToucan