I am trying to summarize a dataset.

I am looking to produce a table with counts and averages all in one.

Example data:

df <- data.frame(
    "Species" = c("A","B","C","D","A","B","C","D"), 
    "Location" =  c("A","B","C","B","A","D","D","E"), 
    "Sample size" = c(1,30,6,2,5,10,3,300), 
    "Frequency"=c(0,0.3,80,0.5,0.01,0.6,1,2)
  )

df

This produces a table like this:

  Species Location Sample.size Frequency
1       A        A           1         0
2       B        B          30       0.3
3       C        C           6        80
4       D        B           2       0.5
5       A        A           5      0.01
6       B        D          10       0.6
7       C        D           3         1
8       D        E         300         2

I am trying to make a table with one row per species and the following columns: the species, the number of times the species occurs, the number of countries (locations) the species occurs in, the average sample size per species, and the average frequency per species.

Essentially, I am trying to get a table like this:

Species species_count #_of_Countries Avg_Sample.size Avg_Frequency
A       2             2              10              0
B       2             3              3               0.01
C       3             4              1               20
D       5             1              5               0.5

I am relatively new to R, so any help would be appreciated!

cgxytf
  • Is the expected output based on the input example? I couldn't get the counts to match – akrun Nov 15 '19 at 18:46
  • No, it's not the expected output based on the sample; I just put random numbers in. But it's the general format – cgxytf Nov 15 '19 at 18:48
  • To get the mean, the 'Sample.size', 'Frequency' should be numeric – akrun Nov 15 '19 at 18:50
  • Oh right, sorry! I edited the example to be numeric. – cgxytf Nov 15 '19 at 18:52
  • Perhaps `df %>% group_by(Species) %>% mutate(Avg_Sample_size = mean(Sample.size), Avg_Frequency = mean(Frequency), species_count = n()) %>% group_by(Location) %>% mutate(NumberOfCountries = n()) %>% distinct`. As the expected output is not correct, I couldn't cross-check – akrun Nov 15 '19 at 18:54

1 Answer

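I think this is what you want:

library(dplyr)

# Column names are case-sensitive; the example data uses Species and Location.
Summary_df <- df %>%
  group_by(Species) %>%
  summarize(species_count = n(),
            # counts non-missing rows per species, not distinct locations
            country_count = sum(!is.na(Location)),
            Avg_sample_size = mean(Sample.size),
            Avg_frequency = mean(Frequency))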
Brody
  • Thank you! This worked (sort of). The table produced data for species count, avg sample size and avg frequency, but the country count did not work. It gave the same results as species count, instead of counting countries – cgxytf Nov 16 '19 at 13:40
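
The behaviour described in the comment above is what `sum(!is.na(...))` would be expected to give: it counts non-missing rows in each group, so with no missing values it matches `n()`. One possible fix, as a minimal sketch assuming the example data from the question (where the country column is named `Location`), is to count unique values with dplyr's `n_distinct()` instead. The name `summary_df` is only illustrative.

```r
library(dplyr)

# Example data from the question; data.frame() turns "Sample size" into Sample.size
df <- data.frame(
  "Species" = c("A", "B", "C", "D", "A", "B", "C", "D"),
  "Location" = c("A", "B", "C", "B", "A", "D", "D", "E"),
  "Sample size" = c(1, 30, 6, 2, 5, 10, 3, 300),
  "Frequency" = c(0, 0.3, 80, 0.5, 0.01, 0.6, 1, 2)
)

summary_df <- df %>%
  group_by(Species) %>%
  summarise(
    species_count   = n(),                  # rows per species
    country_count   = n_distinct(Location), # unique locations per species
    Avg_sample_size = mean(Sample.size),
    Avg_frequency   = mean(Frequency)
  )

summary_df
```

With this sample data, species A appears twice but only at location A, so its `country_count` is 1 while its `species_count` is 2.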