Calculate mean and median for a frequency table per column (length class per group)

Question

I have a frequency table of length classes of fish per location:

LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1          
5     1     2     0

I would like to calculate the mean and median value for each column (location) separately. For instance: Loc1: mean = (13 x 1)+(20 x 2)+(3 x 12)+(2 x 4)+(5 x 1)= 2.1 LK for Location 1.

I got really stuck on this and I don't know where to start. Is there a way to calculate this automatically for each column? Thank you in advance.

GKi · Answer 1 · 2021-01-27T14:14:09.773

You can use weighted.mean to get the mean

sapply(x[-1], weighted.mean, x=x[,1])
#    Loc1     Loc2     Loc3 
#2.125000 2.079365 2.571429

or using proportions

colSums(proportions(as.matrix(x[-1]), 2) * x[,1])
#    Loc1     Loc2     Loc3 
#2.125000 2.079365 2.571429

and rep for the median.

sapply(x[-1], function(y) median(rep(x[,1], y)))
#Loc1 Loc2 Loc3 
#   2    2    2

Data:

x <- read.table(header=TRUE, text="LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1          
5     1     2     0")

user2974951 · Accepted Answer · 2021-01-27T12:39:17.083

1

Assuming your data is a data.frame df, for the mean

sapply(subset(df,select=-c(LK)),function(x){mean(x*df$LK)})

for the mean and median

sapply(subset(df,select=-c(LK)),function(x){c(mean(x*df$LK),median(x*df$LK))})

but perhaps you are searching for a weighted average of LK, each column containing the weights, in which case

sapply(subset(df,select=-c(LK)),function(x){weighted.mean(df$LK,x)})

edited Jan 27 '21 at 12:39

answered Jan 27 '21 at 12:27

user2974951

9,535
1
17
24

Thank you! What if the dataset also has NA values? My actual data set is way bigger and with way more locations with also having no values. Is there a way to overcome this? – Suusie Jan 27 '21 at 12:41
@Suusie How you deal with missing values is a design choice, you need domain knowledge. The default way would be to use `weighted.mean(df$LK,x,na.rm=T)` which drops all missing values. – user2974951 Jan 27 '21 at 12:44
When I try this: test2 <- sapply(subset(LK20063,select=-c(LK)),function(x){weighted.mean(LK20063$LK,x,na.rm=T)}) I stil get an NA values for the calculated weighted mean. – Suusie Jan 27 '21 at 12:48
I found it! Thank you so much for your help! – Suusie Jan 27 '21 at 12:51

score 1 · Answer 3 · answered Jan 27 '21 at 12:50

Here is a tidyverse solution.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(-LK, names_to = "Loc") %>%
  group_by(Loc) %>%
  summarise(mean = mean(LK*value, na.rm = TRUE),
            median = median(LK*value, na.rm = TRUE),
            .groups = "drop")
## A tibble: 3 x 3
#  Loc    mean median
#  <chr> <dbl>  <int>
#1 Loc1   20.4     13
#2 Loc2   26.2     22
#3 Loc3    3.6      4

Data

df1 <- read.table(text = "
LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1
5     1     2     0
", header = TRUE)

That also works very wel, and what if I want to have the weighted mean in stead of the mean? — Suusie, Jan 27 '21 at 12:55

score 1 · Answer 4 · answered Jan 27 '21 at 13:05

1

Multiply first column to all remaining columns, then use colMeans:

colMeans(df1$LK * df1[ -1 ])
# Loc1 Loc2 Loc3 
# 20.4 26.2  3.6

answered Jan 27 '21 at 13:05

zx8754

52,746
12
114
209

Calculate mean and median for a frequency table per column (length class per group)

4 Answers4