1

I have a frequency table of length classes of fish per location:

LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1          
5     1     2     0        

I would like to calculate the mean and median value for each column (location) separately. For instance: Loc1: mean = (13 x 1)+(20 x 2)+(3 x 12)+(2 x 4)+(5 x 1)= 2.1 LK for Location 1.

I got really stuck on this and I don't know where to start. Is there a way to calculate this automatically for each column? Thank you in advance.

Suusie
  • 149
  • 9

4 Answers4

2

You can use weighted.mean to get the mean

sapply(x[-1], weighted.mean, x=x[,1])
#    Loc1     Loc2     Loc3 
#2.125000 2.079365 2.571429 

or using proportions

colSums(proportions(as.matrix(x[-1]), 2) * x[,1])
#    Loc1     Loc2     Loc3 
#2.125000 2.079365 2.571429 

and rep for the median.

sapply(x[-1], function(y) median(rep(x[,1], y)))
#Loc1 Loc2 Loc3 
#   2    2    2 

Data:

x <- read.table(header=TRUE, text="LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1          
5     1     2     0")
GKi
  • 37,245
  • 2
  • 26
  • 48
1

Assuming your data is a data.frame df, for the mean

sapply(subset(df,select=-c(LK)),function(x){mean(x*df$LK)})

for the mean and median

sapply(subset(df,select=-c(LK)),function(x){c(mean(x*df$LK),median(x*df$LK))})

but perhaps you are searching for a weighted average of LK, each column containing the weights, in which case

sapply(subset(df,select=-c(LK)),function(x){weighted.mean(df$LK,x)})

user2974951
  • 9,535
  • 1
  • 17
  • 24
  • Thank you! What if the dataset also has NA values? My actual data set is way bigger and with way more locations with also having no values. Is there a way to overcome this? – Suusie Jan 27 '21 at 12:41
  • @Suusie How you deal with missing values is a design choice, you need domain knowledge. The default way would be to use `weighted.mean(df$LK,x,na.rm=T)` which drops all missing values. – user2974951 Jan 27 '21 at 12:44
  • When I try this: test2 <- sapply(subset(LK20063,select=-c(LK)),function(x){weighted.mean(LK20063$LK,x,na.rm=T)}) I stil get an NA values for the calculated weighted mean. – Suusie Jan 27 '21 at 12:48
  • I found it! Thank you so much for your help! – Suusie Jan 27 '21 at 12:51
1

Here is a tidyverse solution.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(-LK, names_to = "Loc") %>%
  group_by(Loc) %>%
  summarise(mean = mean(LK*value, na.rm = TRUE),
            median = median(LK*value, na.rm = TRUE),
            .groups = "drop")
## A tibble: 3 x 3
#  Loc    mean median
#  <chr> <dbl>  <int>
#1 Loc1   20.4     13
#2 Loc2   26.2     22
#3 Loc3    3.6      4

Data

df1 <- read.table(text = "
LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1
5     1     2     0
", header = TRUE)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • That also works very wel, and what if I want to have the weighted mean in stead of the mean? – Suusie Jan 27 '21 at 12:55
1

Multiply first column to all remaining columns, then use colMeans:

colMeans(df1$LK * df1[ -1 ])
# Loc1 Loc2 Loc3 
# 20.4 26.2  3.6
zx8754
  • 52,746
  • 12
  • 114
  • 209