
I am trying to find the mode of multiple answers that belong to the same user ID. I have a data frame with 2 columns: user IDs and their answers. Each userID currently has multiple entries, one per answer, and I don't know how to combine or group those answers together to find the mode, or how to use the find_mode function. It currently looks like the table below: each userID appears in several rows, and the rows are not in order. This table is a much smaller version of my data; I am dealing with roughly 100 userIDs, each with anywhere from 4 to 25 answers (all answers are between 1 and 6, if that makes a difference). I do not know how to create a reprex, so I have tried my best to show you without using screenshots!

DF currently appears as:

|  UserID  |  Answer  |
| -------- | -------- |
| ID1      |      2   |
| ID3      |      4   |
| ID1      |      5   |
| ID2      |      4   |
| ID2      |      1   |
| ID3      |      3   |
| ID1      |      3   |
| ID2      |      1   |
| ID3      |      4   |

I have managed to get the kind of result I wanted using the mean function, but I don't actually want the mean; I want the mode, and I don't know how to calculate it. The code I used to find the mean is below and will hopefully give an idea of the outcome I am aiming for (the code has been altered to fit these example tables):

answers_mean <- aggregate(DF["Answer"], list(UserID = DF$UserID), mean)

Ideally, the table from above would look like this after finding the mode:

| UserID  |  AnsMod  |
| -------- | -------- |
| ID1      |      2   |
| ID2      |      1   |
| ID3      |      4   |

So far I think I may need to do one of two things. Either group the data by userID and summarise it, using a find_mode function inside summarise() (I am not sure how to do this), or pivot the data wider so that each userID is listed once with all of its answers on the same row, and then find the mode of each row (again, I am unsure how to do this). Any example code or advice on how to achieve this would be greatly appreciated!
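The first idea (group by user, then summarise each group to one value) can be sketched in base R without extra packages, assuming the answers are numeric. Here find_mode() is a hypothetical helper defined in the block, not a built-in R function. It relies on table(), which sorts the distinct values, so when several answers tie for most frequent it returns the smallest one.

```r
# Example data matching the table in the question
DF <- data.frame(
  UserID = c("ID1", "ID3", "ID1", "ID2", "ID2", "ID3", "ID1", "ID2", "ID3"),
  Answer = c(2, 4, 5, 4, 1, 3, 3, 1, 4)
)

# Hypothetical helper: most frequent value of a numeric vector.
# table() sorts the values, so on a tie the smallest tied value is returned.
find_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# "Group by": split the answers into one vector per user
by_user <- split(DF$Answer, DF$UserID)

# "Summarise": reduce each user's vector to a single mode
modes <- vapply(by_user, find_mode, numeric(1))

result <- data.frame(UserID = names(modes), AnsMod = unname(modes))
print(result)
```

With dplyr loaded, the same helper should also fit the group_by()/summarise() pattern: DF |> group_by(UserID) |> summarise(AnsMod = find_mode(Answer)).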

Phil
pillypum

2 Answers


A mode function

Mode <- function(x) {
   val <- unique(x)
   val[which.max(tabulate(match(x, val)))]
}
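To show how this Mode() function handles ties, here is the same function with comments, followed by a few sample calls. The inputs are illustrative only. Because unique() keeps values in order of first appearance and which.max() returns the first maximum, a tie is resolved in favour of whichever tied value appears first in x.

```r
Mode <- function(x) {
  val <- unique(x)                         # distinct values, in order of first appearance
  val[which.max(tabulate(match(x, val)))]  # count how often each distinct value occurs; first max wins
}

Mode(c(4, 1, 1))  # 1 occurs twice, so 1 is returned
Mode(c(2, 5, 3))  # all tied, so the first value seen (2) is returned
Mode(c(3, 5, 2))  # same values in a different order, so 3 is returned
```

This is why ID1 (answers 2, 5, 3) gets 2 in the output below: all three answers are tied, and 2 comes first in the data.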

Using aggregate

aggregate(Answer ~ UserID, df, Mode)
  UserID Answer
1    ID1      2
2    ID2      1
3    ID3      4

Data

df <- structure(list(UserID = c("ID1", "ID3", "ID1", "ID2", "ID2", 
"ID3", "ID1", "ID2", "ID3"), Answer = c(2L, 4L, 5L, 4L, 1L, 3L, 
3L, 1L, 4L)), class = "data.frame", row.names = c(NA, -9L))
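Since the question says every answer is between 1 and 6, a simpler variant can count the answers directly with tabulate(). This sketch assumes whole-number answers in 1 to 6; Mode6 is a hypothetical name, and answers outside that range would not be counted. Here a tie goes to the smallest tied answer rather than the first one seen.

```r
# Assumes every answer is a whole number from 1 to 6, as stated in the question
Mode6 <- function(x) {
  counts <- tabulate(x, nbins = 6)  # counts[k] = number of answers equal to k
  which.max(counts)                 # index of the largest count is the answer itself; ties go to the smallest
}

df <- data.frame(
  UserID = c("ID1", "ID3", "ID1", "ID2", "ID2", "ID3", "ID1", "ID2", "ID3"),
  Answer = c(2L, 4L, 5L, 4L, 1L, 3L, 3L, 1L, 4L)
)

res <- aggregate(Answer ~ UserID, df, Mode6)
print(res)
```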
Andre Wildberg

Use dplyr functions to get Answer counts by UserID, then take the highest count per ID:

library(dplyr)

df |> 
  count(UserID, Answer) |> 
  group_by(UserID) |> 
  slice_max(order_by = n, with_ties = FALSE)

Output:

  UserID Answer     n
  <chr>   <int> <int>
1 ID1         2     1
2 ID2         1     2
3 ID3         4     2

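Note: use with_ties = TRUE to return every tied modal value. This matters for ID1 here: its answers 2, 3 and 5 each appear once, so with_ties = FALSE simply keeps the first of them (Answer 2).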
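For readers not using dplyr, here is a base R sketch of the same "keep all ties" idea. It returns one row per user and modal answer. all_modes() is a hypothetical helper defined here, and it assumes numeric answers.

```r
df <- data.frame(
  UserID = c("ID1", "ID3", "ID1", "ID2", "ID2", "ID3", "ID1", "ID2", "ID3"),
  Answer = c(2L, 4L, 5L, 4L, 1L, 3L, 3L, 1L, 4L)
)

# Hypothetical helper: every value sharing the highest count, in ascending order
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# One vector of modes per user
modes_list <- lapply(split(df$Answer, df$UserID), all_modes)

# Long format: one row per (UserID, modal Answer) pair, similar to with_ties = TRUE
ties <- data.frame(
  UserID = rep(names(modes_list), lengths(modes_list)),
  Answer = unlist(modes_list, use.names = FALSE)
)
print(ties)
```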

andrew_reece