
I was looking at the example code in the question "r element frequency and column name" and was wondering whether there is a way to show the index (row position) of each element in each column, in addition to its rank and frequency, in R. For example, given this input:

df <- read.table(header = TRUE, text = 'A  B  C  D
a  a  b  c
b  c  x  e
c  d  y  a
d  NA NA z
e  NA NA NA
f  NA NA NA', stringsAsFactors = FALSE)

the desired output is:

   element frequency columns ranking   A   B   C   D
1        a         3   A,B,D       1   1   1  NA   3
3        c         3   A,B,D       1   3   2  NA   1
2        b         2     A,C       2   2  NA   1  NA
4        d         2     A,B       2   4   3  NA  NA
5        e         2     A,D       2   5  NA  NA   2
6        f         1       A       3   6  NA  NA  NA
8        x         1       C       3  NA  NA   2  NA
9        y         1       C       3  NA  NA   3  NA
10       z         1       D       3  NA  NA  NA   4
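Each index column is simply the position that base R's `match()` returns for that element in that column (NA when the element is absent). A minimal base-R sketch under that reading follows; the names `elements`, `positions`, `cols` and `result` are illustrative, and here `frequency` counts the columns an element appears in, which equals its occurrence count only when no element repeats within a column.

```r
# Input data from the question
df <- read.table(header = TRUE, text = 'A  B  C  D
a  a  b  c
b  c  x  e
c  d  y  a
d  NA NA z
e  NA NA NA
f  NA NA NA', stringsAsFactors = FALSE)

# Distinct non-NA elements, in order of first appearance (column by column)
x <- unlist(df, use.names = FALSE)
elements <- unique(x[!is.na(x)])

# Row position of each element within each column; match() gives NA if absent.
# Assumes each element appears at most once per column (match() returns the first hit).
positions <- sapply(df, function(col) match(elements, col))

# Number of columns containing each element
freq <- rowSums(!is.na(positions))

# Comma-separated names of those columns, e.g. "A, B, D"
cols <- apply(!is.na(positions), 1,
              function(present) toString(colnames(positions)[present]))

# Dense rank by decreasing frequency (ties share a rank, no gaps)
ranking <- match(freq, sort(unique(freq), decreasing = TRUE))

result <- data.frame(element = elements, frequency = freq, columns = cols,
                     ranking = ranking, positions, stringsAsFactors = FALSE)
result <- result[order(result$ranking), ]
result
```

Because `order()` keeps ties in their original order, sorting by `ranking` gives the same row order (a, c, b, d, e, f, x, y, z) as the desired output.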

Thank you.

Asked by stdt1

1 Answer


Perhaps there is a way to do this in one step, but none comes to mind at the moment. So, continuing from my answer to the earlier question:

library(dplyr)
library(tidyr)

step1 <- df %>%
  gather(var, val, everything()) %>%             ## Make a long dataset
  na.omit %>%                                    ## We don't need the NA values
  group_by(val) %>%                              ## All calculations grouped by val
  summarise(column = toString(var),              ## Collapse column names, e.g. "A, B, D"
            freq = n()) %>%                      ## Count occurrences of each value
  mutate(ranking = dense_rank(desc(freq)))       ## Rank by frequency; ties share a rank

step2 <- df %>%
  mutate(ind = 1:nrow(df)) %>%                   ## Add the row index of each value
  gather(var, val, -ind) %>%                     ## Go long
  na.omit %>%                                    ## Remove NA
  spread(var, ind)                               ## Go wide

inner_join(step1, step2)
# Joining by: "val"
# Source: local data frame [9 x 8]
# 
#   val  column freq ranking  A  B  C  D
# 1   a A, B, D    3       1  1  1 NA  3
# 2   b    A, C    2       2  2 NA  1 NA
# 3   c A, B, D    3       1  3  2 NA  1
# 4   d    A, B    2       2  4  3 NA NA
# 5   e    A, D    2       2  5 NA NA  2
# 6   f       A    1       3  6 NA NA NA
# 7   x       C    1       3 NA NA  2 NA
# 8   y       C    1       3 NA NA  3 NA
# 9   z       D    1       3 NA NA NA  4  
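On the "one step" question, here is a sketch of a single pipeline using `pivot_longer()` and `pivot_wider()` (tidyr 1.0.0 or later), which supersede `gather()` and `spread()`. It is an alternative, not the approach above, and like `spread()` it assumes no element repeats within a column. The column names are sorted because `pivot_longer()` stacks the data row by row rather than column by column.

```r
library(dplyr)
library(tidyr)

# Input data from the question
df <- read.table(header = TRUE, text = 'A  B  C  D
a  a  b  c
b  c  x  e
c  d  y  a
d  NA NA z
e  NA NA NA
f  NA NA NA', stringsAsFactors = FALSE)

result <- df %>%
  mutate(ind = row_number()) %>%                     # Row index within each column
  pivot_longer(-ind, names_to = "var", values_to = "val",
               values_drop_na = TRUE) %>%            # Go long, dropping NA
  group_by(val) %>%
  mutate(column = toString(sort(var)),               # Columns containing val, e.g. "A, B, D"
         freq = n()) %>%                             # Occurrences of val
  ungroup() %>%
  mutate(ranking = dense_rank(desc(freq))) %>%       # Ties share a rank
  pivot_wider(names_from = var, values_from = ind) %>%  # One index column per input column
  arrange(ranking, val)

result
```

The `id_cols` of `pivot_wider()` default to the remaining columns (`val`, `column`, `freq`, `ranking`). Those are constant for each value, so the result has one row per element.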
Answered by A5C1D2H2I1M1N2O1R2T1