How to find rows with most values filled in a matrix?

Question

Given a matrix (mat1) like this:

mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""), nrow = 4, ncol = 5)

How would I go about finding, say, the top 3 rows with the most non-empty string values? For example, in mat1, row 1 has 3 values, row 2 has 2 values, row 3 has 4 values, and row 4 has 4 values.

Is there a way where I can perhaps tabulate this in a frequency table of some sort or at least return a vector of the top rows?

In my dataset, each row is named, so if possible the output would preferably give me the top X row names. Row indices would also be alright, but it would be nice if I could also visualize what's actually going on and assess whether the data makes sense. - John Abercrombie, Jun 24 '20 at 20:35

score 1 · Answer 1 · answered Jun 24 '20 at 20:26

You can do something like this:

# Your matrix
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""), nrow = 4, ncol = 5)

# Transforming to data frame
df_mat <- as.data.frame(mat1)

# Number of non-empty values per row
df_mat$COUNT <- rowSums(mat1 != "")

# Ordering the data frame so the rows with the most values come first
# (arrange() and desc() come from dplyr)
library(dplyr)
df_mat <- arrange(df_mat, desc(COUNT))
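
Since the question mentions named rows, here is a minimal base R sketch that returns the top row names directly. The row names "a" to "d" are hypothetical and added only for illustration; the rest assumes that an empty cell is exactly the string "".

```r
# Same data as in the question, with hypothetical row names for illustration
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5,
               dimnames = list(c("a", "b", "c", "d"), NULL))

# Number of non-empty cells per row; the result is named by row
filled <- rowSums(mat1 != "")

# Sort by count, largest first (ties keep their original row order),
# then keep the top 3
top3 <- head(filled[order(-filled)], 3)

top3         # counts, labelled with the row names
names(top3)  # just the row names
```

If the matrix has no row names, `filled` is unnamed, and `order(-filled)` on its own gives the row indices instead.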

score 0 · Accepted Answer · answered Jun 24 '20 at 20:10

If we create a function, we can convert the matrix to 'long' format with as.data.frame.table, subset out the blank elements, and tabulate the row column (Var1) to get the number of non-blank values per row.

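f1 <- function(mat, n) {
  # Fall back to the row index only when the matrix has no row names
  if (is.null(rownames(mat))) rownames(mat) <- seq_len(nrow(mat))
  # Long format: Var1 = row, Var2 = column, Freq = cell value
  long <- as.data.frame.table(mat)
  # Count the non-blank cells per row and keep the n largest counts
  head(sort(table(subset(long, Freq != "")$Var1), decreasing = TRUE), n)
}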

f1(mat1, 3)
#  3 4 1 
#  4 4 3

The output shown is a named vector whose names are the row index or row names and whose values are the number of non-blank entries in each row. The n argument, supplied by the user, sets how many of the top rows are returned.
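
To check a result like this against the data, one option is to tabulate the per-row counts and then look at the top rows themselves. This is only a sketch in base R; the names `filled`, `counts` and `top` are illustrative and not part of the answer above.

```r
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5)

# Logical view of the matrix: TRUE where a cell is filled
filled <- mat1 != ""

# Non-empty cells per row
counts <- rowSums(filled)

# Frequency table: how many rows have each number of filled cells
table(counts)

# Indices of the 3 rows with the most filled cells
top <- head(order(-counts), 3)

# The corresponding rows of the original matrix, for a visual check
mat1[top, , drop = FALSE]
```

Printing `filled[top, , drop = FALSE]` instead shows the same rows as TRUE/FALSE, which can be easier to scan when the matrix is wide.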
