Given a matrix (mat1) like this:

mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""), nrow = 4, ncol = 5)

How would I go about finding, say, the top 3 rows with the most non-empty string values? For example, in mat1, row 1 has 3 non-empty values, row 2 has 2, row 3 has 4, and row 4 has 4.

Is there a way to tabulate this in a frequency table of some sort, or at least return a vector of the top rows?
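For reference, the per-row counts quoted above can be checked in base R. Because `c()` mixes numbers with `""`, every entry of `mat1` is a character string, and `matrix()` fills column by column, so row 1 is `"1" "4" "" "4" ""`. A minimal sketch (the name `nonblank_per_row` is only illustrative):

```r
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5)

# mat1 != "" is a logical matrix that is TRUE wherever a cell is not the empty string;
# rowSums() adds up the TRUE values in each row
nonblank_per_row <- rowSums(mat1 != "")
nonblank_per_row
# [1] 3 2 4 4
```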

2 Answers

You can do something like this:

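# Needs the dplyr package for arrange() and desc()
library(dplyr)

# Your matrix
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""), nrow = 4, ncol = 5)

# Transforming to data frame
df_mat <- as.data.frame(mat1)

# Quantity of non-empty strings in each row (computed before any extra columns are added)
df_mat$COUNT <- rowSums(df_mat != "")

# Keep the original row number so the top rows can be identified after sorting
df_mat$ROW <- seq_len(nrow(df_mat))

# Ordering the data frame from most to fewest non-empty values, then taking the top 3
df_mat <- arrange(df_mat, desc(COUNT))
head(df_mat, 3)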
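If you would rather not depend on dplyr, a base-R sketch along the same lines returns the top row indices as a plain vector. It compares every cell with `""`, counts the `TRUE` values per row with `rowSums()`, and ranks rows with `order()`; `counts` and `top_rows` are illustrative names.

```r
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5)

# Number of non-empty strings in each row
counts <- rowSums(mat1 != "")

# order(-counts) ranks rows from most to fewest non-blank entries;
# head() keeps the first 3 row indices
top_rows <- head(order(-counts), 3)
top_rows
# [1] 3 4 1

# The matching counts, labelled by row index
setNames(counts[top_rows], top_rows)
# 3 4 1
# 4 4 3
```

Because `order()` uses a stable sort, tied rows (3 and 4 here) keep their original relative order; pass a second key to `order()` if you need a different tie-break.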
Igor Farah

If we create a function, we can convert the matrix to 'long' format with `as.data.frame.table`, subset out the blank elements, and count how often each row name (the first dim attribute, `Var1`) occurs:

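f1 <- function(mat, n) {
   # label rows by index so they survive the reshape
   row.names(mat) <- seq_len(nrow(mat))
   # long format -> drop blank cells -> count cells per row -> keep the top n
   head(sort(table(subset(as.data.frame.table(mat),
        Freq != "")$Var1), decreasing = TRUE), n)
}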

f1(mat1, 3)
#  3 4 1 
#  4 4 3 

The output shown is a named vector (a one-dimensional table) whose names are the row indices and whose values are the number of non-blank entries in each row. The `n` argument sets how many of the top rows are returned.
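To see what `f1` does internally, here is the same pipeline unpacked step by step on `mat1`. This is a sketch: `long` and `nonblank` are illustrative names, and `rownames()` is used to label the rows.

```r
mat1 <- matrix(c(1, "", 2, 3, 4, "", 2, 4, "", 5, 2, 1, 4, "", 3, 2, "", 3, "", ""),
               nrow = 4, ncol = 5)
rownames(mat1) <- seq_len(nrow(mat1))

# One row per matrix cell: Var1 = row label, Var2 = column label, Freq = cell value
long <- as.data.frame.table(mat1)
nrow(long)
# [1] 20

# Keep only the non-blank cells
nonblank <- subset(long, Freq != "")
nrow(nonblank)
# [1] 13

# How often each row label appears = number of non-blank cells in that row
table(nonblank$Var1)
#  1 2 3 4
#  3 2 4 4
```

The counts match the question (3, 2, 4, 4); `f1` then sorts this table in decreasing order and keeps the first `n` entries.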

akrun