Objective: Find the n lowest values in each row of a matrix or data frame. For this example, we want the 3 lowest values of each row. The result should be a matrix with the columns:
rowname | colname_min | value_min | colname_min2 | value_min2 | colname_min3 | value_min3
Point of departure: I modified the answer to this question: "R getting the minimum value for each row in a matrix, and returning the row and column name".
Here is my modified code:
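    # Example data: 10 x 10 data frame of random values
    df <- data.frame(matrix(data = round(x = rnorm(100, 10, 1), digits = 3), nrow = 10),
                     row.names = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"))
    colnames(df) <- c("AD", "BD", "CD", "DD", "ED", "FD", "GD", "HD", "ID", "JD")

    result <- t(sapply(seq(nrow(df)), function(i) {
      # Column indices of the 3 smallest values in each row (one column of j per row of df)
      j <- apply(df, 1, function(x) { order(x, decreasing = FALSE)[1:3] })
      # Row name, followed by column name and value for each of the 3 minima
      c(rownames(df)[i], colnames(df)[j[1, i]], as.numeric(df[i, j[1, i]]),
        colnames(df)[j[2, i]], as.numeric(df[i, j[2, i]]),
        colnames(df)[j[3, i]], as.numeric(df[i, j[3, i]]))
    }))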
This works fine for the small example data frame. However, the data frame I am actually working with has 200,000 rows and 300 columns. On my machine the code has been running for about an hour and still has not finished. Any ideas on how to optimize the code? I was thinking of dplyr, but couldn't find a solution. Help is greatly appreciated.
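For reference, here is a minimal sketch of one direction that might help. It assumes part of the cost comes from the `apply` call over the whole data frame sitting inside the `sapply` loop, so the ordering is recomputed for every row. The sketch computes the ordering once and then extracts the values with matrix indexing. The names `m`, `n`, `idx`, `rows` and `vals` are my own, it works on a numeric matrix rather than a data frame, and it returns a data frame with numeric value columns instead of a character matrix. I have not timed it on the full 200,000 x 300 data.

```r
# Sketch only: compute the per-row ordering once instead of once per row.
set.seed(1)  # assumption: seed added only to make the example reproducible
m <- matrix(round(rnorm(100, 10, 1), 3), nrow = 10,
            dimnames = list(LETTERS[1:10], paste0(LETTERS[1:10], "D")))
n <- 3  # number of lowest values wanted per row

# idx is an n x nrow(m) matrix: column i holds the column positions
# of the n smallest values in row i of m
idx <- matrix(apply(m, 1, function(x) order(x)[seq_len(n)]), nrow = n)

# Look up the values with (row, column) index pairs
rows <- rep(seq_len(nrow(m)), each = n)
vals <- matrix(m[cbind(rows, as.vector(idx))], nrow = n)

# Assemble: rowname | colname_min | value_min | colname_min2 | value_min2 | ...
result <- data.frame(rowname = rownames(m), stringsAsFactors = FALSE)
for (k in seq_len(n)) {
  suffix <- if (k == 1) "" else k
  result[[paste0("colname_min", suffix)]] <- colnames(m)[idx[k, ]]
  result[[paste0("value_min", suffix)]] <- vals[k, ]
}
```

For a data frame input, `m <- as.matrix(df)` would be the assumed first step, provided all columns are numeric.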