This is a variant of a question asked earlier about matrices.

For each row of a data frame, I need to find the first, second, ... biggest value and store the label of each one in a separate new column.

The function I need to build should work like this:

> set.seed(1)
> v1 <- runif(10,1,10)
> v2 <- runif(10,1,10)
> v3 <- runif(10,1,10)
> Dt <- data.frame( v1, v2, v3 )
> head(Dt, 3)
     v1    v2    v3
1 3.390 2.854 9.412
2 4.349 2.589 2.909
3 6.155 7.183 6.865
> label <- big(Dt, pos=1)
#### big is a function that finds the first, second, ... (pos) biggest value in each row and returns its label
> label
[1] "v3" "v1" "v2" ...
> big(Dt, pos=2)
[1] "v1" "v3" "v3" ...

Thanks. Juan

jlopez
  • Think this needs a bit more explanation. How long is the result here? Is it length-10, same as the rows of the matrix (you've made a matrix here, not a data frame, so that needs tweaking too). Can you make an example with actual numbers (use set.seed(310366) if you want reproducible random numbers) and show us what the answer should be. Oh, and by 'first, second' you mean 'smallest, next-smallest'? – Spacedman Oct 14 '11 at 08:20
  • this question is not clearly formulated. Your title speaks something different from the second paragraph. Please make it consistent and formulate clearly what you want (row-wise or overall maximum ...) – Tomas Oct 14 '11 at 09:14
  • @Spacedman, you are right. My text is confusing. I need the label of the variable. Sorry, I should have used the data.frame function instead of cbind. I am looking for the list of the first 1 or 2 biggest numbers. – jlopez Oct 14 '11 at 13:46
  • @Tomas I have serious problems with my English, and also with R. Maybe my last sentence should be erased, don't you think? My problem is row-wise, not overall. It should be possible to store the result as another variable in the same dataframe, if needed. – jlopez Oct 14 '11 at 13:49

1 Answer

As @Spacedman mentioned, you should give more details. So regardless of whether this answer is helpful, try and restructure your question.

I'm guessing that you have a data.frame/matrix and that, for each row, you want the label of the column holding the nth largest value.

##Set up some dummy data
R> set.seed(1)
R> v1 <- runif(10,1,10); v2 <- runif(10,1,10)
R> v3 <- runif(10,1,10); Dt <- data.frame( v1, v2, v3 )
R> head(Dt, 2)
     v1    v2    v3
1 3.390 2.854 9.412
2 4.349 2.589 2.909

##Step 1: Use "apply" and "order" to rank the columns within each row, largest value first
##Step 2: Use subsetting to look up the column name at rank "pos"
R> big = function(Dt, pos=1) {
+    ordered_rows <- apply(Dt, 1, order, decreasing = TRUE)
+    positions <- rep(colnames(Dt), nrow(Dt))[as.vector(ordered_rows[pos,])]
+    return(positions)
+  }
R> big(Dt, 3)
 [1] "v2" "v2" "v1" "v3" "v1" "v3" "v3" "v3" "v2" "v1"
R> big(Dt, 1)
 [1] "v3" "v1" "v2" "v1" "v2" "v1" "v1" "v2" "v3" "v2" 
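
The question also asks to store each result in a new column of the same data frame. A minimal sketch of that step, assuming the value columns are named v1, v2 and v3 as in the example; the names value_cols, first and second are made up for illustration. The column lookup is written as colnames(Dt)[...] here, which should give the same labels as the rep(...) form above, because the indices never exceed the number of columns.

```r
# Same dummy data as in the answer
set.seed(1)
v1 <- runif(10, 1, 10)
v2 <- runif(10, 1, 10)
v3 <- runif(10, 1, 10)
Dt <- data.frame(v1, v2, v3)

# Label of the column holding the pos-th largest value in each row
big <- function(Dt, pos = 1) {
  # apply() returns one column per row of Dt; row `pos` of that result
  # holds the column index of the pos-th largest value
  ordered_rows <- apply(Dt, 1, order, decreasing = TRUE)
  colnames(Dt)[ordered_rows[pos, ]]
}

# Hypothetical names: always pass only the numeric columns, so the
# newly added label columns are not ranked along with the values
value_cols <- c("v1", "v2", "v3")
Dt$first  <- big(Dt[, value_cols], pos = 1)
Dt$second <- big(Dt[, value_cols], pos = 2)

head(Dt, 3)
# rows 1-3: first = v3, v1, v2; second = v1, v3, v3
```

Restricting the call to value_cols matters once label columns exist: apply() converts the data frame to a matrix, and a character column would turn the whole matrix into character, so the values would be ordered as strings.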
Ben
csgillespie
  • You got my idea! That is more or less the code I need. However, I need the biggest of the sorted positions. With your code, the first results of big(Dt,1) should be [1] "v3" "v1" .... Your function is nice, but I think it should be called small. If I run big(Dt,2), the result should be [1] "v1" "v3", which corresponds to the second biggest position in the dataset. Thank you. I will explore options to sort in inverse order. – jlopez Oct 14 '11 at 13:53
  • With the modification I made, there is still an error. Only the first label is wrong; the rest are good. The same error occurs for the second biggest position. – jlopez Oct 14 '11 at 14:56
  • Your previous code was almost correct; only the index was pointing to the wrong place. This code is also correct, only longer. To solve my problem, your previous code should look like this: 'big = function(Dt, pos=1) { ordered_rows <- apply(Dt, 1, order, decreasing = TRUE); positions <- rep(colnames(Dt), nrow(Dt))[as.vector(ordered_rows[pos,])]; return(positions) }' Now I have two solutions. I appreciate your help. Thank you very much! – jlopez Oct 14 '11 at 15:30
  • Your second code is 70% slower than the first one when we use 10000 records. I tested it this way: 'set.seed(1) v1 <- runif(10^4,1,10); v2 <- runif(10^4,1,10) v3 <- runif(10^4,1,10); Dt <- data.frame( v1, v2, v3 ) head(Dt, 5) ptm0 <- proc.time() system.time( Dt$prim <- big(Dt[,1:3], 1) ) ptm1 <- proc.time() (seg1 <- as.numeric( total <- ptm1 - ptm0 )[1] ) ptm0 <- proc.time() # str(t) t <- total[1] as.numeric(t) system.time( Dt$prim <- big1(Dt[,1:3], pos=1) ) ptm1 <- proc.time() (seg2 <- as.numeric( total <- ptm1 - ptm0 )[1] ) 100 *( seg2 - seg1 ) / seg2' The result was 69.38. Regards – jlopez Oct 15 '11 at 04:20
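
The timing comparison in the last comment can probably be written more compactly with system.time() alone. The sketch below is only an illustration: big1 is not shown anywhere in this thread, so a made-up stand-in called big_loop (an explicit row loop with vapply) takes its place, and the timings it produces say nothing about the 69.38 figure reported above.

```r
# 10^4 rows, as in the comment
set.seed(1)
n  <- 10^4
v1 <- runif(n, 1, 10)
v2 <- runif(n, 1, 10)
v3 <- runif(n, 1, 10)
Dt <- data.frame(v1, v2, v3)

# The answer's approach: rank the columns within each row via apply()
big <- function(Dt, pos = 1) {
  ordered_rows <- apply(Dt, 1, order, decreasing = TRUE)
  colnames(Dt)[ordered_rows[pos, ]]
}

# Hypothetical stand-in for the unseen big1: loop over rows explicitly
big_loop <- function(Dt, pos = 1) {
  m <- as.matrix(Dt)
  idx <- vapply(
    seq_len(nrow(m)),
    function(i) order(m[i, ], decreasing = TRUE)[pos],
    integer(1)
  )
  colnames(m)[idx]
}

# system.time() evaluates its argument in the calling environment,
# so the labels are kept while the elapsed seconds are recorded
t_apply <- system.time(lab_apply <- big(Dt, 1))[["elapsed"]]
t_loop  <- system.time(lab_loop  <- big_loop(Dt, 1))[["elapsed"]]

# Both versions should agree before their speed is compared
stopifnot(all(lab_apply == lab_loop))

# Elapsed seconds for each version; values vary by machine and run
c(apply = t_apply, loop = t_loop)
```

A single run at this size takes a fraction of a second, so repeating each call several times and comparing the medians would give a steadier picture than one measurement.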