use multiple columns as variables with sapply

Question

I have a data frame, and I would like to apply a function to each row that takes the values of three columns and computes the minimum absolute difference between any two of the three values.

#dataset
df <- data.frame(a= sample(1:100, 10),b = sample(1:100, 10),c= sample(1:100, 10))

#function
minimum_distance <- function(a,b,c)
{
  dist1 <- abs(a-b)
  dist2 <- abs(a-c)
  dist3 <- abs(b-c)
  return(min(dist1,dist2,dist3))
}
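One detail matters here: `min()` returns a single value pooled across all of its arguments. Called on scalars, `minimum_distance` works as intended. Called directly on whole columns, it returns one number for the whole data frame rather than one per row. A small sketch with made-up values:

```r
# minimum_distance as defined in the question
minimum_distance <- function(a, b, c) {
  dist1 <- abs(a - b)
  dist2 <- abs(a - c)
  dist3 <- abs(b - c)
  min(dist1, dist2, dist3)
}

# Scalars: one triple in, one answer out
single <- minimum_distance(10, 25, 13)   # gaps are 15, 3 and 12
print(single)

# Vectors: min() pools every element of every argument, so two "rows"
# collapse into one number instead of one value per row
pooled <- minimum_distance(c(1, 50), c(10, 52), c(30, 90))
print(pooled)   # smallest gap over both rows, not the per-row c(9, 2)
```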

I am looking for something like:

df$distance <- sapply(df, function(x) minimum_distance(x$a,x$b,x$c) )
## error message
Error in x$a : $ operator is invalid for atomic vectors
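The error comes from how `sapply()` iterates: a data frame is a list of columns, so the function receives one whole column at a time, not one row. A quick way to see this, using a small made-up data frame:

```r
# A small, hypothetical data frame
df <- data.frame(a = c(1, 50, 7), b = c(10, 52, 20), c = c(30, 90, 8))

# A data frame is a list of columns, so sapply() visits one column at a time
print(is.list(df))
col_lengths <- sapply(df, length)   # each x is a whole column of length 3
print(col_lengths)

# Each x is a plain atomic vector, which has no $a element; that is what
# triggers "$ operator is invalid for atomic vectors"
col_is_atomic <- sapply(df, is.atomic)
print(col_is_atomic)
```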

While I can use ddply:

library(plyr)  # ddply() comes from the plyr package
df2 <- ddply(df, .(a), function(r) {data.frame(min_distance = minimum_distance(r$a, r$b, r$c))}, .drop = FALSE)

This doesn't keep all of the columns. Any suggestions?

Edit: I ended up using:

df$distance <- mapply(minimum_distance, df$a, df$b, df$c)

score 60 · Accepted Answer · answered Apr 09 '12 at 19:02

Try mapply():

qq <- mapply(minimum_distance, df$a, df$b, df$c)
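`mapply()` is a multivariate `sapply()`: it walks several vectors in parallel, so the i-th call receives the i-th element of each column. A minimal sketch on made-up data (the function is the question's, written more compactly), assigning the result as a new column so all original columns are kept:

```r
# Same logic as the question's minimum_distance, written compactly
minimum_distance <- function(a, b, c) {
  min(abs(a - b), abs(a - c), abs(b - c))
}

# Hypothetical example data
df <- data.frame(a = c(1, 50, 7), b = c(10, 52, 20), c = c(30, 90, 8))

# Call i receives df$a[i], df$b[i], df$c[i], giving one value per row
df$distance <- mapply(minimum_distance, df$a, df$b, df$c)
print(df)
```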

geoffjentry

Which one is the fastest, or the most efficient? – Bharath Mar 08 '16 at 18:23
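One way to answer this empirically is to time both approaches on your own machine. The sketch below assumes 100,000 rows of made-up integer data and compares `mapply()` with a `pmin()`-based version like the one in the next answer. Since `mapply()` makes one R function call per row while `pmin()` works on whole vectors at once, the vectorized version is usually much faster; the exact timings will vary.

```r
set.seed(1)
n <- 1e5
df <- data.frame(a = sample(1:100, n, replace = TRUE),
                 b = sample(1:100, n, replace = TRUE),
                 c = sample(1:100, n, replace = TRUE))

# Row-at-a-time version (min) and vectorized version (pmin)
minimum_distance  <- function(a, b, c) min(abs(a - b), abs(a - c), abs(b - c))
pminimum_distance <- function(a, b, c) pmin(abs(a - b), abs(a - c), abs(b - c))

t_mapply <- system.time(r1 <- mapply(minimum_distance, df$a, df$b, df$c))
t_pmin   <- system.time(r2 <- pminimum_distance(df$a, df$b, df$c))

# Elapsed seconds for each approach (machine-dependent)
print(c(mapply = t_mapply[["elapsed"]], pmin = t_pmin[["elapsed"]]))
```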

score 6 · Answer 2 · answered Apr 09 '12 at 19:06

Try this:

do.call("mapply", c(list(minimum_distance), df))
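To unpack the `do.call()` line: `c(list(minimum_distance), df)` builds a single list whose first element is the function, followed by the named columns, and `do.call()` passes that list to `mapply()` as its arguments. This relies on the columns being named like the function's arguments (`a`, `b`, `c`) or at least being in the same order. A sketch with made-up data:

```r
minimum_distance <- function(a, b, c) min(abs(a - b), abs(a - c), abs(b - c))
df <- data.frame(a = c(1, 50, 7), b = c(10, 52, 20), c = c(30, 90, 8))

# The function first (unnamed), then the columns named a, b, c
args <- c(list(minimum_distance), df)
print(names(args))

# Equivalent to mapply(minimum_distance, a = df$a, b = df$b, c = df$c)
res <- do.call("mapply", args)
print(res)
```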

But you can also write a vectorized version:

pminimum_distance <- function(a,b,c)
{
 dist1 <- abs(a-b)
 dist2 <- abs(a-c)
 dist3 <- abs(b-c)
 return(pmin(dist1,dist2,dist3))
}
pminimum_distance(df$a, df$b, df$c)

# or
do.call("pminimum_distance", df)
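The only change from the original function is `pmin()` in place of `min()`: `pmin()` takes the element-wise (parallel) minimum, so each row's three distances are compared only with each other, and one value per row comes back without any explicit loop. A small check on made-up values, where the per-row answers worked out by hand are 9, 2 and 1:

```r
pminimum_distance <- function(a, b, c) {
  # Element-wise minimum: position i only competes with position i
  pmin(abs(a - b), abs(a - c), abs(b - c))
}

df <- data.frame(a = c(1, 50, 7), b = c(10, 52, 20), c = c(30, 90, 8))

direct      <- pminimum_distance(df$a, df$b, df$c)
via_do_call <- do.call("pminimum_distance", df)   # columns matched to a, b, c by name
print(direct)
```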

kohske

This is smart but a little less straightforward than mapply. – zach Apr 09 '12 at 19:15

score 6 · Answer 3 · Tyler Rinker · edited Apr 09 '12 at 22:07

I know this has been answered, but I'd actually take a different approach, based on outer(), that works with any number of columns and generalizes better:

vdiff <- function(x){
    y <- outer(x, x, "-")
    min(abs(y[lower.tri(y)]))
}

apply(df, 1, vdiff)

I think this is a little cleaner and more flexible.
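Because `outer()` compares every value in the row with every other value, `vdiff` needs no changes when more columns are added. A sketch with made-up data that includes a hypothetical fourth column `d`:

```r
vdiff <- function(x) {
  y <- outer(x, x, "-")          # all pairwise differences x[i] - x[j]
  min(abs(y[lower.tri(y)]))      # each unordered pair once, diagonal excluded
}

# Hypothetical data with four numeric columns instead of three
df4 <- data.frame(a = c(1, 50), b = c(10, 55), c = c(30, 90), d = c(28, 60))

# apply() coerces the data frame to a numeric matrix and calls vdiff per row
res <- apply(df4, 1, vdiff)
print(res)
```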

EDIT: Per zach's comments, I propose this more formal function, which also works on data frames with non-numeric columns by removing them and acting only on the numeric columns.

cdif <- function(dataframe){
    df <- dataframe[, sapply(dataframe, is.numeric), drop = FALSE]  # keep only numeric columns
    vdiff <- function(x){
        y <- outer(x, x, "-")
        min(abs(y[lower.tri(y)]))
    }
    return(apply(df, 1, vdiff))
}

#TEST it out
set.seed(10)
(df <- data.frame(a = sample(1:100, 10), b = sample(1:100, 10), 
    c = sample(1:100, 10), d =  LETTERS[1:10]))

cdif(df)
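The filtering step matters because `apply()` first converts the data frame to a matrix, and a single text column turns the whole matrix into character, which makes the subtraction inside `outer()` fail. A sketch on made-up data showing both the failure and the filtered version:

```r
vdiff <- function(x) {
  y <- outer(x, x, "-")
  min(abs(y[lower.tri(y)]))
}

cdif <- function(dataframe) {
  # Keep only numeric columns; drop = FALSE keeps a data frame even if one survives
  num <- dataframe[, sapply(dataframe, is.numeric), drop = FALSE]
  apply(num, 1, vdiff)
}

df <- data.frame(a = c(1, 50, 7), b = c(10, 52, 20), c = c(30, 90, 8),
                 d = c("x", "y", "z"))

# Without filtering, apply() sees a character matrix and outer() errors out
unfiltered <- tryCatch(apply(df, 1, vdiff), error = function(e) "error")
print(unfiltered)

filtered <- cdif(df)
print(filtered)
```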

edited Apr 09 '12 at 22:07

answered Apr 09 '12 at 21:37

Tyler Rinker

Nice idea. My real data frame is not a matrix, however. Could this be modified for use on a data frame with text columns? Something like outer(x, x, "-", drop_string = T)? – zach Apr 09 '12 at 21:55
Using `outer` doesn't mean you have to be working on a matrix. It takes two vectors and a function and returns a matrix with the function applied to every combination of elements from those two vectors. Here I just supply the same vector (the row) to outer twice, with the subtraction operator `-` as the function. I added a bit to my solution to make a self-contained function that acts on data frames and excludes anything that's not numeric. `outer` can be very powerful; I just wish I remembered to use it more. As for drop_string = T: no such luck, but `sapply` with an `is.numeric` check works well. – Tyler Rinker Apr 09 '12 at 22:13
Very nice. I agree that outer is quite powerful, and that for a larger matrix this would be the way to go rather than specifying each column or value. – zach Apr 09 '12 at 22:31
Note: because this answer is more general, it is likely also slower. I'm not sure how much of an issue speed is for you (i.e., how big your data set is). – Tyler Rinker Apr 09 '12 at 22:31
In this case speed is not a problem, but I will keep this in mind. Thanks, Tyler. – zach Apr 09 '12 at 22:34
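To make the `outer()` explanation concrete, here is what happens for a single made-up row: the matrix holds every pairwise difference, and `lower.tri()` selects each pair exactly once (below the diagonal, so self-differences of 0 are excluded).

```r
x <- c(7, 20, 8)                 # one hypothetical row
y <- outer(x, x, "-")            # y[i, j] = x[i] - x[j], a 3 x 3 matrix
print(y)

pairs <- y[lower.tri(y)]         # entries below the diagonal, column by column
print(pairs)
print(min(abs(pairs)))           # smallest absolute gap in the row
```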

score 2 · Answer 4 · answered Jul 28 '16 at 04:38

It's better to write a function and then use mapply on the vectors:

f1 <- function(a, b, c) {
  d <- abs(a - b)
  e <- abs(b - c)
  f <- abs(c - a)
  return(pmin(d, e, f))
}

qq <- mapply(f1, df$a, df$b, df$c)
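Since `f1` uses `pmin()`, it is already vectorized, so the `mapply()` wrapper is optional here: a single call on the whole columns gives the same per-row result. A quick check on made-up values:

```r
f1 <- function(a, b, c) {
  d <- abs(a - b)
  e <- abs(b - c)
  f <- abs(c - a)
  pmin(d, e, f)                  # element-wise minimum, one value per row
}

df <- data.frame(a = c(1, 50, 7), b = c(10, 52, 20), c = c(30, 90, 8))

via_mapply <- mapply(f1, df$a, df$b, df$c)   # one call per row
direct     <- f1(df$a, df$b, df$c)           # one call for all rows
print(direct)
```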

Shalini Baranwal
