First of all, hello! This is my first post here and I have to admit that I'm a bit nervous.

I have a data.frame where each element appears in triplicate and each replicate has its own value. Here is an example:

test <- data.frame(id  = c("a", "a", "a", "b", "b", "b"),
                   val = c(1, 100, 300, 1, 2, 3))

I need to calculate the differences between the values in each replicate set and remove the whole set if the difference between the first and second val, or between the second and third, is lower than a given number.

I tried to write my own small function and use it with ddply, but I have had no success so far.

Here is what I'm trying to do:

f<-function(x) if(x[1,2]-x[2,2] < 60 || x[2,2]- x[3,2] < 60) NULL else (x)
ddply(test, .(id), f)

What I would like to have at the end, in this example, is:

id    val
a    1
a    100
a    300

The "b" group is dropped because |1 - 2| < 60.

Instead I get various error messages or strange data.frames that look just wrong.

I hope I was clear enough.

thanks in advance

MP

EDIT: the differences are meant as absolute values.

spleen

1 Answer

A function used within ddply has to return a data frame, or NULL when you want that group dropped from the final results, as in this case:

library(plyr)  # provides ddply() and .()

ddply(
  test,
  .(id),
  function(df_part, min_diff) {
    # Using abs() because I assume you want absolute differences
    diffs <- abs(diff(df_part$val))
    if (any(diffs < min_diff)) {
      # Returning NULL drops this id from the combined result
      return(NULL)
    } else {
      return(df_part)
    }
  },
  min_diff = 60
)
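
If plyr is not available, roughly the same filter can be sketched in base R. This is only an illustration under the question's assumptions (threshold of 60, absolute differences); the names signed_a, abs_diffs and keep_row are introduced here and are not part of the original post. It also shows why the attempt in the question drops group "a": the signed difference 1 - 100 is negative, so it is always below the threshold.

```r
# Example data from the question
test <- data.frame(id  = c("a", "a", "a", "b", "b", "b"),
                   val = c(1, 100, 300, 1, 2, 3))
min_diff <- 60  # threshold taken from the question

# Signed difference (first minus second) as in the original attempt:
# 1 - 100 is negative, so it is below 60 and group "a" would be dropped too
signed_a <- test$val[1] - test$val[2]

# Absolute consecutive differences within each id (hypothetical helper value)
abs_diffs <- tapply(test$val, test$id, function(v) abs(diff(v)))

# ave() recycles the per-group flag (coerced to 1/0) over every row of the group;
# a row is kept when no difference in its group is below min_diff
keep_row <- ave(test$val, test$id,
                FUN = function(v) all(abs(diff(v)) >= min_diff)) == 1

result <- test[keep_row, ]
print(result)
```

Building a logical row index with ave() avoids splitting and re-binding the data frame, at the cost of being a little less explicit than the ddply version.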
Marius