0

I have a data frame in 18528 rows and 3 columns like below:

Sample  Target  Value
100      A       21.5
100      A       20.5
100      B       19.5
100      B       19.75
100      B       18.15
100      B       21.95
200      A       21.1
200      A       21.6
200      B       23.5
200      B       20.75
100      C       21.25
100      C       22.0
100      C       18.33
100      C       21.84

I need to calculate difference between values in each groups:

Sample  Target  Value   dif
100      A       21.5   1
100      A       20.5   1
100      B       19.5   0.25
100      B       19.75  1.6
100      B       18.15  3.8
100      B       21.95  2.45
200      A       21.1   0.5
200      A       21.6   0.5
200      B       23.5   2.75
200      B       20.75  2.75
100      C       21.25  0.75
100      C       22.0   3.67
100      C       18.33  3.51
100      C       21.84  0.59

Then if difference is more than 2, make that value "NA" like:

Sample  Target  Value   dif
100      A       21.5   1
100      A       20.5   1
100      B       19.5   0.25
100      B       19.75  1.6
100      B       18.15  3.8
100      B       NA     2.45
200      A       21.1   0.5
200      A       21.6   0.5
200      B       NA     2.75
200      B       NA     2.75
100      C       21.25  0.75
100      C       22.0   3.67
100      C       NA     3.51
100      C       21.84  0.59

I used combn to calculate difference, but I got Error, I think the reason can be different length in groups (2 and 4). Thanks in advance

Lili
  • 75
  • 7
  • A group is a couple (Sample, Target) I guess? Actually on the output you present you withdraw line 3 Value to line 2 Value to get the dif on line 2 ... – Colonel Beauvel Apr 09 '15 at 09:25
  • exactly, and thank you for editing :) – Lili Apr 09 '15 at 09:28
  • yes but my remark was that there is a big difference between what you said about the gorup and what you presented as output (no grouping there since you take the difference of consecutive values). So I wonder what you finally want as result ... – Colonel Beauvel Apr 09 '15 at 09:30
  • what I mean; there is 5 groups(A-100,B-100,A-200,B-200,C-100) and I want to calculate the difference between values in each group. – Lili Apr 09 '15 at 09:37
  • Actually you need to calculate only the difference of the value you have and the maximum and minimum value within the group – Colonel Beauvel Apr 09 '15 at 09:49
  • But htat is clearly not what you wrote in the second table ... – Colonel Beauvel Apr 09 '15 at 09:50
  • What code did you use to go from first display to second? – Marichyasana Apr 09 '15 at 10:47

1 Answers1

0

You can get desired output using dplyr package. If you don't have it installed first run command install.packages("dplyr") or install it manually.

Then what we have:

require("dplyr")

mydf <- read.table(text = "
Sample  Target  Value
100      A       21.5
100      A       20.5
100      B       19.5
100      B       19.75
100      B       18.15
100      B       21.95
200      A       21.1
200      A       21.6
200      B       23.5
200      B       20.75
100      C       21.25
100      C       22.0
100      C       18.33
100      C       21.84", header = T)

mydf1 <- mydf %>% group_by(Sample, Target) %>% 
  mutate(ValueShifted = c(Value[-1], Value[1]) ) %>%
  mutate(dif = abs(Value - ValueShifted) ) %>%
  mutate(NewValue = c(1, NA)[(as.numeric(dif > 2)+1)] * Value )

> mydf1
Source: local data frame [14 x 6]
Groups: Sample, Target

   Sample Target Value ValueShifted  dif NewValue
1     100      A 21.50        20.50 1.00    21.50
2     100      A 20.50        21.50 1.00    20.50
3     100      B 19.50        19.75 0.25    19.50
4     100      B 19.75        18.15 1.60    19.75
5     100      B 18.15        21.95 3.80       NA
6     100      B 21.95        19.50 2.45       NA
7     200      A 21.10        21.60 0.50    21.10
8     200      A 21.60        21.10 0.50    21.60
9     200      B 23.50        20.75 2.75       NA
10    200      B 20.75        23.50 2.75       NA
11    100      C 21.25        22.00 0.75    21.25
12    100      C 22.00        18.33 3.67       NA
13    100      C 18.33        21.84 3.51       NA
14    100      C 21.84        21.25 0.59    21.84
inscaven
  • 2,514
  • 19
  • 29
  • Thank you, it works somehow, but in the last group, 18.33 shall be "NA", if it is done other differences in that group will be less than 2 and it will be the same in second group (21.95). – Lili Apr 09 '15 at 13:42
  • OK, I didn't understand the rule of computing `dif` completely. What would it look like for `x = c(1.1, 1.5, 2.5, 3)` ? I supposed just difference `x[i] - x[i+1]` for `i`-th row with looping to first value in the end. But maybe I'm wrong with computing `dif`, so correct me. Another question(s): you need to remove a minimal set of values to make all `dif`s < 2 ? Is it the same condition as `(max(Value) - min(Value)) < 2` inside a group? And what if values in group are `15 15.5 19.5 20` ? So, can you please specify more about what you need to do with the data and what are the rules. – inscaven Apr 09 '15 at 15:55
  • Thank you for your consideration, it can be said, the value shall be removed that has a difference more than 2, in 15 15.5 19.5 20; 15 and 15.5 shall be removed, because 19.5-15.5=4>2 and 20-15=5>2 after removing those values we will have 19.5 and 20 that difference is less than 2. I hope that I could explain clearly. – Lili Apr 10 '15 at 07:01