I need to find the greatest common divisor (gcd) for a set of durations: dur.

My data look like this:

            actrec dur
1  c Personal Care 120
2      c Free Time  10
3      c Free Time  70
4      c Free Time  40
5         b Unpaid  10
6      c Free Time  20
7  c Personal Care  30
8      c Free Time  40
9      c Free Time  40
10     c Free Time  10 

I am using the gcd function from the schoolmath library. I loop through my data, compute the gcd of each pair of consecutive durations, and store the values in the vector v. Finally, I take the min of v as the gcd of my data.

library(schoolmath) 

l = length(dt$dur) 
v = array(0, l)

for(i in 2:l){
  v[i] = gcd(dt$dur[i], dt$dur[i-1]) 
}

minV = min(v[-1]) 
minV

Which gives 10.
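For comparison, here is a minimal base R sketch, assuming the goal is the gcd of all the durations taken together. `gcd2` is a hypothetical helper written for this illustration (it is not `schoolmath::gcd`), and the `dur` vector is copied from the data above. It folds the two-number gcd over the vector with `Reduce`. The second part is a small made-up example showing that the minimum of the consecutive-pair gcds is only an upper bound on the gcd of the whole set, even though both give 10 for these data.

```r
# gcd2 is a hypothetical helper: Euclid's algorithm for two numbers
gcd2 <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  abs(a)
}

# Durations copied from the question
dur <- c(120, 10, 70, 40, 10, 20, 30, 40, 40, 10)

# Fold gcd2 over the whole vector: gcd(gcd(gcd(120, 10), 70), ...)
Reduce(gcd2, dur)
#[1] 10

# Made-up values where the two approaches disagree
x <- c(6, 10, 15)
pairwise <- mapply(gcd2, x[-1], x[-length(x)])  # gcd of each consecutive pair
min(pairwise)
#[1] 2
Reduce(gcd2, x)
#[1] 1
```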

However, I have trouble translating this routine into dplyr.

I thought of something like (lag for loop).

dt %>% mutate(gcd(dur, lag(dur, 0))) 

But it isn't working. And I am unsure how to insert min.

Any clue?

giac
  • Looks like the `gcd` is not vectorized. Perhaps `dt %>% mutate(dur1 = lag(dur, default = dur[1])) %>% rowwise() %>% mutate(new1 = gcd(dur, dur1))` – akrun Aug 14 '16 at 14:09
  • Here's a vectorized version of gcd that could be helpful http://stackoverflow.com/a/21504113/3001626 – David Arenburg Aug 14 '16 at 14:30
  • thanks interesting – giac Aug 14 '16 at 14:36
  • Using this function, you could do `dt %>% mutate(res = gcd(dur, lag(dur))) %>% summarise(Min = min(res, na.rm = TRUE))`, which will probably be much faster than the `schoolmath::gcd` one – David Arenburg Aug 14 '16 at 14:42

1 Answer

We can use rowwise to apply the gcd function to each row after taking the lag of 'dur', then extract 'new1' and get the min:

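library(dplyr)
library(schoolmath)

dt %>%
  mutate(dur1 = lag(dur, default = dur[1])) %>%  # previous duration
  rowwise() %>%                                   # gcd is applied row by row
  mutate(new1 = gcd(dur, dur1)) %>%
  .$new1 %>%
  tail(., -1) %>%                                 # drop the first row
  min
#[1] 10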

Or we can create a vectorized version of 'gcd' with Vectorize and apply it to the 'dur' column:

gcdV <- Vectorize(function(x, y) gcd(x, y))
dt %>%
  mutate(new1 = gcdV(dur, lag(dur, default = dur[1])))

and get the min as in the above solution.
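As an alternative along the lines suggested in the comments, here is a minimal sketch using a gcd that works element-wise on whole vectors, so neither rowwise nor Vectorize is needed. `gcd_vec` is a hypothetical helper written for this illustration (it is not the function from the linked answer and not `schoolmath::gcd`), it assumes both inputs have the same length, and `dt` is rebuilt from the data shown in the question.

```r
library(dplyr)

# Data rebuilt from the question
dt <- data.frame(
  actrec = c("c Personal Care", "c Free Time", "c Free Time", "c Free Time",
             "b Unpaid", "c Free Time", "c Personal Care", "c Free Time",
             "c Free Time", "c Free Time"),
  dur = c(120, 10, 70, 40, 10, 20, 30, 40, 40, 10)
)

# gcd_vec is a hypothetical helper: Euclid's algorithm run element-wise
# on two equal-length vectors; NA in either input gives NA in the result
gcd_vec <- function(a, b) {
  na <- is.na(a) | is.na(b)
  a <- abs(a)
  b <- abs(b)
  a[na] <- 0
  b[na] <- 0
  while (any(b != 0)) {
    nz <- b != 0            # positions that still need a Euclid step
    r <- a[nz] %% b[nz]
    a[nz] <- b[nz]
    b[nz] <- r
  }
  a[na] <- NA
  a
}

# gcd of each duration with the previous one, then the minimum
result <- dt %>%
  mutate(res = gcd_vec(dur, lag(dur))) %>%
  summarise(Min = min(res, na.rm = TRUE))
result
#  Min
#1  10
```

Here lag(dur) leaves an NA in the first row, which `gcd_vec` passes through and `min(..., na.rm = TRUE)` then ignores, so there is no need to drop the first element by hand.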

akrun
  • Thank you, great answer. I am actually surprised that the code has to be so long. – giac Aug 14 '16 at 17:12
  • @giacomoV I was extracting the `min` as a single value. If you want it as a data.frame, it could be `dt %>% summarise(Min = min(gcdV(dur, lag(dur, default = dur[1]))[-1]))` – akrun Aug 14 '16 at 17:20