Odd results of function speed tests

Question

I wrote a function which uses recursion to find greatest common factors (denominators):

> gcd
function(a,b){
 if (length(a)*length(b)>1) {
        warning("Only scalars allowed; using first elements.")
        a<-a[1]
        b<-b[1]
        }
 ifelse (b==0, a, gcd(b, a %% b))
}

I then ran some microbenchmark tests against pracma::gcd, which uses a repeated divide-and-subtract routine. Here are a couple corner-case results:

Rgames>microbenchmark(pracma::gcd(2^2*3*5^2*7*11^8*23^5*41^2,2^1*3^8*5^2*7*11^3*23^5*47^2),gcd(2^2*3*5^2*7*11^8*23^5*41^2,2^1*3^8*5^2*7*11^3*23^5*47^2),times=10)
Unit: microseconds
                                                                                               expr
 pracma::gcd(2^2 * 3 * 5^2 * 7 * 11^8 * 23^5 * 41^2, 2^1 * 3^8 *      5^2 * 7 * 11^3 * 23^5 * 47^2)
         gcd(2^2 * 3 * 5^2 * 7 * 11^8 * 23^5 * 41^2, 2^1 * 3^8 * 5^2 *      7 * 11^3 * 23^5 * 47^2)
     min      lq   median      uq     max neval
 140.260 142.399 158.0070 184.733 285.225    10
 237.759 241.607 250.3725 255.291 282.231    10
Rgames> microbenchmark(pracma::gcd(2^16,2^16*3),gcd(2^16,2^16*3),times=10)
Unit: microseconds
                        expr    min     lq  median     uq     max neval
 pracma::gcd(2^16, 2^16 * 3) 58.585 59.867 61.5780 67.992 140.688    10
         gcd(2^16, 2^16 * 3) 25.658 26.941 28.8655 30.790  41.052    10

I'm not completely certain, but it appears that the pracma code is faster when there are large primes involved and my toy code is faster when not. The powers of the primes don't seem to be the main driver. Any thoughts as to what's controlling the relative timings here? Here's the pracma code:

function (a, b, extended = FALSE) 
{
    stopifnot(is.numeric(a), is.numeric(b))
    if (length(a) == 1) {
        a <- rep(a, times = length(b))
    }
    else if (length(b) == 1) {
        b <- rep(b, times = length(a))
    }
    n <- length(a)
    e <- d <- g <- numeric(n)
    for (k in 1:n) {
        u <- c(1, 0, abs(a[k]))
        v <- c(0, 1, abs(b[k]))
        while (v[3] != 0) {
            q <- floor(u[3]/v[3])
            t <- u - v * q
            u <- v
            v <- t
        }
        e[k] <- u[1] * sign(a[k])
        d[k] <- u[2] * sign(a[k])
        g[k] <- u[3]
    }
    if (extended) {
        return(list(g = g, c = e, d = d))
    }
    else {
        return(g)
    }
}

Edit: bonus question: if we can identify the type of data which cause the difference, can we predict in advance which function will solve gcd(x,y) without analyzing x and y 's factorability?

`pracma::gcd` is likely byte-compiled. Did you byte-compile your `gcd` function before running the timings? Also, `if(b==0) a else gcd()` will be faster than `ifelse` in this case. — Joshua Ulrich, Feb 04 '14 at 20:45
There's no reason to restrict the vectors to length 1. This is easily vectorized by computing `a%%b` (performing the recycling) prior to the `ifelse`. See here: http://stackoverflow.com/a/21504113/1290634. Still probably won't beat `pracma::gcd` but one less `if` can't hurt. — Matthew Lundberg, Feb 04 '14 at 20:49
@MatthewLundberg true, but as I said it was a Q&D thing I wrote, and it was long ago before I learned good `R` practice. @Joshua good point, and I'll retry w/ a compiled version. — Carl Witthoft, Feb 04 '14 at 20:55

score 0 · Accepted Answer · answered Feb 05 '14 at 12:31

I changed the ifelse to a simple if(b==0) a else gcd(...) and repeated the runs. The function cgcd is the byte-compiled version of the modified gcd function; oldgcd is the original as posted above.

Rgames> microbenchmark(pracma::gcd(2^2*3*5^2*7*11^8*23^5*41^2,2^1*3^8*5^2*7*11^3*23^5*47^2),gcd(2^2*3*5^2*7*11^8*23^5*41^2,2^1*3^8*5^2*7*11^3*23^5*47^2),cgcd(2^2*3*5^2*7*11^8*23^5*41^2,2^1*3^8*5^2*7*11^3*23^5*47^2),oldgcd(2^2*3*5^2*7*11^8*23^5*41^2,2^1*3^8*5^2*7*11^3*23^5*47^2),times=10)
Unit: microseconds
                                                                                               expr
 pracma::gcd(2^2 * 3 * 5^2 * 7 * 11^8 * 23^5 * 41^2, 2^1 * 3^8 * 5^2 * 7 * 11^3 * 23^5 * 47^2)
         gcd(2^2 * 3 * 5^2 * 7 * 11^8 * 23^5 * 41^2, 2^1 * 3^8 * 5^2 * 7 * 11^3 * 23^5 * 47^2)
        cgcd(2^2 * 3 * 5^2 * 7 * 11^8 * 23^5 * 41^2, 2^1 * 3^8 * 5^2 * 7 * 11^3 * 23^5 * 47^2)
      oldgcd(2^2 * 3 * 5^2 * 7 * 11^8 * 23^5 * 41^2, 2^1 * 3^8 * 5^2 * 7 * 11^3 * 23^5 * 47^2)
     min      lq  median      uq     max neval
 140.261 142.826 144.109 148.386 218.516    10
  68.420  73.124  73.979  74.834  99.209    10
  68.420  70.986  72.696  74.835  78.683    10
  81.677  82.532  84.028  87.663  94.078    10

So eliminating the ifelse does improve things. I'm at a loss as to why today the pracma version runs at the same speed as yesterday but the older gcd runs faster. Either way, it looks like my version is faster. However, and this is important, for really large numbers with a ton of prime factors, the %% function throws a warning and returns a bad value, so the pracma version is more robust.

Unfortunately, if I want to vectorize as Matthew did in the referenced SO question, `ifelse` must be used :-( — Carl Witthoft, Feb 05 '14 at 13:26

Odd results of function speed tests

1 Answers1

Linked