R Algorithm for generating all possible factorizations of a number

Question

For example, consider the number 96. It can be written in following ways:

1. 96
2. 48 * 2
3. 24 * 2 * 2
4. 12 * 2 * 2 * 2
5. 6 * 2 * 2 * 2 * 2
6. 3 * 2 * 2 * 2 * 2 * 2
7. 4 * 3 * 2 * 2 * 2
8. 8 * 3 * 2 * 2
9. 6 * 4 * 2 * 2
10. 16 * 3 * 2
11. 4 * 4 * 3 * 2
12. 12 * 4 * 2
13. 8 * 6 * 2
14. 32 * 3
15. 8 * 4 * 3
16. 24 * 4
17. 6 * 4 * 4
18. 16 * 6
19. 12 * 8

I know this is related to partitions as any number written as the power, n, of a single prime, p, is simply the number of ways you can write n. For example, to find all of the factorizations of 2^5, we must find all ways to write 5. They are:

1+1+1+1+1 ==>> 2^1 * 2^1 * 2^1 * 2^1 * 2^1
1+1+1+2 ==>> 2^1 * 2^1 * 2^1 * 2^2
1+1+3 ==>> 2^1 * 2^1 * 2^3
1+2+2 ==>> 2^1 * 2^2 * 2^2
1+4 ==>> 2^1 * 2^4
2+3 ==>> 2^2 * 2^3
5 ==>> 2^5

I found a wonderful article by Jerome Kelleher about partition generating algorithms here. I have adapted one of his python algorithms to R. The code is below:

library(partitions)   ## using P(n) to determine number of partitions of an integer
IntegerPartitions <- function(n) {
    a <- 0L:n
    k <- 2L
    a[2L] <- n
    MyParts <- vector("list", length=P(n))
    count <- 0L
    while (!(k==1L)) {
        x <- a[k-1L]+1L
        y <- a[k]-1L
        k <- k-1L
        while (x<=y) {a[k] <- x; y <- y-x; k <- k+1L}
        a[k] <- x+y
        count <- count+1L
        MyParts[[count]] <- a[1L:k]
    }
    MyParts
}

I attempted to extend this method to numbers with more one than one prime factor, but my code got very clunky. After wrestling with this idea for a while, I decided to try a different route. My new algorithm makes no use of generating partitions whatsoever. It is more of a "lookback" algorithm that takes advantage of factorizations that have already been generated. The code is below:

FactorRepresentations <- function(n) {

MyFacts <- EfficientFactorList(n)
MyReps <- lapply(1:n, function(x) x)

    for (k in 4:n) {
        if (isprime(k)) {next}
        myset <- MyFacts[[k]]
        mylist <- vector("list")
        mylist[[1]] <- k
        count <- 1L
            for (j in 2:ceiling(length(myset)/2)) {
                count <- count+1L
                temp <- as.integer(k/myset[j])
                myvec <- sort(c(myset[j], temp), decreasing=TRUE)
                mylist[[count]] <- myvec
                MyTempRep <- MyReps[[temp]]

                if (isprime(temp) || temp==k) {next}

                if (length(MyTempRep)>1) {
                    for (i in 1:length(MyTempRep)) {
                        count <- count+1L
                        myvec <- sort(c(myset[j], MyTempRep[[i]]), decreasing=TRUE)
                        mylist[[count]] <- myvec
                    }
                }
            }
        MyReps[[k]] <- unique(mylist)
    }
    MyReps
}

The first function in the code above is simply a function that generates all factors. Here is the code if you are curious:

EfficientFactorList <- function(n) {
    MyFactsList <- lapply(1:n, function(x) 1)
    for (j in 2:n) {
        for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
    }
    MyFactsList
}

My algorithm is just okay if you are only concerned with numbers less than 10,000 (it generates all factorizations for every number <= 10,000 in about 17 seconds), but it definitely doesn't scale well. I would like to find an algorithm that has the same premise of generating a list of all factorizations for every number less than or equal to n as some of the applications I have in mind will reference a given factorization multiple times, thus having it in a list should be faster than generating it on the fly everytime (I know there is a memory cost here).

This isn't a simple problem (obviously) but in case you haven't found it yet, here's the relevant entry from the On-line Encyclopedia of Integer Sequences: https://oeis.org/A001055 — Josh O'Brien, Apr 27 '15 at 21:14
This is very helpful, although this only gives the total number of factorizations and not the actual factorizations themselves. For example, for n = 96 as above, it gives 19. — Joseph Wood, Apr 27 '15 at 21:18

josliber · Accepted Answer · 2015-04-27T23:17:14.600

Your function EfficientFactorList does a good job of efficiently grabbing the set of all factors for each number from 1 to n, so all that remains is getting the set of all factorizations. As you suggest, using the factorizations of smaller values to compute the factorizations for larger values seems like it could be efficient.

Consider a number k, with factors k_1, k_2, ..., k_n. A naive approach would be to combine the factorizations of k/k_1, k/k_2, ..., k/k_n, appending k_i to each factorization of k/k_i to yield a factorization of k. As a worked example, consider computing the factorizations of 16 (which has non-trivial factors 2, 4, and 8). 2 has factorization {2}, 4 has factorizations {4, 2*2}, and 8 has factorizations {8, 4*2, 2*2*2}, so we would compute the full set of factorizations by first computing {2*8, 4*4, 2*2*4, 8*2, 4*2*2, 2*2*2*2} and then taking the unique factorizations, {8*2, 4*4, 4*2*2, 2*2*2*2}. Adding 16 yields the final answer.

A more efficient approach is to notice that we don't need to append k_i to all the factorizations of k/k_i. For instance, we didn't need to add 2*2*4 from the factorization of 4 because this is already included from the factorization of 8. Similarly, we didn't need to add 2*8 from the factorization of 2 because this is already included from the factorization of 8. In general, we only need to include a factorization from k/k_i if all the values in the factorization are k_i or greater.

In code:

library(gmp)
all.fact <- function(n) {
  facts <- EfficientFactorList(n)
  facts[[1]] <- list(1)
  for (x in 2:n) {
    if (length(facts[[x]]) == 2) {
      facts[[x]] <- list(x)  # Prime number
    } else {
      x.facts <- facts[[x]][facts[[x]] != 1 & facts[[x]] <= (x^0.5+0.001)]
      allSmaller <- lapply(x.facts, function(pf) lapply(facts[[x/pf]], function(y) {
        if (all(y >= pf)) {
            return(c(pf, y))
        } else {
            return(NULL)
        }
      }))
      allSmaller <- do.call(c, allSmaller)
      facts[[x]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
    }
  }
  return(facts)
}

This is a good deal quicker than the posted code:

system.time(f1 <- FactorRepresentations(10000))
#    user  system elapsed 
#  13.470   0.159  13.765 
system.time(f2 <- all.fact(10000))
#    user  system elapsed 
#   1.602   0.028   1.641

As a sanity check, it also returns the same number of factorizations for each number:

lf1 <- sapply(f1, length)
lf2 <- sapply(f2, length)
all.equal(lf1, lf2)
# [1] TRUE

Really nice R implementation! One minor observation: in the do.call function near the bottom, the "c" should be a string i.e. do.call("c", allSmaller) — Joseph Wood, Apr 27 '15 at 23:57
This code scales way better than mine as well. all.fact(20000) only takes around 3 seconds, whereas mine took almost 50 seconds. Awesome!! — Joseph Wood, Apr 28 '15 at 00:01
@JosephWood re your first comment, is there a reason you suggest that? `do.call(c, list(1, 2, 3))` returns the same thing as `do.call("c", list(1, 2, 3))` and saves two keystrokes. I see from `?do.call` that they use both, but it appears to only matter when you're specifying environments (which we're not doing here). — josliber, Apr 28 '15 at 00:06
when I ran your code I obtained this error: "Error in do.call(c, allSmaller) : 'what' must be a character string or a function." The error went away when I put quotations around it. As an R newbie, I didn't immediately think about the environment. You are completely right. Sorry about that. — Joseph Wood, Apr 28 '15 at 00:31
Superb. Thanks for including the worked example in your explanation! — Josh O'Brien, Apr 28 '15 at 00:43

score 0 · Answer 2 · edited May 23 '17 at 11:53

In case anybody is interested in generating the multiplicative partitions for one number n, below are two algorithms that will do just that (the function IntegerPartition comes from the question above):

library(gmp)
library(partitions)
get_Factorizations1 <- function(MyN) {
    pfs <- function (x1) {
        n1 <- length(x1)
        y1 <- x1[-1L] != x1[-n1]
        i <- c(which(y1), n1)
        list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L)
    }

    if (MyN==1L) return(MyN)
    else {
        pfacs <- pfs(as.integer(factorize(MyN)))
        unip <- pfacs$values
        pv <- pfacs$lengths
        n <- pfacs$uni
        mySort <- order(pv, decreasing = TRUE)
        pv <- pv[mySort]
        unip <- unip[mySort]
        myReps <- lapply(IntegerPartitions(pv[1L]), function(y) unip[1L]^y)
        if (n > 1L) {
            mySet <- unlist(lapply(2L:n, function(x) rep(unip[x],pv[x])))
            for (p in mySet) {
                myReps <- unique(do.call(c,
                    lapply(myReps, function(j) { 
                        dupJ <- duplicated(j)
                        nDupJ <- !dupJ
                        SetJ <- j[which(nDupJ)]
                        lenJ <- sum(nDupJ)
                        if (any(dupJ)) {v1 <- j[which(dupJ)]} else {v1 <- vector(mode="integer")}
                        tList <- vector("list", length=lenJ+1L)
                        tList[[1L]] <- sort(c(j,p))

                        if (lenJ > 1L) {c2 <- 1L
                            for (a in 1:lenJ) {tList[[c2 <- c2+1L]] <- sort(c(v1,SetJ[-a],SetJ[a]*p))}
                        } else {
                            tList[[2L]] <- sort(c(v1,p*SetJ))
                        }
                        tList
                    }
                )))
            }
        }
    }
    myReps
}

Below is josliber's code from above manipulated to handle a single case. The function MyFactors comes from this post (it returns all of the factors of a given number).

library(gmp)
get_Factorizations2 <- function(n) {
    myFacts <- as.integer(MyFactors(n))
    facts <- lapply(myFacts, function(x) 1L)
    numFacs <- length(myFacts)
    facts[[numFacs]] <- myFacts
    names(facts) <- facts[[numFacs]]
    for (j in 2L:numFacs) {
        x <- myFacts[j]
        if (isprime(x)>0L) {
            facts[[j]] <- list(x)
        } else {
            facts[[j]] <- myFacts[which(x%%myFacts[myFacts <= x]==0L)]
            x.facts <- facts[[j]][facts[[j]] != 1 & facts[[j]] <= (x^0.5+0.001)]
            allSmaller <- lapply(x.facts, function(pf) lapply(facts[[which(names(facts)==(x/pf))]], function(y) {
                if (all(y >= pf)) {
                    return(c(pf, y))
                } else {
                    return(NULL)
                }
            }))
            allSmaller <- do.call(c, allSmaller)
            facts[[j]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
        }
    }
    facts[[numFacs]]
}

Here are some benchmarks:

set.seed(101)
samp <- sample(10^7, 10^4)
library(rbenchmark)
benchmark(getFacs1=sapply(samp, get_Factorizations),
            getFacs2=sapply(samp, get_Factorizations2),
            replications=5,
            columns = c("test", "replications", "elapsed", "relative"),
            order = "relative")
test replications elapsed relative
1 getFacs1            5  117.68    1.000
2 getFacs2            5  216.39    1.839


system.time(t2 <- get_Factorizations(25401600))
 user  system elapsed 
10.89    0.03   10.97
system.time(t2 <- get_Factorizations2(25401600))
 user  system elapsed 
21.08    0.00   21.12 

length(t1)==length(t2)
[1] TRUE

object.size(t1)
28552768 bytes
object.size(t2)
20908768 bytes

Even though get_Factorizations1 is faster, the second method is more intuitive (see josliber's excellent explanation above) and it produces a smaller object. For the interested reader, here is a really good paper about the subject.

R Algorithm for generating all possible factorizations of a number

2 Answers2