
I have a method that converts `value` to a base-`newBase` number that is `length` digits long.

The logic in English is:

If we calculated every possible combination of numbers from 0 to (c-1), with a length of x, what set would occur at point i?
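
A small illustration, not part of the original method: with c = 3 and x = 2 the combinations in counting order are 00, 01, 02, 10, 11, 12, ..., so point i = 5 is the set [1, 2]. The brute-force sketch below (the names `Odometer` and `At` are hypothetical) just counts up to i one step at a time, which is only feasible for tiny inputs but shows what the real method has to return.

```csharp
using System;

static class Odometer
{
    // Returns the i-th sequence (0-based) of `length` digits, each in 0..c-1,
    // by counting up one step at a time like a mechanical odometer.
    public static int[] At(int c, int length, long i)
    {
        var digits = new int[length];          // starts at 00...0
        for (long step = 0; step < i; step++)
        {
            int pos = length - 1;              // least significant digit is last
            while (pos >= 0 && ++digits[pos] == c)
            {
                digits[pos] = 0;               // roll over and carry to the left
                pos--;
            }
        }
        return digits;
    }
}
```

For example, `Odometer.At(3, 2, 5)` yields `{ 1, 2 }`, which is 5 written in base 3.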

While the method below works correctly, it can take a long time to complete because the numbers involved are very large.

For example, value = ((65536^480000) - 1) / 2, newBase = 65536, length = 480000 takes about an hour to complete on a 64-bit, quad-core PC:

private int[] GetValues(BigInteger value, int newBase, int length)
{
    Stack<int> result = new Stack<int>();

    while (value > 0)
    {
        result.Push((int)(value % newBase));

        if (value < newBase)
            value = 0;
        else
            value = value / newBase;
    }

    for (var i = result.Count; i < length; i++)
    {
        result.Push(0);
    }

    return result.ToArray();
}

My question is, how can I change this method into something that will allow multiple threads to work out part of the number?

I am working in C#, but if you're not familiar with that then pseudocode is fine too.

Note: The method is from this question: Cartesian product subset returning set of mostly 0

Transmission
  • "My question is, how can I change this method into something that will allow multiple threads to work out part of the number?" --- that's the wrong question. You would do better to parallelize the processing of the original array of numbers, so that this exact method runs in parallel for different numbers. Since this is a [pure function](http://en.wikipedia.org/wiki/Pure_function), that should be trivial. – zerkms Mar 24 '14 at 22:19
  • Whilst I agree with you, I'm unsure how to reword my question to reflect this. Would you mind editing it? – Transmission Mar 24 '14 at 22:21
  • "it can take a long time to complete." --- you could start by clarifying this. Programming with threads brings some overhead and makes the code more complex. Let's first make sure it's actually required. – zerkms Mar 24 '14 at 22:23
  • @zerkms: I've clarified 'a long time' with an example – Transmission Mar 24 '14 at 22:47
  • @HenkHolterman: That's a good idea, I thought a Stack would be faster, I'll see if I can change it to an int array. – Transmission Mar 24 '14 at 22:48

1 Answer


If that GetValues method is really the bottleneck, there are several things you can do to speed it up.

First, you're dividing by newBase every time through the loop. Since newBase is an int, and the BigInteger divide method divides by a BigInteger, you're potentially incurring the cost of an int-to-BigInteger conversion on every iteration. You might consider:

BigInteger bigNewBase = newBase;

Also, you can cut the number of divides in half by calling DivRem:

while (value > 0)
{
    BigInteger rem;
    value = BigInteger.DivRem(value, bigNewBase, out rem);
    result.Push((int)rem);
}

One other optimization, as somebody mentioned in comments, would be to store the digits in a pre-allocated array. You'll have to call Array.Reverse to get them in the proper order, but that takes approximately no time.
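
Putting these suggestions together might look like the sketch below. It is not a drop-in replacement: the class name `BaseConverter` is hypothetical, and it assumes `value` is non-negative and fits in `length` digits (the original method would instead return a longer array). It also fills the array from the end, which is a swapped-in alternative to calling Array.Reverse afterwards.

```csharp
using System;
using System.Numerics;

static class BaseConverter
{
    // Sketch: most significant digit first, left-padded with zeros,
    // like the question's GetValues. Assumes value fits in `length` digits.
    public static int[] GetValues(BigInteger value, int newBase, int length)
    {
        var result = new int[length];      // pre-allocated and already zero-filled
        BigInteger bigNewBase = newBase;   // convert once, outside the loop
        int pos = length - 1;              // least significant digit goes last

        while (value > 0)
        {
            if (pos < 0)
                throw new ArgumentException("value does not fit in length digits");

            BigInteger rem;
            // One call produces both the quotient and the remainder.
            value = BigInteger.DivRem(value, bigNewBase, out rem);
            result[pos--] = (int)rem;
        }

        return result;
    }
}
```

For example, `BaseConverter.GetValues(255, 16, 4)` yields `{ 0, 0, 15, 15 }`. Whether hoisting the conversion into `bigNewBase` helps is worth measuring rather than assuming.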

That method, by the way, doesn't lend itself to parallelizing because computing each digit depends on the computation of the previous digit.

Jim Mischel
  • I tried some of your suggestions, and you were right about moving to an array of int, and using DivRem, these together sped up my routine on large tests by about 30%. However, using the newBase as a BigInteger actually made the process very slightly slower, consistently adding about a second to the runtime (where it was previously about 7, so not much). As a follow-up question, do you think that it would be possible to convert the approach taken [here](http://stackoverflow.com/a/22269999/2453421) to work with numbers instead of a char[], as it looks like it could be very easily parallelised. – Transmission Mar 26 '14 at 12:37
  • @Transmission: I'm surprised that converting `newBase` to a `BigInteger` would slow this down. Are you running a release build with the debugger *detached* (i.e. Ctrl+F5 if you're in Visual Studio)? Also, how are you doing your timings? – Jim Mischel Mar 26 '14 at 13:55
  • @Transmission: I don't see how that answer you linked applies to this problem. That answer is just creating combinations from a set--something that you can parallelize (although it might not be worthwhile). There, the individual values selected are independent of each other. In your base conversion problem, each result depends on the previous result. That is, you can't compute the second digit until after you've computed the first, and the third depends on the second, etc. – Jim Mischel Mar 26 '14 at 14:01
  • I was running a debug build, with the debugger attached, as this was one of my Unit Tests. I was using Stopwatch objects for timings, which is my usual method. I'd use the profiler, but it doesn't seem to want to work on my pc. As for the second point, the end game of this method is "If we calculated every possible combination of numbers from 0 to (c-1), with a length of x, what set would occur at point i", and this maps (I think) quite well to the answer I linked, which does this for character arrays instead of numbers. – Transmission Mar 26 '14 at 15:20
  • The reason I've been referring to this as a base conversion problem, is because that's how others have interpreted it, which makes sense, but like I said, the end game is to get the ith possible set of numbers. – Transmission Mar 26 '14 at 15:21
  • @Transmission: Timings for a debug build will be much different than for a release build. And timing with the debugger attached will give you *very* different results. It's not uncommon for function X to be faster than Y with the debugger attached, but significantly *slower* than Y when the debugger is not involved. If you're testing optimizations, you really need to do your timings with a release build and no debugger attached. – Jim Mischel Mar 26 '14 at 16:57
  • @Transmission: You might also be interested in Eric Lippert's series on permutations, especially part 5: http://ericlippert.com/2013/04/29/producing-permutations-part-five/ – Jim Mischel Mar 26 '14 at 16:58
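
The timing advice in the comments above can be sketched roughly as follows. This is only an illustration: the `Timing` class and `Average` method are hypothetical names, and it still has to be run from a release build to give meaningful numbers. It warns when a debugger is attached, makes one untimed warm-up call so JIT compilation is not counted, and then averages several timed runs.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Runs `action` once untimed to warm up, then returns the average
    // elapsed time over `runs` timed calls.
    public static TimeSpan Average(Action action, int runs)
    {
        if (Debugger.IsAttached)
            Console.Error.WriteLine("Warning: debugger attached; timings are not representative.");

        action();                          // warm-up call, not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            action();
        sw.Stop();

        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }
}
```

A call such as `Timing.Average(() => GetValues(value, newBase, length), 5)` would then compare variants of the method under the same conditions.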