Bad rust code optimization or I just haven't done enough? (Euler #757)

Question

I'm trying to solve my first-ever Project Euler problem, just to have fun with Rust, and I'm stuck on what seems to be an extremely long compute time.

Problem: https://projecteuler.net/problem=757

I came up with the code below, which solves the base case (up to 10^6) in ~245 ms and gets the expected result of 2,851.

use std::time::Instant;

fn factor(num: u64) -> Vec<u64> {
    let mut counter = 1;
    let mut factors = Vec::with_capacity(((num as f64).log(10.0)*100.0) as _);
    while counter <= (num as f64).sqrt() as _ {
        let div = num / counter;
        let rem = num % counter;
        if rem == 0 {
            factors.push(counter);
            factors.push(div);
        }
        counter += 1
    }
    factors.shrink_to_fit();
    factors
}

fn main() {
    let now = Instant::now();
    let max = 10u64.pow(6);
    let mut counter = 0;
    'a: for i in 1..max {
        // Optimization: All numbers in the pattern appear to be evenly divisible by 4
        let div4 = i / 4;
        let mod4 = i % 4;
        if mod4 != 0 {continue}
        // Optimization: And the remainder of that divided by 3 is always 0 or 1
        if div4 % 3 > 1 {continue}
        let mut factors = factor(i);
        if factors.len() >= 4 {
            // Optimization: The later found factors seem to be the most likely to fit the pattern, so try them first
            factors.reverse();
            let pairs: Vec<_> = factors.chunks(2).collect();
            for paira in pairs.iter() {
                for pairb in pairs.iter() {
                    if pairb[0] + pairb[1] == paira[0] + paira[1] + 1 {
                        counter += 1;
                        continue 'a;
                    }
                }
            }
        }
    }
    println!("{}, {} ms", counter, now.elapsed().as_millis());
}

My code seems to spend most of its time factoring. In my search for a more efficient factoring algorithm than the one I came up with on my own, I couldn't find any ready-made Rust code (the code I did find was actually slower). I also ran a simulation to estimate how long it would take even with a perfect factoring algorithm: the non-factoring portions of this code alone would need 13 days to find all numbers up to 10^14. Probably not what the creator of this problem intended.

Given that I'm relatively new to programming, is there some concept or programming technique I'm not aware of (say, using a hashmap for fast lookups) that applies here? Or is the solution going to involve spotting patterns in the numbers and making optimizations like the ones I have found so far?

I don't think the factoring can be much improved. I think the real solution is to do some math. Number theory and all that. The conditions probably place some interesting constraint on the form the number can have. — cadolphs, Jun 22 '21 at 21:23
PS: If you're looking for fun little coding challenges that don't involve a ton of obscure math, check out adventofcode.com — cadolphs, Jun 22 '21 at 21:40
Silly question: are you running in release mode? (`cargo run --release`, `cargo build --release`, the equivalent switch in your IDE, etc.) — user4815162342, Jun 22 '21 at 21:43
So in your factor, you don't need to run to `num / 2`. Running up to `sqrt(num)` is sufficient. But there's a solution that doesn't need to factorize anything, based on manipulating the math in the problem. Happy to share the insight in an answer if you want to. — cadolphs, Jun 23 '21 at 04:34
How about enough of a hint to point us in the right direction, @Lagerbaer? I've been looking at this one too. Does the solution you have in mind just involve solving for unknown vars given `N`? There must be some property from number theory that I'm overlooking. — Todd, Jun 24 '21 at 07:08
Hm, a hint to get you in the right direction could be... imagine you have a number N = ab = cd, a + b = c + d + 1. Let a be the smaller one of a, b. Then form new numbers x = c - a and y = d - a. Now try to use x and y in some way to get to N. — cadolphs, Jun 24 '21 at 14:52
@Lagerbaer thank you, that hint took my run time from the lifetime of the universe down to 14 seconds — NechesStich, Jun 24 '21 at 20:57
Eh I give up, I went up to 3d calculus in college and stopped there, and that was over 15 years ago. I'm forgetting so much about even basic algebra that I have no chance at this lol. I did come up with another method that didn't involve factoring but it was even slower. At least I came up with a pretty efficient means of factoring, though I suspect that somebody even better at math could come up with a better algorithm. — Dragoon, Jun 25 '21 at 03:44
Finding prime factors then using those to get the factors might be faster, in theory, than iterating over all integers and checking the modulus. But, just simply scanning every integer might be faster in practice. Anyway, regarding number theory, the thing to look at is pronic numbers. I don't think many people who've gone far in math can just whip out the formula for this one. It helps to have an idea what you're looking for. — Todd, Jun 25 '21 at 11:03
The capacity calculated for vectors in `factor()` is *way* too large. For instance, the largest number of factors for any number from 1 up to 1,000,000 is 240 for the number 720,720. I believe `factor()` allocates 8K bytes for n=1mil. — Todd, Jun 25 '21 at 22:38
Changed it to log(num)*100. Still goes over but not by much. — Dragoon, Jun 26 '21 at 02:47
You could create a `Factorer` class that has a vector as a field, and it just invokes `.clear()` on it before it starts calculating the new set of factors. Just let it determine its size - it'll end up only resizing a small number of times until it hits the sweet spot. That way you don't have any allocation overhead for the majority of numbers factored. — Todd, Jun 26 '21 at 08:08
btw, I don't think the vector allocations are a serious hit to performance. Without preallocating, you're probably only losing a nanosecond or two, if even that. — Todd, Jun 26 '21 at 08:51
Actually, using a variation of the approach you mentioned earlier knocked 20% off the time it took for 10^7. Basically, I changed the signature to `fn factor(num: u64, factors: &mut Vec<u64>) -> &mut Vec<u64>`, then created the vector at size 400 and just called `clear` before running it. Still nowhere close to making this approach viable, but a decent improvement. — Dragoon, Jun 26 '21 at 15:54
Nothing we can do to the current approach is going to solve the problem for 10^14. This isn't something that smart usage of hashmaps or sets can solve. This belongs to the realm of number theory, and knowing what principles apply. Might want to take a look at "pronic numbers". But otherwise, it's a good opportunity to measure the performance of different attempts at optimization. — Todd, Jun 27 '21 at 07:47
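
The hint in the comments can be worked through as follows. With N = ab = cd, a + b = c + d + 1, x = c - a and y = d - a, expanding xy gives cd - a(c + d) + a^2 = ab - a(a + b - 1) + a^2 = a. So a = xy, c = x(y + 1), d = y(x + 1), and N = cd = x(x + 1) * y(y + 1), a product of two pronic numbers. Below is a minimal sketch that counts distinct values of that form, assuming this characterization; `count_stealthy` is a hypothetical name, and this is not necessarily the code any commenter used.

```rust
/// Counts distinct n <= limit of the form x(x+1) * y(y+1) with 1 <= x <= y.
/// Assumes the products fit in u64 (true for limits up to 10^14).
fn count_stealthy(limit: u64) -> usize {
    let mut found: Vec<u64> = Vec::new();
    let mut x: u64 = 1;
    // For a fixed x the smallest product is at y = x, i.e. (x(x+1))^2.
    while (x * (x + 1)).pow(2) <= limit {
        let px = x * (x + 1);
        let mut y = x;
        while px * y * (y + 1) <= limit {
            found.push(px * y * (y + 1));
            y += 1;
        }
        x += 1;
    }
    // Different (x, y) pairs can give the same product, so deduplicate.
    found.sort_unstable();
    found.dedup();
    found.len()
}

fn main() {
    println!("{}", count_stealthy(1_000_000));
}
```

For a limit of 10^6 this should agree with the 2,851 quoted in the question. The sketch stores every candidate before deduplicating, so memory use grows with the limit.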

score 0 · Answer 1 · answered Jun 22 '21 at 23:15

If `Vec::push` is called when the vector is at its capacity, it re-allocates a larger internal buffer (in practice roughly doubling the capacity) and moves all its elements to the new allocation. `Vec::new()` creates a vector with no space allocated, so it will go through this re-allocation as it fills.

You can use `Vec::with_capacity((num/2) as usize)` to avoid this and just allocate the max you might need.
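
An alternative to sizing a fresh vector on every call, raised in the comments on the question, is to reuse one buffer and call `clear` before each factorization, since `clear` removes the elements but keeps the allocation. The following is a minimal sketch under that assumption; `factor_into` is a hypothetical name, and the loop uses an integer `d * d <= num` bound instead of the floating-point square root from the question.

```rust
/// Writes the divisor pairs of `num` into `factors`, reusing its allocation.
/// Assumes `num` is small enough that `d * d` does not overflow u64.
fn factor_into(num: u64, factors: &mut Vec<u64>) {
    // `clear` drops the old elements but leaves the capacity untouched.
    factors.clear();
    let mut d: u64 = 1;
    while d * d <= num {
        if num % d == 0 {
            // Push the pair (d, num / d); a perfect square yields (s, s).
            factors.push(d);
            factors.push(num / d);
        }
        d += 1;
    }
}

fn main() {
    let mut buf: Vec<u64> = Vec::new();
    let mut total_pairs = 0usize;
    for n in 1..=1_000u64 {
        factor_into(n, &mut buf);
        total_pairs += buf.len() / 2;
    }
    println!("divisor pairs found: {}", total_pairs);
}
```

After the first few calls the buffer has grown to fit the largest divisor list seen so far, and later calls no longer allocate.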

pigeonhands


`(num/2)` makes it slower. I tried fixed capacities of varying sizes, but that only gave about a 1% performance improvement. – Dragoon Jun 22 '21 at 23:42
