I am working on small tasks to improve my coding and efficiency. The problem I'm working on today is Project Euler problem 3:

"Find the largest prime factor of 600851475143"

The code I have written is:

data test;

a = 600851475143;
/*The subsequent a's are the next parts of the loop I'm trying to incorporate*/
/*a = 8462696833;*/
/*a = 10086647;*/
/*a = 6857;*/

if mod(a,2) = 0 then do;

a = a / 2;

end;

else do;

    do i = 3 to a until(flag);

        if mod(a,i) = 0 and i < a then do;

            b = i ;
            a = a / b ;
            flag = 1;
            output;

        end;
    end;

end;

run;

How do I make the variable a get smaller on each pass of a loop, and then terminate when a can no longer be factorised, i.e. the last iteration does not produce any output because there is no factorisation?

I am also happy to receive any tips on how to make this code more efficient, because I am trying to learn.

78282219

1 Answer


You will need some inner-looping to remove all powers of each factor.

Some issues with factor checking

  • The power of 2 removal only removes the first of potentially many powers of 2 in the factorization.
  • The do i loop (where i is really the candidate factor) has the same problem: it divides out each factor only once.
  • The do i loop iterates by 1, which means you are checking even numbers. This is unnecessary after removing the factors of 2 -- do i=3 to a by 2 … would be better
    • a composite number always has a prime factor no larger than sqrt(number), so candidates only need to be tested up to the square root of what remains; if the remainder is still greater than 1 at that point, it is itself prime (and is the largest prime factor)
    • if you don't want to compute sqrt, you can use number/2, since no proper factor exceeds it
    • regardless of the to value, you will exit the factorization loop when the number is reduced to 1, so to a is ok.
    • a smarter 'solver' will track known primes dynamically and test only those. If complete factorization is not achieved after checking the last known prime, you have to keep advancing by 2
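The points above can be put together in a minimal sketch for a single number. This is only an illustration under stated assumptions, not the code from the question or the answer: the variable names number, remaining, factor and largest are made up for the example. It divides out every power of each factor, steps through odd candidates only, and stops once the candidate squared exceeds what is left.

```sas
data _null_;
  number    = 600851475143;   /* value from the question */
  remaining = number;         /* hypothetical name: the part not yet factorized */
  largest   = .;              /* hypothetical name: largest prime factor so far */

  /* divide out every power of 2, not just the first */
  do while (mod(remaining, 2) = 0);
    largest   = 2;
    remaining = remaining / 2;
  end;

  /* odd candidates only; stop once factor squared exceeds what is left */
  factor = 3;
  do while (factor * factor <= remaining);
    /* inner loop removes all powers of this factor */
    do while (mod(remaining, factor) = 0);
      largest   = factor;
      remaining = remaining / factor;
    end;
    factor = factor + 2;
  end;

  /* a remainder above 1 has no divisor up to its square root, so it is prime */
  if remaining > 1 then largest = remaining;

  put number= 16. largest=;
run;
```

For the number in the question this should write largest=6857 to the log, which agrees with the last value of a listed in the question.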

It makes sense to use variable names that correspond to their role in the solution -- so instead of i consider using factor. Certainly for personal code, you can use what names you want, but for code that will be maintained by you or others in the future the best practice is good variable names.

In SAS, instead of hard-coding a single test value at the top of the DATA step, consider processing (prime factorizing) any number of numbers that are contained in a data set.

Looping in SAS is done with a variety of do code constructs

  • do … while(condition); …iterated-statements… end;
    • 0 or more loops - condition test is done before doing any iterated-statements
  • do … until(condition); … end;
    • 1 or more loops - condition test is done after doing iterated-statements
  • do index=from-value to to-value by by-value; … end;
    • index varies by equal steps and can be used as part of iterated-statements
    • to-value is computed once and can not be changed by the iterated-statements
    • while or until can be tacked onto do index= statement
    • to-value is optional; however, without it the loop is endless unless while or until is present, or at least one of the iterated-statements performs a leave statement

Which you choose depends on the problem.
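As a rough illustration of the constructs listed above, the following sketch writes a few lines to the log. The variable names (n, i, j, limit) are chosen only for this example.

```sas
data _null_;
  n = 5;

  /* do while: the test comes first, so the body may run zero times */
  do while (n < 5);
    put "do while body ran";          /* not reached, because n < 5 is false */
  end;

  /* do until: the test comes last, so the body runs at least once */
  do until (n >= 5);
    put "do until body ran once";
  end;

  /* iterative do with no to-value: leave is what ends the loop */
  do i = 1 by 2;
    if i > 7 then leave;
    put i=;                           /* i=1, i=3, i=5, i=7 */
  end;

  /* to-value is evaluated once: changing limit inside does not extend the loop */
  limit = 3;
  do j = 1 to limit;
    limit = 10;
    put j=;                           /* j=1, j=2, j=3 */
  end;
run;
```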

Make a data set of numbers to process

data numbers;
  input number;
  format number 16.;
  datalines;
64
720
30
600851475143
8462696833
10086647
6857
;
run;

Sample code

Inner-looping is used to remove factors that occur more than once.

Changed: A further nested loop (offset = -1, 1) applies the fact that every prime greater than 3 has the form 6n +/- 1 to select candidate factors.

data prime_factorizations(keep=number factor power);

  set numbers;
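  objective = floor(abs(number));

  * skip 0, 1 and missing values: 0 would never leave the divide-out loops below;
  if objective < 2 then delete;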

  factor = 2;
  do power = 0 by 1 while (mod(objective,factor) = 0);
    objective = objective / factor;
  end;
  if power then output;

  factor = 3;
  do power = 0 by 1 while (mod(objective,factor) = 0);
    objective = objective / factor;
  end;
  if power then output;

  * after 2 and 3 all primes are of form 6n +/- 1;
  * however, not all 6n +/- 1 are prime;

  * in essence operate a sieve to check for factors;
  * of course the best sieve is a list of primes,
  * but 6n +/- 1 knocks out a lot of unnecessary checks; 

  do n = 1 to objective while (objective > 1);
    do offset = -1, 1;
      factor = 6*n + offset;
      do power = 0 by 1 while (mod(objective,factor) = 0);
        objective = objective / factor;
      end;
      if power then output;
      if objective = 1 then leave;
    end;
  end;
run;

title "Prime factorization of some numbers";
proc print noobs data=prime_factorizations;
run;

proc sql;
  title "Max prime factor of various numbers";
  select number, max(factor) as max_prime_factor
  from prime_factorizations group by number;
quit;

title;
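One way to sanity-check the result is to multiply the factors back together. The sketch below assumes the prime_factorizations data set produced by the step above already exists; the names factorization_check, product and matches are made up for this example. It also assumes the same number does not appear on two consecutive rows of numbers, because by number notsorted would treat those rows as a single group.

```sas
data factorization_check(keep=number product matches);
  set prime_factorizations;
  by number notsorted;        /* rows for one number are adjacent but not sorted */
  retain product;             /* carry the running product across rows */

  if first.number then product = 1;
  product = product * factor**power;

  if last.number then do;
    /* 1 when the factors multiply back to the original number, else 0 */
    matches = (product = floor(abs(number)));
    output;
  end;

  format number product 16.;
run;

proc print noobs data=factorization_check;
run;
```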

Of course, this is not how people who actually search for large primes operate, but it is a good introduction to programming in any given language. Back in a CS "survey of languages" course, I would try to code the above in each new language being learned.

Richard
  • My main question about my code is how to make the variable a update itself. Above, you have defined a data set "numbers" with the factors of the large number, but in my loop I am looking for these numbers and wish to restart the loop with the new value of a. Would it be possible to highlight this amendment? – 78282219 Jul 11 '18 at 12:50
  • Another little feature I'm aiming to incorporate is that all primes after 3 follow the form 6n +/- 1. I will create a data set of those numbers and use them in the loop to save time. – 78282219 Jul 11 '18 at 12:51
  • You do not need a separate data set with the 6n +/- 1 prime candidates. They can be computed dynamically at the factor loop level. See changes – Richard Jul 11 '18 at 15:24
  • In SAS, when the bottom of the DATA step is reached, control returns implicitly (silently) back to the top of the step. Thus, the SET statement is reached again, at which point the next record is read from the `numbers` data set. – Richard Jul 11 '18 at 16:10
  • The numbers data set is simply a list of any numbers you want factorized. The data step output `prime_factorizations` has one row per prime factor of each number, so a prime number will have only one row in the output, and composite numbers will have multiple rows -- the proc print should make it obvious. Try changing the numbers to factor to see how they are factorized. – Richard Jul 11 '18 at 16:45
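The implicit loop described in the comments can be seen with a small sketch. The data set name demo_numbers is made up for this example; _n_ is the automatic variable that counts passes through the DATA step.

```sas
data demo_numbers;
  input number;
  datalines;
30
6857
;
run;

data _null_;
  /* each pass through the step reads the next record; the step stops
     when SET tries to read past the last record */
  set demo_numbers;
  put "Pass " _n_ "of the implicit loop read " number=;
run;
```

The log should show one line per record, so no explicit outer loop is needed to move from one number to the next.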