Is there a better way to find ‘highly composite’ pythagorean triples in Python?

Question

I’m trying to find ‘highly composite’ pythagorean triples - numbers (c) that have more than one unique a,b (in the naturals) that satisfy a² + b² = c².

I’ve written a short python script to find these - it cycles through c in the range (0,1000), and for each c, finds all possible (a,b) such that b < a < c. This is a more brute force method, and I know if I did some reading on number theory I could find some more methods for different cases of a and b.

I have a feeling that my script isn’t particularly efficient, especially for large c. I don’t really know what to change or how to make it more efficient.

I’d be really grateful for any help or pointers!

a = 0 
b = 0
l=[]
for i in range (0,1000):
#i is our c.
    while a<i:
        while b<a:

        #for each a, we cycle through b = 1, b = 2, … until b = a. 
        #Then we make b = 0 and a = a+1, and start the iterative process again.

            if a*a + b*b == i*i:
                l.append(a)
                l.append(b)

                #I tried adding a break here - my thought process was that we can’t find any 
                #other b^2 that satisfies a^2 + b^2 = i^2 without changing our a^2. This 
                #actually made the runtime longer, and I don’t know why.

            b = b+1

        a = a+1
        b = 0

    if len(l) > 4:

        #all our pairs of pythagorean triples, with the c at the end.
        print(l, i)
    
    #reset, and find pairs again for i = i+1.
    l = []
    b = 0
    a = 0

Mateo Vial · Answer 1 · 2022-03-20T05:42:10.310

Your code seems quite inefficient, because you are doing many times the same computations. You could make it more efficient by not calculating things that are not useful. The most important detail is the computation of a and b. You are looping through all possible values for a and b and checking if it's a pythagorean triplet. But once you give yourself a value for a, there is only one possible choice for b, so the b loop is useless.

By removing that loop, you're basically lowering the degree of the polynomial complexity by one, which will make it increasingly faster (compared to your current script) when c grows

Also, your code seems to be wrong, as it misses some triplets. I ran it and the first triplets found were with 65 and 85, but 25, 50 and 75 are also highly composite pythagoren triplets. That's because you're checking len(l)>4, while you should check len(l)>=4 instead because you're missing numbers that have two decompositions.

As a comparison, I programmed a similar python script as yours (except I did it myself and tried to make it as efficient as possible). On my computer, your script ran in 66 seconds, while mine ran in 4 seconds, so you have a lot of room for improvement.

EDIT : I added my code for the sake of sharing. Here is a list of what differs from yours :

I stored all squares of numbers from 1 to N in a list called squares so I can check efficiently if a number is a square
I store the results in a dictionary where the value at key c is a list of tuples corresponding to (a, b)
The loop for a goes from 1 to floor(c/sqrt(2))
Instead of looping for b, I check whether c²-a² is a square
On a general note, I pre-compute every value that has to be used several times (invsqrt2, csqr)

from math import floor, sqrt

invsqrt2 = 1/sqrt(2)
N=1000
highly_composite_triplets = {}
squares = list(map(lambda x: x**2, range(0,N+1)))

for c in range(2,N+1):
    if c%50==0: print(c) # Just to keep track of the thing
    csqr = c**2
    listpairs = []
    for a in range(1,floor(c*invsqrt2)+1):
        sqrdiff = csqr-a**2
        if sqrdiff in squares:
            listpairs.append((a, squares.index(sqrdiff)))
    if len(listpairs)>1:
        highly_composite_triplets[c] = listpairs
        
print(highly_composite_triplets)

Thank you very much! This is helpful. Eliminating the need for b with some algebra is a nice touch. — dangerscott, Mar 19 '22 at 14:32

trincot · Answer 2 · 2022-03-20T22:02:06.717

First of all, and as already mentioned, you should fix that > 4 by >= 4.

For performance, I would suggest using the Tree of primitive Pythagorean triples. It allows to generate all possible primitive triples, such that three "children" of a given triple have a c-value that is at least as great as the one of the "parent".

The non-primitive triples can be easily generated from a primitive one, by multiplying all three values with a coefficient (until the maximum value of c is reached). This has to only be done for the initial triplet, as the others will follow from it.

That is the part where most efficiency gain is made.

Then in a second phase: group those triples by their c value. You can use itertools.groupby for that.

In a third phase: only select the groups that have at least 2 members (i.e. 4 values).

Here is an implementation:

import itertools
import operator

def pythagorian(end):
    # DFS traversal through the pythagorian tree:
    def recur(a, b, c):
        if c < end:
            yield c, max(a, b), min(a, b)
            yield from recur( a - 2*b + 2*c,  2*a - b + 2*c,  2*a - 2*b + 3*c)
            yield from recur( a + 2*b + 2*c,  2*a + b + 2*c,  2*a + 2*b + 3*c)
            yield from recur(-a + 2*b + 2*c, -2*a + b + 2*c, -2*a + 2*b + 3*c)
    # Start traversal from basic triplet, and its multiples
    for i in range(1, end // 5):
        yield from recur(4*i, 3*i, 5*i)  

def grouped_pythagorian(end):
    # Group by value of c, and flatten the a, b pairs into a list
    return [
        (c, [a for _, *ab in group for a in ab])
        for c, group in itertools.groupby(sorted(pythagorian(end)), 
                                          operator.itemgetter(0))
    ]

def highly_pythagorian(end):
    # Select the groups of triples that have at least 2 members (i.e. 4 values)
    return [(group, c) for c, group in grouped_pythagorian(end) if len(group) >= 4]

Run the function as follows:

for result in highly_pythagorian(1000):
    print(*result)

This produces the triples within a fraction of a second, and is thousands of times faster than your version and the one in @Mateo's answer.

Simplified

As discussed in comments, I provide here code that uses the same algorithm, but without imports, list comprehensions, generators (yield), and unpacking operators (*):

def highly_pythagorian(end):
    triples = []
    
    # DFS traversal through the pythagorian tree:
    def dfs(a, b, c):
        if c < end:
            triples.append((c, max(a, b), min(a, b)))
            dfs( a - 2*b + 2*c,  2*a - b + 2*c,  2*a - 2*b + 3*c)
            dfs( a + 2*b + 2*c,  2*a + b + 2*c,  2*a + 2*b + 3*c)
            dfs(-a + 2*b + 2*c, -2*a + b + 2*c, -2*a + 2*b + 3*c)
            
    # Start traversal from basic triplet, and its multiples
    for i in range(1, end // 5):
        dfs(4*i, 3*i, 5*i)
    
    # Sort the triples by their c-component (first one),
    #     ...and then their a-component
    triples.sort()

    # Group the triples in a dict, keyed by c values
    groups = {}
    for c, a, b in triples:
        if not c in groups:
            groups[c] = []
        groups[c].append(a)
        groups[c].append(b)

    # Select the groups of triples that have at least 2 members (i.e. 4 values)
    results = []
    for c, ab_pairs in sorted(groups.items()):
        if len(ab_pairs) >= 4:
            results.append((ab_pairs, c))
    return results

Call as:

for ab_pairs, c in highly_pythagorian(1000):
    print(ab_pairs, c)

Thanks for the response. Looking at what the script produces, I wonder why the max number of pairs is 7? Do the number of pairs generally increase for bigger c? The techniques you have used seem to be a but more advanced than the ones I have at my disposal at the moment. Not too sure how you want me to interpret the last sentence though - it comes across slightly egotistic. — dangerscott, Mar 19 '22 at 14:40
Oh no, I don't want to come across egotistic. From my experience on Stack Overflow, I see that people are looking for efficient solutions. If that is less of a priority for you, then please ignore my answer. It was a joy for me to look into this challenge, so anyway thanks for having asked this question! There is no limit for the number of triangles that can make up the same sum. Just increase the argument given to the function, and you'll see that such larger lists will get produced. — trincot, Mar 19 '22 at 16:42
About the techniques: could you let me know which techniques are not at your disposal? If possible I will then try to do it without those. Again, I liked this question (and upvoted it when I found it) — trincot, Mar 19 '22 at 17:11
I’m at best an amateur programmer, I study maths, not computer science! Not much of what you’ve written in the script makes sense to me, as I don’t know the definitions of the functions you’re calling, and so on. Thanks again for the answer :) — dangerscott, Mar 20 '22 at 21:32
OK, I will try to extend my answer using as few imports as possible and avoiding any list comprehension syntax... — trincot, Mar 20 '22 at 21:40
I have added a version of the same algorithm, but using basic Python operations only (I hope). Let me know if there is something that needs clarification. — trincot, Mar 20 '22 at 22:02

dan_fulea · Answer 3 · 2022-06-18T22:00:08.643

Here is a solution based on the mathematical intuition behind Gaussian integers. We are working in the "ring" R of all numbers of the form

a + ib

where a, b are integers. This is the ring of Gaussian integers. Here, i is the square root of -1. So i² = -1. Such numbers lead to a similar arithmetic as in the case of the (usual) integers. Each such number has a unique decomposition in gaussian primes. (Up to the order of the factors.) Such a domain is called a unique factorization domain, UFD.

Which are the primes in R? (Those elements that cannot be split multiplicatively in more than two non-invertible pieces.) There is a concrete characterization for them. The classical primes of the shapes 4k + 3 remain primes in R, are inert. So we cannot split primes like 3, 7, 11, 19, 23, 31, ... in R. But we can always split uniquely (up to unit conjugation, a unit being one among 1, -1, i, -i) the (classical) primes of the shape 4k + 1 in R. For instance:

(*)

5  = (2 +  i)(2 -  i)
13 = (3 + 2i)(3 - 2i)
17 = (4 +  i)(4 -  i)
29 = (5 + 2i)(5 - 2i)
37 = (6 +  i)(6 -  i)
41 = (5 + 4i)(5 - 4i)
53 = (7 + 2i)(7 - 2i)
61 = (6 + 5i)(6 - 5i)

and so on, i hope the scheme is clear. For our purpose, the remained prime two is the oddest prime. Since we have its decomposition 2 = (1 + i)(1 -i), where the two Gaussian primes (1 + i) and (1 - i) are associated, multiplying with a unit bring one in the other one. I will avoid this prime below.

Now consider the product of some of the numbers on the L.H.S. in (*). For instance 5.5.13.17 = 5525 - and let us pick from each of the four (classical) prime factors one of the Gaussian primes inside. We may thus pick (2 + i) twice from the two 5-factors, (3 - 2i) from 13 and (4 + i) from the 17. We multiply and get:

sage: (2 + i)^2 * (3 - 2*i) * (4 + i)
41*I + 62

And indeed, a = 41 and b = 62 is a solution of 41² + 62² = 5525. Unfortunately 5525 is not a square. OK, let us start with a square, one like

1105² = 5².13².17² = (2+i)²(2-i)² . (3+2i)²(3-2i)² . (4+i)²(4-i)²

and now separate the factors in "two parts", so that in one part we have some factors, and in the other part the conjugates. Here are the possibilities for 25 = 5²:

(2+i)²    and    (2-i)²
    5     and        5
(2-i)²    and    (2+i)²

There are three possibilities. Do the same for the other two squares, then combine. For instance:

sage: (2 + i)^2 * (3 - 2*i)^2 * 17
-272*I + 1071

And indeed, 272² + 1071² = 1105² . This solution is not "primitive", in the sense that 17 is a divisor of the three involved numbers, 272, 1071, 1105. Well, this happens because we took the factor 17 from the separation of 17² in two (equal) parts. To get some other solutions, we may take

each possible first part from 5² with...
each possible first part from 13² with...
each possible first part from 17² and thus get "many solutions". Here are they:

sage: [ (m, n) for m in range(1, 1105) for n in range(1, 1105) ....: if m <= n and m2 + n2 == 1105**2 ]

[(47, 1104),
 (105, 1100),
 (169, 1092),
 (264, 1073),
 (272, 1071),
 (425, 1020),
 (468, 1001),
 (520, 975),
 (561, 952),
 (576, 943),
 (663, 884),
 (700, 855),
 (744, 817)]

We expect 3.3.3 solutions. One of them is the trivial one, 1105² = 1105² + 0². The other solutions of 1105² = a² + b² may be arranged to have a < b. (No chance to get equality.) So we expect (27 - 1)/2 = 13 solutions, yes, the ones above. Which solution is produced by taking the "first parts" as follows: (2 + i)^2 * (3 - 2*i)^2 * (4 + i)^2 ?!

sage: (2 + i)^2 * (3 - 2*i)^2 * (4 + i)^2
264*I + 1073

And indeed, (264, 1073) is among the solutions above.

So if getting "highly composite" numbers is the issue, with an accent on highly, then just pick for c such a product of primes of the shape 4k + 1. For instance c = 5³.13.17 or c = 5.13.17.29. Then compute all representations c² = (a + ib)(a - ib) = a² + b² best by using the UFD property of the Gaussian integers.

For instance, in a python3 dialog with the interpreter...

In [16]: L25 = [complex(2, 1)**4, complex(2, 1)**2 * 5, 25, complex(2, -1)**2 * 5, complex(2, -1)**4]
In [17]: L13 = [complex(3, 2)**2, 13, complex(3, -2)**2]
In [18]: L17 = [complex(4, 1)**2, 17, complex(4, -1)**2]

In [19]: solutions = []
In [20]: for z1 in L25:
    ...:     for z2 in L13:
    ...:         for z3 in L17:
    ...:             z = z1 * z2 * z3
    ...:             a, b = int(abs(z.real)), int(abs(z.imag))
    ...:             if a > b:
    ...:                 a, b = b, a
    ...:             solutions.append((a, b))
    ...: 
In [21]: solutions = list(set(solutions))
In [22]: solutions.sort()

In [23]: len(solutions)
Out[23]: 23

In [24]: solutions
Out[24]: 
[(0, 5525),
 (235, 5520),
 (525, 5500),
 (612, 5491),
 (845, 5460),
 (1036, 5427),
 (1131, 5408),
 (1320, 5365),
 (1360, 5355),
 (1547, 5304),
 (2044, 5133),
 (2125, 5100),
 (2163, 5084),
 (2340, 5005),
 (2600, 4875),
 (2805, 4760),
 (2880, 4715),
 (3124, 4557),
 (3315, 4420),
 (3468, 4301),
 (3500, 4275),
 (3720, 4085),
 (3861, 3952)]

We have 23 = 22 + 1 solutions. The last one is the trivial one. All other solutions (a, b) listed have a < b, so there are totally 1 + 22*2 = 45 = 5 * 3 * 3, as expected from the triple for loop above. A similar code can be written for c = 5 * 13 * 17 * 29 = 32045 leading to (3^4 - 1)/2 = 40 non-trivial solutions.

In [26]: L5  = [complex(2, 1)**2,  5, complex(2, -1)**2]
In [27]: L13 = [complex(3, 2)**2, 13, complex(3, -2)**2]
In [28]: L17 = [complex(4, 1)**2, 17, complex(4, -1)**2]
In [29]: L29 = [complex(5, 2)**2, 29, complex(5, -2)**2]

In [30]: z_list = [z1*z2*z3*z4
    ...:           for z1 in L5  for z2 in L13
    ...:           for z3 in L17 for z4 in L29]

In [31]: ab_list = [(int(abs(z.real)), int(abs(z.imag))) for z in z_list]

In [32]: len(ab_list)
Out[32]: 81

In [33]: ab_list = list(set([(min(a, b), max(a, b)) for (a, b) in ab_list]))

In [34]: ab_list.sort()

In [35]: len(ab_list)
Out[35]: 41

In [36]: ab_list[:10]
Out[36]: 
[(0, 32045),
 (716, 32037),
 (1363, 32016),
 (2277, 31964),
 (2400, 31955),
 (3045, 31900),
 (3757, 31824),
 (3955, 31800),
 (4901, 31668),
 (5304, 31603)]

(Feel free to also use powers of two in c.)

score -1 · Answer 4 · edited Mar 17 '22 at 17:49

-1

#There is a general formula for pythagoran triples
take 2 numbers, m & n where m > n
    a = (m^2) - (n^2)
    b = 2mn
    c = (m^2) + (n^2)

That will always give you a pythagoran triple. Its more efficient but it might not be what you're looking for.

edited Mar 17 '22 at 17:49

Wai Ha Lee

8,598
83
57
92

answered Mar 17 '22 at 11:29

mister degemensde

1

Is there a better way to find ‘highly composite’ pythagorean triples in Python?

4 Answers4

Simplified