I am trying to find the longest possible list of five-position sequences, each position holding a value from 1 to 5, such that no two members of the list share the same value at more than one position (index). For example, 11111 and 12222 may both appear (only the 1 at index 0 is shared), but 11111 and 11222 may not (same value at indices 0 and 1).
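The condition can be written as a small predicate. This is only a minimal sketch: the name `compatible` and the choice of strings to represent the sequences are assumptions made for illustration.

```python
def compatible(a, b):
    """Return True if a and b hold the same value at no more than one index."""
    shared = sum(1 for x, y in zip(a, b) if x == y)
    return shared <= 1


print(compatible("11111", "12222"))  # True: only index 0 matches
print(compatible("11111", "11222"))  # False: indices 0 and 1 match
```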

I have tried a brute-force approach: start with the complete list of all 5**5 = 3125 candidates and walk through it element by element, rejecting the elements that do not meet the criterion, in several steps:

  • step one: test elements 2 to 3125 against element 1, giving a new, shorter list L'
  • step two: test elements 3 to N' of L' against element 2 of L', giving a still shorter list L'',

and so on.

I get a 17-member solution, which is perfectly valid. The problem is that:

  • I know there are at least two valid 25-member solutions, found by good luck,
  • The result of this brute-force method depends strongly on the initial order of the 3125-member list. By shuffling the initial list L0 I have been able to find solutions with 12 to 21 members, but I have never hit a 25-member solution.
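The filtering steps described above amount to a greedy pass over the list: an element is kept only if it is compatible with every element kept before it. A minimal in-memory sketch of that pass, repeated over shuffled starting orders, is given here. The names `greedy_pass` and `best_of_shuffles` are hypothetical, and the fixed seed is only there to make runs repeatable. A greedy pass yields a set that cannot be extended for that particular order, which need not be the largest set overall, and that is consistent with the order dependence reported.

```python
import itertools
import random


def compatible(a, b):
    """True if a and b agree at no more than one position."""
    return sum(x == y for x, y in zip(a, b)) <= 1


def greedy_pass(candidates):
    """Keep each candidate that is compatible with everything kept so far."""
    chosen = []
    for cand in candidates:
        if all(compatible(cand, kept) for kept in chosen):
            chosen.append(cand)
    return chosen


def best_of_shuffles(tries=20, seed=0):
    """Run the greedy pass on the original order and on `tries` shuffles."""
    rng = random.Random(seed)
    words = list(itertools.product(range(1, 6), repeat=5))  # 3125 tuples
    best = greedy_pass(words)  # lexicographic order first
    for _ in range(tries):
        rng.shuffle(words)
        result = greedy_pass(words)
        if len(result) > len(best):
            best = result
    return best


best = best_of_shuffles()
print(len(best))
```

Raising `tries` explores more orders, but nothing in this sketch guarantees that a 25-member set is ever reached.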

Could anyone please shed light on the problem? Thank you.

This is my approach so far:

import csv
import random

maxv = 0  # size of the best solution found so far
soln = 0  # counter used to name the solution files

for p in range(0, 1):  # intended to run multiple times

    z = -1

    while True:

        z = z + 1

        file1 = 'Step%02d.csv' % z
        file2 = 'Step%02d.csv' % (z + 1)

        with open(file1, 'r', newline='') as csv_file:
            data = list(csv.reader(csv_file))

        # if file1 == 'Step00.csv':  # related to p loop
        #     random.shuffle(data)

        # Elements 0..z have already been accepted; keep them unchanged.
        nextdata = data[:z + 1]

        # Keep a later element only if it matches data[z] at fewer than 2 positions.
        for j in range(z + 1, len(data)):
            matches = 0
            for k in range(0, 5):
                if data[z][k] == data[j][k]:
                    matches = matches + 1
            if matches < 2:
                nextdata.append(data[j])

        with open(file2, 'w', newline='') as ofile:
            csv.writer(ofile).writerows(nextdata)

        # Nothing is left beyond the accepted elements: this run is finished.
        if len(nextdata) < z + 2:
            if (z + 1) >= maxv:
                maxv = z + 1
                print(maxv)
                with open('Solution%02d.csv' % soln, 'w', newline='') as ofile:
                    csv.writer(ofile).writerows(nextdata)
            soln = soln + 1
            break
rbenit68
  • 11111, 11112, 11113, ...., 55554, 55555 is a list of 3125 elements – rbenit68 May 30 '20 at 09:19
  • @N.Wouda It's 5**5. – Błotosmętek May 30 '20 at 09:22
  • @rbenit68 your idea seems reasonable, maybe it's just an implementation error? Please post your code and let's see if it does what you think it should… – Błotosmętek May 30 '20 at 09:26
  • I am trying to get the members that match the condition, e.g., 11111, 12345, 13524, 14253, 15432, 21543, 22222, 23451, 24135, 25314, 31425, 32154, 33333, 34512, 35241, 41352, 42531, 43215, 44444, 45123, 51234, 52413, 53142, 54321, 55555 – rbenit68 May 30 '20 at 09:43
  • Could you please explain this: `I know there are, at least, two 25-member valid solution found by a matter of good luck,` – Balaji Ambresh May 30 '20 at 09:56
  • Why on earth are you using a file?! This must slow down your code by whole orders of magnitude… – Błotosmętek May 30 '20 at 09:59
  • Not much, actually "Program exited with code #0 after 0.15 seconds." – rbenit68 May 30 '20 at 10:02
  • "I know there are, at least, two 25-member valid solution found by a matter of good luck". The explanation is that the original starting list is: 11111, 11112, 11113, ...., 55554, 55555. But if I modify the list so that it begins 11111, 22222, 33333, 44444, 55555, 12345, 23451, 34512, 45123, 51234, 11112, ... then I can get the 25-member solutions. I said "The solution by this brute-force method depends strongly on the initial order of the 3125 members list (...) I have been able to find from 12- to 21-member solutions" – rbenit68 May 30 '20 at 10:09
  • @rbenit68 0.15s only because you're not testing all cases (and therefore not finding the best one). – Błotosmętek May 30 '20 at 10:14
  • So you are telling me that I should test factorial(3125) orderings? Is that reasonably possible with a home computer? – rbenit68 May 30 '20 at 10:17
  • No, not possible in reasonable time, on any single computer. Fortunately there's no need to. – Błotosmętek May 30 '20 at 11:14
  • What do you mean by '5 ordered positions'? If the digits are in nondecreasing order (e.g. 54321 is invalid) then there are only 126 instead of 3125. – Dave May 30 '20 at 15:11
  • 54321 is valid and compatible with 12345 (only 1 figure (3) matches, at the same position) – rbenit68 May 30 '20 at 15:39
  • You may look at the collection of your numbers as a graph, with the numbers as vertices, and the edges connecting numbers only if they share no more than one identical position. Then, what you are looking for is a maximum clique. [This article](https://en.wikipedia.org/wiki/Clique_problem) is a good starting point. – user58697 May 30 '20 at 16:03
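The 25-member list quoted in the comments above has a regular structure: each member can be read as a line a + b·i modulo 5, evaluated at positions i = 0..4 and shifted to the values 1..5. Two distinct lines modulo 5 agree at no more than one position, which is exactly the required condition. The sketch below rebuilds the list that way. Reading the list as lines modulo 5 is an observation about that particular list, not something stated by its author, and `affine_solution` and `is_valid` are hypothetical names.

```python
from itertools import combinations


def affine_solution():
    """Build the 25 words (a + b*i) mod 5, i = 0..4, shifted to values 1..5."""
    words = []
    for a in range(5):
        for b in range(5):
            words.append("".join(str((a + b * i) % 5 + 1) for i in range(5)))
    return sorted(words)


def is_valid(words):
    """True if every pair of words agrees at no more than one position."""
    return all(
        sum(x == y for x, y in zip(u, v)) <= 1
        for u, v in combinations(words, 2)
    )


solution = affine_solution()
print(len(solution), is_valid(solution))  # 25 True
print(solution[:5])  # ['11111', '12345', '13524', '14253', '15432']
```

In the graph view suggested in the last comment, these 25 words form a clique in the compatibility graph.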

1 Answer

Here is a Picat model for the problem (as I understand it): http://hakank.org/picat/longest_subset_of_five_positions.pi It uses constraint modelling and a SAT solver.

Edit: Here is a MiniZinc model: http://hakank.org/minizinc/longest_subset_of_five_positions.mzn

The model (predicate go/0) checks lengths from 2 to 100. Every length from 2 to 25 has at least one solution (probably a lot more), and no longer length has any, so 25 is the longest sub sequence. Here is one solution of length 25:

{1,1,1,3,4}
{1,2,5,1,5}
{1,3,4,4,1}
{1,4,2,2,2}
{1,5,3,5,3}
{2,1,3,2,1}
{2,2,4,5,4}
{2,3,2,1,3}
{2,4,1,4,5}
{2,5,5,3,2}
{3,1,2,5,5}
{3,2,3,4,2}
{3,3,5,2,4}
{3,4,4,3,3}
{3,5,1,1,1}
{4,1,4,1,2}
{4,2,1,2,3}
{4,3,3,3,5}
{4,4,5,5,1}
{4,5,2,4,4}
{5,1,5,4,3}
{5,2,2,3,1}
{5,3,1,5,2}
{5,4,3,1,4}
{5,5,4,2,5}
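A listing like the one above can be checked mechanically, and the same check hints at why no longer list can exist: two members that agreed on both of their first two positions would already share two positions, so the (first, second) pairs must all be distinct, and only 5 × 5 = 25 such pairs are available. A minimal sketch, with the rows copied from the listing above and `is_valid` as a hypothetical helper name:

```python
from itertools import combinations

# Rows of the solution above, written as 5-character strings.
ROWS = """
11134 12515 13441 14222 15353
21321 22454 23213 24145 25532
31255 32342 33524 34433 35111
41412 42123 43335 44551 45244
51543 52231 53152 54314 55425
""".split()


def is_valid(words):
    """True if every pair of words agrees at no more than one position."""
    return all(
        sum(x == y for x, y in zip(u, v)) <= 1
        for u, v in combinations(words, 2)
    )


# Distinct (first, second) pairs: there can never be more than 25 of them.
prefixes = {w[:2] for w in ROWS}
print(len(ROWS), is_valid(ROWS), len(prefixes))  # 25 True 25
```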

There are many different solutions of length 25 (the predicate go2/0 checks that).

Here is the complete model (edited from the file above):

import sat.
main => go.

%
% Test all lengths from 2..100.
% 25 is the longest.
%
go ?=>
  nolog,
  foreach(M in 2..100)
    println(check=M),
    if once(check(M,_X)) then
      println(M=ok)
    else
      println(M=not_ok)
    end,
    nl
  end,
  nl.

go => true.


%
% Check if there is a solution with M numbers
% 
check(M, X) =>
  N = 5,
  X = new_array(M,N),
  X :: 1..5,

  foreach(I in 1..M, J in I+1..M)
    % at most 1 same number in the same position
    sum([X[I,K] #= X[J,K] : K in 1..N]) #<= 1, 
    % symmetry breaking: sort the sub sequence
    lex_lt(X[I],X[J])
  end,

  solve([ff,split],X),

  foreach(Row in X)
    println(Row)
  end,
  nl.
hakank
  • _probably at lot more_ - indeed, at least 5! = 120 for any given length (they can be derived from a single one by way of simple mapping). – Błotosmętek May 30 '20 at 20:24
  • @Błotosmętek There are 236544 different solutions of length 2 sub sequences and 167961600 solutions of length 3. Note that I require that the sub sequences are in lexicographic order. I'm still waiting for the number of solutions of length 4 sub sequences. – hakank May 30 '20 at 21:21
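The mapping mentioned in the comments above can be illustrated by relabelling the values 1..5 with a permutation, applied identically at every position: two members agree at a position after relabelling exactly when they agreed there before, so validity is preserved. The sketch below applies all 5! = 120 relabellings to one 25-member solution and counts how many distinct sets result, rather than assuming that all 120 differ. The starting set (lines a + b·i modulo 5) and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations, permutations


def is_valid(words):
    """True if every pair of words agrees at no more than one position."""
    return all(
        sum(x == y for x, y in zip(u, v)) <= 1
        for u, v in combinations(words, 2)
    )


def relabel(words, perm):
    """Replace each value v (1..5) by perm[v - 1] at every position."""
    return [tuple(perm[v - 1] for v in w) for w in words]


# Assumed starting solution: the 25 lines (a + b*i) mod 5, shifted to 1..5.
base = [
    tuple((a + b * i) % 5 + 1 for i in range(5))
    for a in range(5)
    for b in range(5)
]

images = set()
all_valid = True
for perm in permutations(range(1, 6)):
    mapped = relabel(base, perm)
    all_valid = all_valid and is_valid(mapped)
    images.add(frozenset(mapped))  # compare solutions as unordered sets

print(all_valid)    # True: relabelling never breaks the condition
print(len(images))  # how many distinct solutions the relabellings reach
```

Different relabellings can map a solution onto itself, so the number of distinct sets reached depends on the starting solution.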