Pairing rows of a file almost randomly

Question

I have a file say file1.rule, which has even number of rows, the last column of that file represent fitness and the second last column represent the class. I want to pair the rows class wise(first picks the row with the highest fitness then a random one from the remaining), with just one condition that no two identical rows can form a pair. In my file, no two exactly identical row for a class can occur more than n/2 times where n is the number of rows for that particular class.

Below is my file:

*,*,*,1,0,1.0
*,*,1,*,0,0.22
*,*,2,2,1,0.71
*,*,2,2,1,0.71
*,2,2,*,1,0.64
*,2,2,*,1,0.64
1,*,*,3,2,0.95
*,*,3,2,2,0.66
*,*,3,4,2,0.67
3,*,*,*,2,0.33
3,*,*,*,2,0.33
3,*,*,*,2,0.33

And the code for this :

rule_file_name = "file1.rule"
from collections import defaultdict
list1 = []

with open(rule_file_name) as rule_fp:
    for line in rule_fp.readlines():
        list1.append(line.replace("\n","").split(","))

assert len(list1) & 1 == 0
classes = defaultdict(list)
for _list in list1:
    classes[_list[4]].append(_list)

    
from random import sample, seed
seed(1)
for key, _list in classes.items():
    assert len(_list) & 1 == 0
    _list.sort(key=lambda x: x[5])
    pairs = []
    #while(len(_list)>2):
    while _list:
        #print(len(_list))
        first = _list[-1]
        candidate = sample(_list, 1)[0]
        if first != candidate:
            #print(f'first{first}, candidate{candidate}')
            print(f'{first},{candidate}')
            pairs.append((first, candidate))
            _list.remove(first)
            _list.remove(candidate)
    classes[key] = pairs

The above code is working fine for class 0 and 1 and pairing is done but for class 2, the first 2 randomly chosen pairs are :

['1', '*', '*', '3', '2', '0.95'],['*', '*', '3', '2', '2', '0.66']
['*', '*', '3', '4', '2', '0.67'],['3', '*', '*', '*', '2', '0.33']

Now after these the remaining 2 rows of class 2 are: 3,*,*,*,2,0.33 and 3,*,*,*,2,0.33 which are identical so they can't form a pair and hence the while loop is running for infinite times.

According to my observation, this condition will only arrive when there are only last 2 rows left for any class, in this case, I simply want to discard those 2 rows. So I tried to replace the while condition writing: while(len(_list)>2): , but in this case the last 2 will always be ignored even if they are completely different from each other. What to do?

Can I use any timer inside the while loop like below?

if some_condition or time.time() > timeout:
        break

I tried to modify the code like this also:

while _list:
    first = _list[-1]
    _list.remove(first)
    candidate = sample(_list, 1)[0]
    if (len(_list)<=2) and first == candidate:
        break
    elif first != candidate:
        #print(f'first{first}, candidate{candidate}')
        print(f'{first},{candidate}')
        pairs.append((first, candidate))
        #_list.remove(first)
        _list.remove(candidate)
classes[key] = pairs

But in this,when my file looks like below:

*,2,2,*,1,0.64
*,*,2,2,1,0.71
*,*,2,2,1,0.71
*,2,2,*,1,0.64

It is selecting ['*', '*', '2', '2', '1', '0.71'],['*', '2', '2', '*', '1', '0.64'] then I am getting error in candidate = sample(_list, 1)[0] this line saying: ValueError: Sample larger than population or is negative. Please help me out.

"bitwise and" with 1 return 0 for even values. This is correct for your logic? — Eugene, Apr 10 '22 at 03:04
@Eugene Yes that thing is correct because I will always have even number of rows in total and each class will also have even number of rows. — Dev, Apr 10 '22 at 03:12
Does this answer your question? [Fast randomly-paired permutation](https://stackoverflow.com/questions/65361557/fast-randomly-paired-permutation) — Sneftel, Apr 10 '22 at 03:12
Error for second version becouse you want get value from empty list. In past iteration your list have one element, but checking length of list stay letter sample calling — Eugene, Apr 10 '22 at 03:13
Your code remove last element in list in ```_list.remove(first)``` and call sample for empty list. Becouse in list odd values and on each iteration remove pair of elements. — Eugene, Apr 10 '22 at 03:17
Can you please give me one solution? I am new to programming and python hence facing difficulty. — Dev, Apr 10 '22 at 03:19
Check condition with length before remove element from list. — Eugene, Apr 10 '22 at 03:20
`if (len(_list)<=2) and first == candidate: break _list.remove(first) elif first != candidate:` are you saying something like this? But I think this will throw an error. — Dev, Apr 10 '22 at 03:26

score 1 · Accepted Answer · answered Apr 10 '22 at 03:48

This might be one direction to go in:

# ...
# construct the defaultdict classes as before.

from collections import Counter
from random import seed, choice


def attempt_partition_into_pairs(rows):
    rows.sort(key=lambda x: x[5])
    # convert to tuples so that the rows are hashable.
    rows = [tuple(row) for row in rows]
    # convert to a Counter so that we organize duplicates
    counter = Counter(rows)

    pairs = []

    # ensure there are always at least two distinct rows
    while len(counter) >= 2:
        # choose and remove first
        first = rows.pop()
        counter[first] -= 1
        if counter[first] == 0:
            del counter[first]

        # choose candidate
        while True:
            candidate = choice(rows)
            if candidate != first:
                break
        counter[candidate] -= 1
        if counter[candidate] == 0:
            del counter[candidate]
        rows.remove(candidate)

        # output
        assert candidate != first
        print(f'{first},{candidate}')
        pairs.append((first, candidate))

    return pairs


seed(1)
for key, _list in classes.items():
    assert len(_list) & 1 == 0
    classes[key] = attempt_partition_into_pairs(_list)

This prints:

('*', '*', '*', '1', '0', '1.0'),('*', '*', '1', '*', '0', '0.22')
('*', '*', '2', '2', '1', '0.71'),('*', '2', '2', '*', '1', '0.64')
('*', '*', '2', '2', '1', '0.71'),('*', '2', '2', '*', '1', '0.64')
('1', '*', '*', '3', '2', '0.95'),('3', '*', '*', '*', '2', '0.33')
('*', '*', '3', '4', '2', '0.67'),('3', '*', '*', '*', '2', '0.33')
('*', '*', '3', '2', '2', '0.66'),('3', '*', '*', '*', '2', '0.33')

I've taken the liberty to factor out that function. Using a collections.Counter helps keeps track of when there are duplicates. Also you can you random.choice() rather than random.sample(..., 1).

This technically retains some O(n^2) behavior at the rows.remove(candidate) call, so it may be a bit slow if your input file is huge, but certainly no slower than your original attempt. You can probably fix that with some more cleverness, if need be.

This looks fine and got the concept as well. Thanks. – Dev Apr 10 '22 at 09:41 — Dev, Apr 10 '22 at 09:41

Pairing rows of a file almost randomly

1 Answers1