I have a file, say `file1.rule`, which has an even number of rows. The last column of the file represents fitness and the second-to-last column represents the class. I want to pair the rows class-wise (first pick the row with the highest fitness, then a random one from the remaining rows), with just one condition: no two identical rows can form a pair. In my file, no row occurs more than n/2 times within a class, where n is the number of rows for that particular class.
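The n/2 condition can be checked per class before pairing. It guarantees that a complete pairing exists: with an even n and no row occurring more than n/2 times, pairing the i-th and (i + n/2)-th rows of the sorted class always gives two distinct rows. It does not stop a random greedy choice from getting stuck, though. A minimal sketch, assuming each row has already been split on commas into a list of strings (as in `list1` in the code below); `classes_breaking_half_rule` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def classes_breaking_half_rule(rows):
    """Return the classes in which some row occurs more than n/2 times.

    Hypothetical helper: `rows` is a list of rows, each a list of strings,
    and the class label is assumed to be the second-to-last column.
    """
    by_class = defaultdict(list)
    for row in rows:
        # Tuples are hashable, so Counter can count identical rows.
        by_class[row[-2]].append(tuple(row))

    offending = []
    for cls, class_rows in by_class.items():
        # How often the most frequent identical row occurs in this class.
        top_count = Counter(class_rows).most_common(1)[0][1]
        if top_count > len(class_rows) / 2:
            offending.append(cls)
    return offending


SAMPLE = """\
*,*,*,1,0,1.0
*,*,1,*,0,0.22
*,*,2,2,1,0.71
*,*,2,2,1,0.71
*,2,2,*,1,0.64
*,2,2,*,1,0.64
1,*,*,3,2,0.95
*,*,3,2,2,0.66
*,*,3,4,2,0.67
3,*,*,*,2,0.33
3,*,*,*,2,0.33
3,*,*,*,2,0.33"""

rows = [line.split(",") for line in SAMPLE.splitlines()]
print(classes_breaking_half_rule(rows))  # []
```

For the sample file every class passes: class 2 has `3,*,*,*,2,0.33` three times out of six rows, which is exactly n/2.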
Below is my file:
```
*,*,*,1,0,1.0
*,*,1,*,0,0.22
*,*,2,2,1,0.71
*,*,2,2,1,0.71
*,2,2,*,1,0.64
*,2,2,*,1,0.64
1,*,*,3,2,0.95
*,*,3,2,2,0.66
*,*,3,4,2,0.67
3,*,*,*,2,0.33
3,*,*,*,2,0.33
3,*,*,*,2,0.33
```
And here is my code:
```python
rule_file_name = "file1.rule"
from collections import defaultdict
list1 = []
with open(rule_file_name) as rule_fp:
    for line in rule_fp.readlines():
        list1.append(line.replace("\n","").split(","))
assert len(list1) & 1 == 0
classes = defaultdict(list)
for _list in list1:
    classes[_list[4]].append(_list)
from random import sample, seed
seed(1)
for key, _list in classes.items():
    assert len(_list) & 1 == 0
    _list.sort(key=lambda x: x[5])
    pairs = []
    #while(len(_list)>2):
    while _list:
        #print(len(_list))
        first = _list[-1]
        candidate = sample(_list, 1)[0]
        if first != candidate:
            #print(f'first{first}, candidate{candidate}')
            print(f'{first},{candidate}')
            pairs.append((first, candidate))
            _list.remove(first)
            _list.remove(candidate)
    classes[key] = pairs
```
The above code works fine for classes 0 and 1, and their rows get paired, but for class 2 the first two randomly chosen pairs are:

```
['1', '*', '*', '3', '2', '0.95'],['*', '*', '3', '2', '2', '0.66']
['*', '*', '3', '4', '2', '0.67'],['3', '*', '*', '*', '2', '0.33']
```

After these, the remaining two rows of class 2 are `3,*,*,*,2,0.33` and `3,*,*,*,2,0.33`. They are identical, so they can't form a pair, and the `while` loop runs forever.
From what I observe, this situation only arises when just the last two rows of a class are left, and in that case I simply want to discard those two rows. So I tried changing the loop condition to `while(len(_list)>2):`, but then the last two rows are always ignored, even when they are completely different from each other. What should I do?
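One way to keep a valid final pair while still avoiding the infinite loop is to draw the partner only from rows that differ from `first`, and to stop as soon as no such row is left. A minimal sketch for a single class, assuming rows are lists of strings with fitness in the last column; `pair_class_rows` and its `rng` parameter are hypothetical names, `random.choice` over the filtered rows replaces `sample(_list, 1)[0]`, and fitness is sorted as a float rather than as a string.

```python
import random


def pair_class_rows(rows, rng=random):
    """Pair the rows of one class: highest fitness first, then a random
    partner that is not identical to it. Hypothetical helper; rows that
    cannot be paired are discarded."""
    # Ascending by fitness (last column, as float), so pop() gives the highest.
    remaining = sorted(rows, key=lambda r: float(r[-1]))
    pairs = []
    while remaining:
        first = remaining.pop()
        # Only rows that differ from `first` are valid partners.
        partners = [r for r in remaining if r != first]
        if not partners:
            # Everything left is identical to `first`: discard it and stop.
            break
        candidate = rng.choice(partners)
        remaining.remove(candidate)  # removes one copy; copies are interchangeable
        pairs.append((first, candidate))
    return pairs


class2 = [
    ["1", "*", "*", "3", "2", "0.95"],
    ["*", "*", "3", "2", "2", "0.66"],
    ["*", "*", "3", "4", "2", "0.67"],
    ["3", "*", "*", "*", "2", "0.33"],
    ["3", "*", "*", "*", "2", "0.33"],
    ["3", "*", "*", "*", "2", "0.33"],
]

for first, second in pair_class_rows(class2, random.Random(1)):
    print(first, second)
```

Every iteration either forms a pair or ends the loop, so it always terminates, and two different rows left at the end still get paired. When more than two identical rows remain (for example four copies of one row), all of them are discarded, not only the last two.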
Can I use a timer inside the `while` loop, like below?

```python
if some_condition or time.time() > timeout:
    break
```
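A wall-clock deadline does work as an escape hatch. A minimal sketch, assuming the first loop is wrapped in a function; `pair_with_timeout` and `timeout_s` are hypothetical names, and `time.monotonic()` is used instead of `time.time()` because it cannot jump backwards when the system clock changes.

```python
import random
import time


def pair_with_timeout(rows, timeout_s=0.1):
    """The question's first loop with a deadline added (hypothetical helper).

    Returns (pairs, leftover_rows)."""
    rows = sorted(rows, key=lambda r: float(r[-1]))
    pairs = []
    deadline = time.monotonic() + timeout_s
    while rows:
        if time.monotonic() > deadline:
            # Give up: whatever is still in `rows` stays unpaired.
            break
        first = rows[-1]
        candidate = random.sample(rows, 1)[0]
        if first != candidate:
            pairs.append((first, candidate))
            rows.remove(first)
            rows.remove(candidate)
    return pairs, rows
```

The catch is that the outcome depends on timing rather than on the data: the loop spins for the full timeout whenever only identical rows are left, and checking whether any remaining row differs from `first` answers the same question immediately.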
I also tried modifying the code like this:
```python
while _list:
    first = _list[-1]
    _list.remove(first)
    candidate = sample(_list, 1)[0]
    if (len(_list)<=2) and first == candidate:
        break
    elif first != candidate:
        #print(f'first{first}, candidate{candidate}')
        print(f'{first},{candidate}')
        pairs.append((first, candidate))
        #_list.remove(first)
        _list.remove(candidate)
classes[key] = pairs
```
But with this version, when my file looks like this:
```
*,2,2,*,1,0.64
*,*,2,2,1,0.71
*,*,2,2,1,0.71
*,2,2,*,1,0.64
```
It selects `['*', '*', '2', '2', '1', '0.71'],['*', '2', '2', '*', '1', '0.64']` and then fails on the line `candidate = sample(_list, 1)[0]` with:

```
ValueError: Sample larger than population or is negative
```

Please help me out.
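The traceback can be replayed deterministically. The sketch below re-implements the second loop for one class, with the random draw replaced by a hypothetical `choose` hook; the hypothetical `unlucky_choice` returns an identical row whenever one is left, which is one of the outcomes `sample()` can produce. Because `first` is removed before the identity check, an identical draw with more than two rows left drops `first` without pairing it. The row count then becomes odd, and the last iteration calls `sample()` on an empty list.

```python
import random


def second_attempt(rows, choose):
    """Replay of the second loop for a single class.

    `choose(rows, first)` stands in for `sample(_list, 1)[0]`; it is a
    hypothetical hook so the failing path can be reproduced on demand.
    """
    _list = sorted(rows, key=lambda x: x[5])  # same string sort as the question
    pairs = []
    while _list:
        first = _list[-1]
        _list.remove(first)  # `first` is gone before we know it can be paired
        candidate = choose(_list, first)
        if len(_list) <= 2 and first == candidate:
            break
        elif first != candidate:
            pairs.append((first, candidate))
            _list.remove(candidate)
        # Otherwise (more than 2 rows left, candidate == first) `first` is
        # silently dropped and the class now has an odd number of rows.
    return pairs


def unlucky_choice(rows, first):
    """Return a copy of `first` if one is left, else a random row via sample()."""
    same = [r for r in rows if r == first]
    return same[0] if same else random.sample(rows, 1)[0]


class1 = [
    ["*", "2", "2", "*", "1", "0.64"],
    ["*", "*", "2", "2", "1", "0.71"],
    ["*", "*", "2", "2", "1", "0.71"],
    ["*", "2", "2", "*", "1", "0.64"],
]

try:
    second_attempt(class1, unlucky_choice)
except ValueError as exc:
    print(exc)  # Sample larger than population or is negative
```

With this draw the loop pairs the 0.71 row with a 0.64 row and then raises, matching the behaviour described above.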