I am working on a problem where I have to group related items and assign each group a unique id. I have written the code in Python, but it is not returning the expected output. I need help refining my logic. The code is below:
data = {}
child_list = []

for index, row in df.iterrows():
    parent = row['source']
    child = row['target']
    #print 'Parent: ', parent
    #print 'Child:', child
    child_list.append(child)
    #print child_list
    if parent not in data.keys():
        data[parent] = []
    if parent != child:
        data[parent].append(child)
    #print data

op = {}
gid = 0

def recursive(op,x,gid):
    if x in data.keys() and data[x] != []:
        for x_child in data[x]:
            if x_child in data.keys():
                op[x_child] = gid
                recursive(op,x_child,gid)
            else:
                op[x] = gid
    else:
        op[x] = gid

for key in data.keys():
    #print "Key: ", key
    if key not in child_list:
        gid = gid + 1
        op[key] = gid
        for x in data[key]:
            op[x] = gid
            recursive(op,x,gid)

related = pd.DataFrame({'items':op.keys(),
                        'uniq_group_id': op.values()})
related.sort_values('items')
Example below:
Input:
source target
a b
b c
c c
c d
d d
e f
a d
h a
i f
Desired Output:
item uniq_group_id
a 1
b 1
c 1
d 1
h 1
e 2
f 2
i 2
My code gave me the output below, which is wrong:
item uniq_group_id
a 3
b 3
c 3
d 3
e 1
f 2
h 3
i 2
Another Example
Input:
df = pd.DataFrame({'source': ['a','b','c','c','d','e','a','h','i','a'],
                   'target': ['b','c','c','d','d','f','d','a','f','a']})
Desired Output:
item uniq_group_id
a 1
b 1
c 1
d 1
e 2
f 2
My code's output:
item uniq_group_id
e 1
f 1
The order of the rows and the actual group id values do not matter. The important thing is that related items are assigned the same unique identifier. In other words, the whole problem is to find groups of related items and assign each group a unique id.
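
For reference, the requirement above seems to amount to finding connected components when every source/target row is treated as an undirected link. Below is a minimal sketch of that idea using a union-find (disjoint-set) structure, run on the first example input. The helper name `assign_group_ids` and its parameters are hypothetical, introduced only for illustration, and the sketch assumes that direction does not matter and that self-links such as `c c` only register the item.

```python
import pandas as pd


def assign_group_ids(df, source_col="source", target_col="target"):
    """Hypothetical helper: label each item with the id of its connected group."""
    parent = {}  # union-find forest: item -> parent item

    def find(x):
        # Register unseen items as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Assumption: each row is an undirected link, so merge the two groups.
    for s, t in zip(df[source_col], df[target_col]):
        root_s, root_t = find(s), find(t)
        if root_s != root_t:
            parent[root_t] = root_s

    # Number the groups 1, 2, ... in order of first appearance.
    group_of_root = {}
    rows = []
    for item in parent:
        root = find(item)
        gid = group_of_root.setdefault(root, len(group_of_root) + 1)
        rows.append((item, gid))
    return pd.DataFrame(rows, columns=["item", "uniq_group_id"])


# First example input from above.
df = pd.DataFrame({
    "source": ["a", "b", "c", "c", "d", "e", "a", "h", "i"],
    "target": ["b", "c", "c", "d", "d", "f", "d", "a", "f"],
})
related = assign_group_ids(df).sort_values("item").reset_index(drop=True)
print(related)
```

On this input the sketch puts a, b, c, d and h in one group and e, f and i in another, which matches the desired output. Because links are merged in both directions, the result does not depend on which items happen to appear only as a source.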