I am trying to find all pairs of rows in a dataframe where one row's id_proof value matches the second part of another row's adr_proof value, that adr_proof starts with a fixed word (say PARENT), and the first row's id_value equals the other row's adr_value. Both rows belong to the same dataframe.
For example, given this dataframe:
import pandas as pd
main = {'account_number' : [1,2,3,4,5,6,7,8,9,10,11,12],
'id_proof' : ['A','B','B','A','C','C','X','Y','X','Y','Y','X'],
'id_value' : [101,201,301,401,501,601,111,222,333,444,555,666],
'adr_proof' : ['Z','E','E','G','G','I','PARENT A','PARENT B','PARENT B','PARENT C','PARENT C','PARENT A'],
'adr_value' : [11,22,33,44,55,66,101,201,301,501,601,401]}
main = pd.DataFrame(main)
I am trying to get this output:
node1 node2 relation
1 7 parent-child
2 8 parent-child
3 9 parent-child
4 12 parent-child
5 10 parent-child
6 11 parent-child
Below is my code. I know it is incomplete: I am stuck at the split() call. I am new to Python and pandas and am not sure how to invoke pandas' split() function rather than Python's built-in str.split(). I have gone through this question
import pandas as pd
main = {'account_number' : [1,2,3,4,5,6,7,8,9,10,11,12],
'id_proof' : ['A','B','B','A','C','C','X','Y','X','Y','Y','X'],
'id_value' : [101,201,301,401,501,601,111,222,333,444,555,666],
'adr_proof' : ['Z','E','E','G','G','I','PARENT A','PARENT B','PARENT B','PARENT C','PARENT C','PARENT A'],
'adr_value' : [11,22,33,44,55,66,101,201,301,501,601,401]}
main = pd.DataFrame(main)
df_group_count = pd.DataFrame({'count' : main.groupby(['adr_proof']).size()}).reset_index()
adr_type = df_group_count['adr_proof']
adr_type_parent = adr_type.loc[adr_type.str.startswith('PARENT',na=False)]
df_j_ = pd.DataFrame()
for j in adr_type_parent:
    dfn_j = main.loc[(main['adr_proof'] == j)]
    adr_type_parent_type = j.split(' ',expand=True,n=1)
    res = main.loc[(main['id_proof'] == adr_type_parent_type[1]) & (main['id_value'] == dfn_j['adr_value'])]
res
Please suggest a way to achieve this. The output should be another dataframe. Please excuse any bad code or rule violations. A completely different approach is also welcome. Thank you.
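For reference, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes the pandas split is reached through the Series.str accessor (Series.str.split, which accepts n and expand), whereas j.split(...) inside the loop is called on a plain Python string, which has no expand argument. It also assumes the pairing can be done with a single self-merge instead of a loop. The names children, parents, parent_proof and result are only illustrative.

```python
import pandas as pd

main = pd.DataFrame({
    'account_number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'id_proof': ['A', 'B', 'B', 'A', 'C', 'C', 'X', 'Y', 'X', 'Y', 'Y', 'X'],
    'id_value': [101, 201, 301, 401, 501, 601, 111, 222, 333, 444, 555, 666],
    'adr_proof': ['Z', 'E', 'E', 'G', 'G', 'I', 'PARENT A', 'PARENT B',
                  'PARENT B', 'PARENT C', 'PARENT C', 'PARENT A'],
    'adr_value': [11, 22, 33, 44, 55, 66, 101, 201, 301, 501, 601, 401],
})

# Child rows: adr_proof starts with the fixed word PARENT.
children = main[main['adr_proof'].str.startswith('PARENT', na=False)].copy()

# Series.str.split is the vectorised pandas split; with expand=True it
# returns a DataFrame, and column 1 holds the part after 'PARENT '.
children['parent_proof'] = children['adr_proof'].str.split(pat=' ', n=1, expand=True)[1]

# Rename both sides so the join keys share names and the account numbers
# become node1 (parent) and node2 (child).
parents = main[['account_number', 'id_proof', 'id_value']].rename(
    columns={'account_number': 'node1',
             'id_proof': 'parent_proof',
             'id_value': 'adr_value'})
children = children[['account_number', 'parent_proof', 'adr_value']].rename(
    columns={'account_number': 'node2'})

# Inner join: keep only parent rows whose (id_proof, id_value) matches a
# child's (second part of adr_proof, adr_value).
result = parents.merge(children, on=['parent_proof', 'adr_value'])[['node1', 'node2']]
result['relation'] = 'parent-child'
result = result.sort_values('node1').reset_index(drop=True)

print(result)
```

On the sample data this should give the six node1/node2 pairs listed above. Because the merge is an inner join, accounts without a matching counterpart are simply dropped.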