I am a beginner in coding and would highly appreciate your help. I have a 4 GB file and I am trying to select the most repeated value in Column B (counting only rows where it differs from Column A) and the corresponding Column C.
For example,
Column A  Column B  Column C  id
Sam       Sam       12        001
Alex      David     10        001
David     David     15        002
Sarah     Alice     23        001
Alice     Sam       18        002
Sam       Alice     20        002
Anna      Sam       26        003
I would like to exclude rows where the names in Column A and Column B are the same, and then find the most repeated names in Column B. I would also like to find the ids corresponding to the most repeated names in Column B.
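To make the intended result concrete, here is a minimal sketch of that logic on the seven sample rows only, assuming they are already loaded into a pandas DataFrame with the column names shown in the example. The variable names (`different`, `counts`, `top_names`, `result`) are made up for illustration. Note that in this sample Alice and Sam are tied, with two occurrences each once the rows with identical names are dropped.

```python
import pandas as pd

# Sample rows copied from the example table (not the real 4 GB file).
df = pd.DataFrame({
    "Column A": ["Sam", "Alex", "David", "Sarah", "Alice", "Sam", "Anna"],
    "Column B": ["Sam", "David", "David", "Alice", "Sam", "Alice", "Sam"],
    "Column C": [12, 10, 15, 23, 18, 20, 26],
    "id": ["001", "001", "002", "001", "002", "002", "003"],
})

# Keep only rows where the two names differ (a row-by-row comparison).
different = df[df["Column A"] != df["Column B"]]

# Count how often each name appears in Column B among the remaining rows.
counts = different["Column B"].value_counts()
top_count = int(counts.max())

# Several names can share the highest count, so keep all of them.
top_names = sorted(counts[counts == top_count].index)

# Rows (with Column C and id) that belong to the most repeated name(s).
result = different[different["Column B"].isin(top_names)]
print(top_names, top_count)
print(result)
```
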
When I tried the following command, I got a memory error.
(df.loc[~(df['Column B'].isin(df['Column A']) & df['Column B'].isin(df['Column C'])), 'Column B'])
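One thing worth noting about the command above: `isin` tests whether a value appears anywhere in the other column, not whether the two values in the same row are equal, so it may not express the row-wise exclusion described earlier. A possible way around the memory error is to stream the file in chunks and keep only running counts. The sketch below is not a definitive solution. It assumes the file is a delimited text file readable by `pandas.read_csv`, with a header row using the column names from the example. The function name `most_repeated_in_b` and the default chunk size are hypothetical.

```python
from collections import Counter, defaultdict

import pandas as pd


def most_repeated_in_b(source, chunksize=1_000_000, **read_kwargs):
    """Stream `source` in chunks; return (top names, top count, ids per top name).

    Assumes columns named "Column A", "Column B" and "id" (as in the example).
    Extra keyword arguments (e.g. sep="\t") are passed on to pandas.read_csv.
    """
    counts = Counter()
    ids_by_name = defaultdict(set)  # grows with the number of distinct names/ids

    reader = pd.read_csv(
        source,
        usecols=["Column A", "Column B", "id"],  # skip columns not needed here
        dtype=str,                               # keeps ids like "001" intact
        chunksize=chunksize,
        **read_kwargs,
    )
    for chunk in reader:
        # Row-wise exclusion: drop rows where both names are identical.
        kept = chunk[chunk["Column A"] != chunk["Column B"]]
        # Add this chunk's per-name counts to the running totals.
        counts.update(kept["Column B"].value_counts().to_dict())
        # Remember which ids were seen for each name.
        for name, ids in kept.groupby("Column B")["id"]:
            ids_by_name[name].update(ids.dropna())

    if not counts:
        return [], 0, {}
    top = int(max(counts.values()))
    names = sorted(n for n, c in counts.items() if c == top)
    return names, top, {n: sorted(ids_by_name[n]) for n in names}
```

It could be called as `most_repeated_in_b("data.csv")`, where `data.csv` is a placeholder for the real file path. Only one chunk is held in memory at a time, plus the running counts and id sets; if the number of distinct names is very large, a second pass that collects ids only for the top names may be a better trade-off.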