I want to sort a dataframe (3106 rows x 24 columns) based on a master list of strings in the first column ('MASTER') so that the strings in each row of the dataframe are aligned if there is a match and if there is no match then print 0 for that cell. The master list contains every possible string in any column but not every string will appear in each column.
This SO post is pretty close to what I'm trying to do:
Align DataFrames in pandas so that data matches in rows
I tried using this approach and realized it just looks to see if each string is a match based on the order in the master column but doesn't search the whole column for a match or do any reordering.
new_data = data[data.apply(lambda x: x['MASTER'] == x, axis=1)]
Here is what I'm trying to accomplish with my data.
BEFORE
MASTER CENTRAL_NERVOUS NERVOUS_SYSTEM NEUROGENESIS
12AEX 9630013A20RIK 12AEX 2610301B20RIK
2610042L04RIK AARS 2610042L04RIK 9630013A20RIK
2610301B20RIK AATK 2610301B20RIK A830010M20RIK
2700046G09RIK ABCA2 2700046G09RIK AU040320
31BEX ABCB1B 31BEX AATK
38DLP ABCB6 7A6EX ABCA2
7A6EX ABL1 9630013A20RIK ABCC8
9630013A20RIK ABL2 97C2 ABI1
97C2 ABR A830010M20RIK ABI2
... ... ... ...
AFTER
MASTER CENTRAL_NERVOUS NERVOUS_SYSTEM NEUROGENESIS
12AEX 0 12AEX 0
2610042L04RIK 0 2610042L04RIK 0
2610301B20RIK 0 2610301B20RIK 2610301B20RIK
2700046G09RIK 0 2700046G09RIK 0
31BEX 0 31BEX 0
38DLP 0 0 0
7A6EX 0 7A6EX 0
9630013A20RIK 9630013A20RIK 9630013A20RIK 9630013A20RIK
97C2 0 97C2 0
I've been trying to do this with pandas/python but would be open to solutions w/ awk/bash or R. Any hints/future direction appreciated since I'm stuck at this point.