"Duplicate like" specific letter removal

Question

5645-01B      |  5645-01A     |  2002-01A     |  5325-01C
1812.999999   |  3265.00001   |  4723.000002  |  2190.999996
43.00000001   |  1            |  2.5          |  0
622           |  1783         |  2240.499994  |  1553.000002
1568.999996   |  850.0000002  |  757.9999998  |  948.9999999

This is a small part of my table. I need to remove the last letter (A/B/C) from the column names so I can match them against another dataframe. I used:

df1.columns = df1.columns.str.rstrip('A')
df1.columns = df1.columns.str.rstrip('B')
df1.columns = df1.columns.str.rstrip('C')

But the problem turned out to be the duplicates. As you can see above, some columns share the same number but end in a different letter (A, B or C). I need to keep only the latest version: if there is a column ending in C and a column with the same number ending in A or B, I have to remove the A/B column(s) completely, and the C column stays, without the C. For example, "5645-01B" must stay as 5645-01, while 5645-01A has to be deleted. I can't just strip the letters as I did, or drop every "A" column, because some "A" columns don't have a B or C counterpart and I must keep them. How do I check only for the "latest versions" and keep them?

P.S. The top row is the column names. Expected:

5645-01       |  2002-01      |  5325-01
1812.999999   |  4723.000002  |  2190.999996
43.00000001   |  2.5          |  0
622           |  2240.499994  |  1553.000002
1568.999996   |  757.9999998  |  948.9999999
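
One way to express the "keep only the latest version" rule is sketched below. It assumes every column name ends in exactly one version letter and that a later letter means a newer version. `keep_latest_versions` and the toy frame are hypothetical names used for illustration only.

```python
import pandas as pd

def keep_latest_versions(df):
    """Keep, per base ID, only the column with the highest trailing letter."""
    latest = {}  # base ID -> full column name of the newest version seen so far
    for col in df.columns:
        base, letter = col[:-1], col[-1]
        if base not in latest or letter > latest[base][-1]:
            latest[base] = col
    # Preserve the original column order while dropping older versions
    keep = [col for col in df.columns if latest[col[:-1]] == col]
    result = df[keep].copy()
    result.columns = [col[:-1] for col in keep]  # strip the version letter
    return result

# Toy frame holding the first two rows of the sample table (assumption)
df1 = pd.DataFrame({
    "5645-01B": [1812.999999, 43.00000001],
    "5645-01A": [3265.00001, 1.0],
    "2002-01A": [4723.000002, 2.5],
    "5325-01C": [2190.999996, 0.0],
})
result = keep_latest_versions(df1)
print(list(result.columns))
```

Comparing letters rather than column positions means the outcome does not depend on the order in which the versions appear in the table.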

The code that I continue with:

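import pandas as pd

# Transpose so that the sample IDs become the row index
df1 = df1.transpose()
df2 = pd.read_csv('table3.csv', index_col=['SAMPLE_ID'])
# Keep only the rows whose ID also appears in df2
df1 = df1[df1.index.isin(df2.index)]
df1['The_ID'] = df2['EGF']
print(df1.head())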

After that it prints NaN instead of numeric values. SAMPLE_ID is an index similar to the top row above (the numbers), but it doesn't include any letters, which is why I must remove them.
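
A possible reason for the NaN values, offered as a guess rather than a diagnosis: assigning a Series to a new column aligns on index labels, not on position, so any row label missing from `df2` gets NaN. The sketch below uses made-up stand-ins for `df1` and `table3.csv`; the `EGF` values are invented.

```python
import pandas as pd

# Hypothetical stand-ins: df1 has letter-free column names,
# df2 plays the role of table3.csv indexed by SAMPLE_ID.
df1 = pd.DataFrame({"5645-01": [1812.999999, 43.00000001],
                    "2002-01": [4723.000002, 2.5]})
df2 = pd.DataFrame({"EGF": [0.7, 1.3]},
                   index=pd.Index(["5645-01", "2002-01"], name="SAMPLE_ID"))

samples = df1.transpose()                          # rows are now sample IDs
samples = samples[samples.index.isin(df2.index)].copy()
samples["The_ID"] = df2["EGF"]                     # matched by label

# Same data, but one label still carries its version letter
mismatched = df1.rename(columns={"5645-01": "5645-01B"}).transpose()
mismatched["The_ID"] = df2["EGF"]                  # "5645-01B" has no match in df2

print(samples)
print(mismatched)
```

If the labels in the two frames differ in any way (a leftover letter, whitespace, string versus numeric dtype), the same thing happens, so comparing `df1.index` and `df2.index` directly is a reasonable first check.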

Someone tried to help me before I posted: "s = df1.columns.str.extract('(.*)\D*$')[0] filtered = s.duplicated(keep='last') | (~s.duplicated(keep=False))". It did the job, except that it turned all my numeric values to NaN, because my code continues after that step. - TheUndecided, Mar 03 '20 at 20:22
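
For reference, a small check of what the quoted pattern extracts from the sample column names. The lazy variant `(.*?)` is an assumption added here for comparison and is not part of the comment.

```python
import pandas as pd

cols = pd.Index(["5645-01B", "5645-01A", "2002-01A", "5325-01C"])

# Greedy '.*' also consumes the trailing letter, so the names stay unique
greedy = cols.str.extract(r"(.*)\D*$")[0]
greedy_mask = greedy.duplicated(keep="last") | ~greedy.duplicated(keep=False)

# Lazy '.*?' (assumed variant) leaves the trailing letter to '\D*'
lazy = cols.str.extract(r"(.*?)\D*$")[0]
lazy_mask = lazy.duplicated(keep="last") | ~lazy.duplicated(keep=False)

print(greedy.tolist(), greedy_mask.tolist())
print(lazy.tolist(), lazy_mask.tolist())
```

Note that `duplicated(keep='last')` is True for every occurrence except the last one by position, so the mask depends on the order of the columns rather than on which letter is later in the alphabet.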