When importing data from a financial system, the columns have this unnecessary common prefix. This prefix changes from table to table.
Question: How can I automatically find the common prefix and remove it from my column names, keeping in mind that this prefix changes?
Example:
import pandas as pd
import numpy as np
cols = ["abc_W1", "abc_X2", "abc_Y3", "abc_Z4"]
data = np.random.randint(low=1, high=10, size=(4,4))
df = pd.DataFrame(data=data, columns=cols)
abc_W1 abc_X2 abc_Y3 abc_Z4
0 2 2 3 5
1 8 4 2 7
2 1 5 1 4
3 8 7 4 1
In this case the common prefix is _abc
Also, what would be the solution when we have both common prefixes and suffixes, like below:
abc_W1_foo abc_X2_foo abc_Y3_foo abc_Z4_foo
0 1 5 3 2
1 2 1 7 1
2 2 4 7 1
3 7 9 9 3
The expected output would be:
W1 X2 Y3 Z4
0 2 2 3 5
1 8 4 2 7
2 1 5 1 4
3 8 7 4 1