Let's say I have the following dataframe:

import pandas as pd

data = [['Mallika', 23, 'Student'], ['Yash', 25, 'Tutor'], ['Abc', 14, 'Clerk']]

data_frame = pd.DataFrame(data, columns=['Student.first.name.word', 'Student.Current.Age.word', 'Student.Current.Profession.word'])

  Student.first.name.word  Student.Current.Age.word Student.Current.Profession.word
0                 Mallika                        23                         Student
1                    Yash                        25                           Tutor
2                     Abc                        14                           Clerk

How would I strip the common column-header words "Student" and "word", so that I get the following dataframe?

  first.name  Current.Age Current.Profession
0    Mallika           23            Student
1       Yash           25              Tutor
2        Abc           14              Clerk

3 Answers

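You can remove those words and the dots from the column names with a regex and assign the result back (pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default):

data_frame.columns = data_frame.columns.str.replace(r"(Student|word|\.)", "", regex=True)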

to get

>>> data_frame

      name  Age Profession
0  Mallika   23    Student
1     Yash   25      Tutor
2      Abc   14      Clerk
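
As a rough illustration of what that pattern matches, here is a minimal sketch that uses only the standard-library re module on plain strings, without pandas. The column names are copied from the question; the variable names are only for illustration. On the dotted names the pattern strips the inner dots as well, which may not be what you want.

```python
import re

# Column names copied from the question
columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]

# Same pattern as above: match "Student", "word", or a literal dot
pattern = re.compile(r"(Student|word|\.)")

# Replace every match with the empty string
renamed = [pattern.sub("", name) for name in columns]
print(renamed)
```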

Update, after the question was edited: you can split, slice, and join:

data_frame.columns = data_frame.columns.str.split(r"\.").str[1:-1].str.join(".")

That is, split on the literal dot, drop the first and last elements, and join the remaining ones with a dot

to get

  first.name  Current.Age Current.Profession
0    Mallika           23            Student
1       Yash           25              Tutor
2        Abc           14              Clerk
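
To make the three steps concrete, here is a minimal plain-Python sketch of the same split, slice, and join sequence applied to one name at a time. The helper name strip_outer_words is hypothetical and not part of pandas; it only mirrors what the .str accessor chain does element-wise.

```python
# Column names copied from the question
columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]

def strip_outer_words(name, sep="."):
    """Drop the first and last sep-delimited words (hypothetical helper)."""
    parts = name.split(sep)   # split on the literal separator
    inner = parts[1:-1]       # slice off the first and last elements
    return sep.join(inner)    # join what is left with the separator

renamed = [strip_outer_words(name) for name in columns]
print(renamed)
```

This assumes every column name has at least three dot-separated parts; a shorter name would come back as an empty string.
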
Mustafa Aydın

Here is an extension of my answer on removing common prefixes. The benefit of this method is that it finds the common prefix and suffix in a general way, so there is no need to hardcode any patterns.

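import os
import re

cols = data_frame.columns

# Longest common prefix; the suffix is found by reversing every name first
common_prefix = os.path.commonprefix(cols.tolist())
common_suffix = os.path.commonprefix([col[::-1] for col in cols])[::-1]

# Escape and anchor so only the literal prefix/suffix at the ends is removed
pattern = f"^{re.escape(common_prefix)}|{re.escape(common_suffix)}$"
data_frame.columns = cols.str.replace(pattern, "", regex=True)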
      name  Age Profession
0  Mallika   23    Student
1     Yash   25      Tutor
2      Abc   14      Clerk

Update: the same solution works unchanged for the updated question:

  first.name  Current.Age Current.Profession
0    Mallika           23            Student
1       Yash           25              Tutor
2        Abc           14              Clerk
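
For readers who want to see the intermediate values, here is a small standard-library sketch (no pandas) of the prefix and suffix detection on the question's column names. The variable names are illustrative only.

```python
import os
import re

# Column names copied from the question
columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]

# commonprefix compares the strings character by character
common_prefix = os.path.commonprefix(columns)

# Reverse each name, take the common prefix, then reverse the result back
common_suffix = os.path.commonprefix([c[::-1] for c in columns])[::-1]

# Remove the literal prefix at the start and the literal suffix at the end
pattern = re.compile(f"^{re.escape(common_prefix)}|{re.escape(common_suffix)}$")
renamed = [pattern.sub("", c) for c in columns]

print(repr(common_prefix), repr(common_suffix))
print(renamed)
```

Because os.path.commonprefix works on characters rather than on dot-separated words, it can cut into a word when the names happen to agree further: for example, "Student.Cat.word" and "Student.Car.word" share the prefix "Student.Ca".
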
Erfan

To remove all common words, not just hard-coded ones, you can try:

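import re
from functools import reduce

df = data_frame
# Split each column name on dots, then intersect to find the words shared by every column
words_per_column = [col.split(".") for col in df.columns.tolist()]
common_words = reduce(lambda x, y: set(x).intersection(y), words_per_column)
# Match any of the common words as a whole word
pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, common_words)))

# Remove the common words, then drop the leading and trailing dots left behind
df.columns = df.columns.str.replace(pat, "", regex=True).str[1:-1]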

Output:

print(df)

  first.name  Current.Age Current.Profession
0    Mallika           23            Student
1       Yash           25              Tutor
2        Abc           14              Clerk
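
The common-word idea can also be sketched without a regex. The following is a plain-Python variation, not the code above: it computes the same set intersection and then filters the words of each name. The variable names are illustrative.

```python
# Column names copied from the question
columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]

# Split every name into its dot-separated words
word_lists = [name.split(".") for name in columns]

# Words that occur in every column name
common = set(word_lists[0]).intersection(*word_lists[1:])

# Keep only the words that are not common to all names, in original order
renamed = [".".join(w for w in words if w not in common) for words in word_lists]
print(renamed)
```

Filtering word by word avoids the leftover leading and trailing dots, so no final slice is needed. "Current" is kept because it is missing from the first column name.
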
Yefet