1

I have this DataFrame, with row names unfortunately containing both integers and strings, for example '20210902_Rene_Nicole_OC2012A_Sample_0087_20uLinject_01'. I would like to rename the rows to contain just the sample number, the example above would just become '0087'.

df=pd.DataFrame([['20210902_Rene_Nicole_OC2012A_Sample_0087_20uLinject_01',2.233,7.84554,10.222],['20210902_Rene_Nicole_OC2012A_Sample_0097_20uLinject_01',2.000,7.8666,10.000],['20210902_Rene_Nicole_OC2012A_Sample_0098_20uLinject_01',2.230,7.8500,10.005]],columns=('786.2323','800.3231','832.5699','850.0022'))

This creates the following DataFrame:
df

The desired output looks like this DataFrame: Desired resulting dataframe

I have tried transposing the dataframe with the intention of renaming columns then transposing back at the end, and using the str.replace function to edit column names, but I think there are problems because the column name contains both integers and strings, and they change. If this had worked, I would do it again to delete the sample name "_20uLinject_01", though I am wondering whether there is a better way to do this.

df.transpose()
df.columns = df.columns.str.replace('20210902_Rene_Nicole_OC2012A_Sample_', '')
#returns error: "AttributeError: Can only use .str accessor with string values!"

Any ideas how to go about this? I have around 180 rows to rename, so it would be great to loop through or not have to do it manually for each row. Thank you!

sotmot
  • 1,256
  • 2
  • 9
  • 21

1 Answers1

0

Just use .str.split("_").str[<index>]. The code is as follows:

>>> df["786.2323"] = df["786.2323"].str.split("_").str[5]
>>> df
  786.2323  800.3231  832.5699  850.0022
0     0087     2.233   7.84554    10.222
1     0097     2.000   7.86660    10.000
2     0098     2.230   7.85000    10.005

Here the index 5 corresponds to the value you require.

sotmot
  • 1,256
  • 2
  • 9
  • 21