1

How can I split a dataframe column into two parts such that the value in dataframe column is later replaced by the splitted value. For example, I have a dataframe like :

col1       col2
"abc"      "A, BC"
"def"      "AX, Z"
"pqr"      "P, R"
"xyz"      "X, YZ"

I want to extract values before , and replace that cell with the extracted value. So, the output should look like :

col1   col2
abc    A
def    AX
pqr    P
xyz    X

I am trying to do it as :

df['col2'].apply(lambda x: x.split(',')[0])

But it gives me error. Please suggest how can I get the desired output.

FabienP
  • 3,018
  • 1
  • 20
  • 25
Alex
  • 81
  • 4
  • 10

1 Answers1

7

In this case you can you the str methods of pandas, that will use vectorized functions. It will also be faster that apply.

df.col2 = df.col2.str.split(', ').str[0]

>>> df
Out[]:
  col1 col2
0  abc    A
1  def   AX
2  pqr    P
3  xyz    X

To use this on Series containing string, you should call the str attribute before any function. See the doc for more details.

In the above solution, note the .str.split(', ') that replace split. And .str[0] that allow to slice the result of the split, whereas just using .str.split(', ')[0] would get index 0 of the Series.

FabienP
  • 3,018
  • 1
  • 20
  • 25
  • Thank you. Could you please help me a little bit more. What i need to do is : I have a dictionary with the state name mapping like states_names_mapping = { 'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AS': 'American Samoa', 'AZ': 'Arizona', 'CA': 'California', } And a dataframe column Place like Place California, USA Beverly Hills, CA California CA NY, USA USA – Alex Oct 14 '17 at 22:14
  • Thank you. Could you please help me a little bit more. What i need to do is : I have a dict for state name mapping like state_map={ 'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AS': 'American Samoa', } And a dataframe column Place like California, USA Beverly Hills, CA California CA NY, USA USA I need to replace this column values with the state names. – Alex Oct 14 '17 at 22:30
  • if(df['Place'])!='' if df['Place'].str.split(', ').str[0] in state_map.values(): df['New'] = df['Place'].str.split(', ',).str[0] elif df['Place'].str.split(', ').str[1] in state_map.values(): df['New'] = df['Place'].str.split(', ',).str[1] elif df['Place'].str.split(', ').str[0] in state_map.keys(): for i in state_map: df['New'] = state_map[i] break elif df['Place'].str.split(', ').str[1] in state_map.keys(): for i in state_map: df['New'] = states_names_mapping[i] break – Alex Oct 14 '17 at 22:31
  • But it gives me error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." – Alex Oct 14 '17 at 22:32
  • I'll have a look, but you should add this to the body of your question to preserve to code format. – FabienP Oct 15 '17 at 07:00
  • Thank you, It will be great help if you could help me to sort this asap. If you had any problem in understanding, I can repost it as a new question or If i can send it to you some other way. – Alex Oct 15 '17 at 18:00
  • https://github.com/Rachel-Z/Problems/issues/1 I posted it here also in a proper format. – Alex Oct 15 '17 at 18:03
  • @Ritika, found your new post. You can close this question by approving the answer that you judge solved the problem described above. – FabienP Oct 15 '17 at 19:42
  • Thank you Fabien. Really appreciate your help !! – Alex Oct 15 '17 at 20:00