One of the columns of my dataframe looks something like this:

[application]
blah/3.14
xyz/5.2
abc
...
...

(representing software/version)

I'm trying to achieve something like this:

[application] [name]  [ver]
blah/3.14      blah    3.14
xyz/5.2        xyz     5.2 
abc            abc     na   <-- this missing value can be filled in with a string too
...  
...

As you can already tell, I'd like to split the column into two, using '/' as the delimiter. A Stack Overflow solution suggests something like this:

tmptbl = pd.DataFrame(main_tbl.application.str.split('/', 1).tolist(), columns= ['name', 'ver'])
main_tbl['name'] = tmptbl.name
main_tbl['ver'] = tmptbl.ver

This looks great at first, but it crashes for values without a '/', such as 'abc'.

What else can I try?

SloppyPenguin

1 Answer

Use str.split with the parameter expand=True to get a DataFrame back:

main_tbl[['name','ver']] = main_tbl.application.str.split('/', expand=True)
print (main_tbl)
  application  name   ver
0   blah/3.14  blah  3.14
1     xyz/5.2   xyz   5.2
2         abc   abc  None

If you need NaN instead of None, add replace:

import numpy as np
main_tbl['ver'] = main_tbl['ver'].replace({None: np.nan})
print (main_tbl)
  application  name   ver
0   blah/3.14  blah  3.14
1     xyz/5.2   xyz   5.2
2         abc   abc   NaN
jezrael
    Brilliant, thanks! I just had to add '1' in split to cover corner cases (e.g. some values starting with a slash) `.split('/', 1, expand=True)`. In case it helps others. – SloppyPenguin Mar 04 '17 at 21:36
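
Putting the answer and the comment together, here is a minimal sketch of the whole thing. The sample rows (including 'a/b/c') and the 'na' fill string are made up for illustration. It passes the split limit by keyword (n=1), since recent pandas versions may not accept it positionally, and reindexes the result so both columns exist even if no value contains a '/'.

```python
import pandas as pd

# Hypothetical sample data, including a value with more than one slash.
main_tbl = pd.DataFrame({"application": ["blah/3.14", "xyz/5.2", "abc", "a/b/c"]})

# n=1 splits only on the first '/', so at most two columns come back.
parts = main_tbl["application"].str.split("/", n=1, expand=True)

# Guarantee columns 0 and 1 exist, even when no value contains '/'.
parts = parts.reindex(columns=[0, 1])

main_tbl["name"] = parts[0]
# Fill missing versions with a placeholder string (an arbitrary choice here).
main_tbl["ver"] = parts[1].fillna("na")

print(main_tbl)
```

Drop the fillna call if you would rather keep the missing versions as missing values.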