I've got a pretty simple case that, for some reason, is giving me problems.

I'm combining multiple dataframes. As a result, I'll often have the same key, but different comments for each key value.

KeyValue       Comment
1235           This is okay
444            Problems here
1235           Investigate further

I'm trying to deduplicate the keys but preserve all of the comments by consolidating them into one Comments field. The output I'd like:

KeyValue       Comment
1235           This is okay | Investigate further
444            Problems here

I've tried:

newdf = olddf.groupby('KeyValue')['Comment'].apply(lambda x: ' | '.join(x)).reset_index()

But when I do that I get

"TypeError: sequence item 0: expected str instance, float found" 

I've seen similar questions on here (that's where I got the original code), but I'm not sure why I'm getting this error or how to resolve it. Any help would be appreciated.

  • Maybe try `olddf.astype(str).groupby('KeyValue')['Comment'].apply(' | '.join).reset_index()`? (Note: you don't need the lambda syntax for `join`.) – Chris Adams Mar 04 '20 at 20:20
  • Try `lambda x: ' | '.join(x.dropna())`. I think missing values are messing you up, since `NaN` is a float. Alternatively, you could do `olddf[olddf['Comment'].notnull()].groupby...` – ALollz Mar 04 '20 at 20:34
  • @ALollz that was the problem. Tripped up again by missing values :) Thanks! – TroublesomeQuarry Mar 04 '20 at 20:42
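
For anyone landing here with the same error, this is a minimal sketch of the `dropna()` fix suggested above. The fourth row with a missing comment is an assumption added to trigger the error; it is not part of the data in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's three rows plus one missing comment.
olddf = pd.DataFrame({
    'KeyValue': [1235, 444, 1235, 444],
    'Comment': ['This is okay', 'Problems here', 'Investigate further', np.nan],
})

# The original call fails: NaN is a float, and str.join only accepts strings.
try:
    olddf.groupby('KeyValue')['Comment'].apply(lambda x: ' | '.join(x))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Dropping missing comments inside each group lets the join succeed.
newdf = (
    olddf.groupby('KeyValue')['Comment']
    .apply(lambda x: ' | '.join(x.dropna()))
    .reset_index()
)
print(newdf)
```

Key 444 ends up with only its real comment, and key 1235 gets both comments joined by ` | `.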

1 Answer

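Your code runs without error on the sample data you posted, so the keys are not the cause. As the comments point out, the TypeError comes from missing values in Comment, since NaN is a float. With no missing values it works:

import pandas as pd

# Build the sample frame from the question, naming the columns up front
mydata = pd.DataFrame(
    [[1235, 'This is okay'],
     [444, 'Problems here'],
     [1235, 'Investigate further']],
    columns=['KeyValue', 'Comment'])
print(mydata)

# Join every comment that shares a key into one ' | '-separated string
newdf = mydata.groupby('KeyValue')['Comment'].apply(lambda x: ' | '.join(x)).reset_index()
print(newdf)

Output:

   KeyValue              Comment
0      1235         This is okay
1       444        Problems here
2      1235  Investigate further
   KeyValue                             Comment
0       444                       Problems here
1      1235  This is okay | Investigate further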
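
The two fixes suggested in the comments behave differently when a key has no usable comments at all. Here is a rough sketch of that difference; key 777 and its missing comment are hypothetical additions, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: key 777 has only a missing comment.
olddf = pd.DataFrame({
    'KeyValue': [1235, 444, 1235, 777],
    'Comment': ['This is okay', 'Problems here', 'Investigate further', np.nan],
})

# Option 1: drop missing comments inside each group.
# Key 777 survives, with an empty string as its comment.
kept = (
    olddf.groupby('KeyValue')['Comment']
    .apply(lambda x: ' | '.join(x.dropna()))
    .reset_index()
)

# Option 2: filter out rows with missing comments before grouping.
# Key 777 disappears from the result entirely.
filtered = (
    olddf[olddf['Comment'].notnull()]
    .groupby('KeyValue')['Comment']
    .apply(lambda x: ' | '.join(x))
    .reset_index()
)

print(kept)
print(filtered)
```

Which one you want depends on whether a key with no comments should still appear in the deduplicated frame.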
tbrk