18

In pandas, I have a Series/DataFrame column of str values that I want to combine into one long string:

import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

Goal: 'Hello world !'

So far, attempts such as df['text'].apply(lambda x: ' '.join(x)) only return a Series.

What is the best way to get the concatenated string?

EdChum
cycle_about

3 Answers

32

You can call str.join on the Series directly:

In [3]:
' '.join(df['text'])

Out[3]:
'Hello world !'
EdChum
  • 1
    I am getting an error while doing this: "TypeError: sequence item 0: expected str instance, list found". This is in Python 3; could you please guide me? – pnv Jul 27 '17 at 08:52
  • 1
    @user1930402 Asking questions in comments is poor form on SO. The error message is clear: you have lists in your DataFrame, not strings, hence the error. As I don't have access to your computer, I can only speculate that for some reason you're storing lists in your df, which is not advisable. I can't help you here; you need to post a new question. You should also ask yourself whether you really need to store lists at all, since storing non-scalar values defeats the point of using pandas. – EdChum Jul 27 '17 at 08:55
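As a rough illustration of the exchange above, the sketch below joins the question's column directly and then reproduces the TypeError from the comment. The `bad` frame and the flattening step are assumptions made up for the example; the commenter's actual data is unknown.

```python
import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

# Iterating over a Series yields its values, so str.join accepts it directly.
joined = ' '.join(df['text'])
print(joined)  # Hello world !

# Hypothetical frame whose cells hold lists instead of strings.
bad = pd.DataFrame({'text': [['Hello'], ['world'], ['!']]})

try:
    ' '.join(bad['text'])
except TypeError as exc:
    # str.join only accepts str items, so a list cell raises TypeError.
    print(exc)

# One possible workaround, assuming every cell is a list of strings:
# flatten the lists before joining.
flat = ' '.join(word for cell in bad['text'] for word in cell)
print(flat)  # Hello world !
```

Flattening is only one option; storing plain strings in the column in the first place avoids the problem entirely.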
13

Apart from join, you could also use the pandas string method .str.cat:

In [171]: df.text.str.cat(sep=' ')
Out[171]: 'Hello world !'

However, join() is much faster.
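To check the speed claim on your own machine, something like the following sketch could be used. The 100,000-element `big` Series and the repeat count are arbitrary choices for illustration, and the timings will vary with hardware and pandas version. The sketch also shows one behavioral difference: with no `na_rep` given, `.str.cat` skips missing values, whereas `str.join` would raise on them.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])

# Both approaches give the same string for clean string data.
via_join = ' '.join(s)
via_cat = s.str.cat(sep=' ')
print(via_join == via_cat)  # True

# With a missing value, str.cat omits it (na_rep is None by default);
# ' '.join(s_na) would raise TypeError because NaN is a float.
s_na = pd.Series(['Hello', np.nan, '!'])
print(s_na.str.cat(sep=' '))  # Hello !

# Rough timing comparison on an arbitrary larger Series.
big = pd.Series(['word'] * 100_000)
t_join = timeit.timeit(lambda: ' '.join(big), number=10)
t_cat = timeit.timeit(lambda: big.str.cat(sep=' '), number=10)
print(f"join: {t_join:.4f}s  str.cat: {t_cat:.4f}s")
```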

Zero
3

Your code is "returning the series" because Series.apply calls the function on each element separately. Apply the join to the DataFrame along axis=0 instead:

df.apply(' '.join, axis=0)
text    Hello world !
dtype: object

Specifying axis=0 passes each column as a whole to the function, so all the values in a column are combined into a single string. The return type is a Series whose index labels are the column names and whose values are the corresponding joined strings. This is particularly useful if you want to join several columns at once, each into its own string.
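A small sketch of the multi-column case follows. The `other` column is a hypothetical addition made up for this example and is not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'text': ['Hello', 'world', '!'],
        'other': ['Goodbye', 'moon', '?'],  # hypothetical second column
    },
    index=['a', 'b', 'c'],
)

# axis=0: each column is passed to ' '.join, giving one string per column.
per_column = df.apply(' '.join, axis=0)
print(per_column['text'])   # Hello world !
print(per_column['other'])  # Goodbye moon ?

# axis=1: each row is passed instead, giving one string per row.
per_row = df.apply(' '.join, axis=1)
print(per_row['a'])  # Hello Goodbye
```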

I generally find it confusing to work out which axis apply needs, so if it doesn't behave the way you expect, try applying along the other axis too.

Alex
  • helpful description +10, but note that you're using `df.apply` whereas OP used `df['text'].apply` ([Series.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html) has no `axis`) – tdy Jun 27 '21 at 05:09
  • @tdy that's true. This is because Series.apply generally works on single values at a time, more like DataFrame.applymap. From the Series.apply docs: "Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values." – Alex Jun 27 '21 at 22:58
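The element-wise behavior described in these comments can be seen with a minimal sketch that uses the question's data. Because Series.apply hands each string to the function on its own, `' '.join` ends up separating the characters of each word:

```python
import pandas as pd

s = pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])

# Series.apply calls the lambda once per element, so ' '.join receives
# a single string and puts a space between its characters.
per_element = s.apply(lambda x: ' '.join(x))
print(per_element.tolist())  # ['H e l l o', 'w o r l d', '!']

# Joining the Series itself treats each element as one item.
whole = ' '.join(s)
print(whole)  # Hello world !
```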