How to do column string concatenation including space separator in Pandas dataframe?

Question

I am a Pandas DataFrame as follows:

df = pd.DataFrame({
    'id': [1,2 ,3],
    'txt1': ['Hello there1', 'Hello there2', 'Hello there3'],
    'txt2': ['Hello there4', 'Hello there5', 'Hello there6'],
    'txt3': ['Hello there7', 'Hello there8', 'Hello there9']
})
df

id  txt1            txt2            txt3
1   Hello   there1  Hello there4    Hello there7
2   Hello   there2  Hello there5    Hello there8
3   Hello   there3  Hello there6    Hello there9

I want to concatenate column txt1, txt2, and txt3. So far I am able to achieve it as follows:

df['alltext'] = df['txt1']  + df['txt2'] + df['txt3']
df

id  txt1            txt2            txt3            alltext
1   Hello there1    Hello there4    Hello there7    Hello there1Hello there4Hello there7
2   Hello there2    Hello there5    Hello there8    Hello there2Hello there5Hello there8
3   Hello there3    Hello there6    Hello there9    Hello there3Hello there6Hello there9

but how to introduce space character between the two column strings while concatenating in Pandas?

I have just started learning Pandas.

score 8 · Accepted Answer · edited Aug 08 '22 at 05:13

8

You can also add separator between columns:

df['alltext'] = df['txt1']  + ' ' + df['txt2'] + ' ' + df['txt3']

Or filter by DataFrame.filter only columns with txt in column name and use join per rows with apply:

df['alltext'] = df.filter(like='txt').apply(' '.join, axis=1)

Or filter only object columns by DataFrame.select_dtypes - most times a Series with a dtype of object is going to be a string - but it could be any Python object:

df['alltext'] = df.select_dtypes('object').apply(' '.join, axis=1)

Or select columns by positions - all columns without first by DataFrame.iloc:

df['alltext'] = df.iloc[:, 1:].apply(' '.join, axis=1)

Thank you, @Jon Clements for solution for better matching columns names with txt and numeric:

df['alltext'] = df.filter(regex=r'^txt\d+$').apply(' '.join, axis=1)

edited Aug 08 '22 at 05:13

Janosh

3,392
2
27
35

answered May 25 '19 at 07:09

jezrael

822,522
95
1,334
1,252

1

One might want to consider `.filter(regex=r'^txt\d+$')` just to be explicit about exactly what columns are wanted rather than the off-chance of `like='txt'` picking up something not wanted... (although - it'd be rather improbable that'd be case here) – Jon Clements May 25 '19 at 07:18
Slight nitpick... *obviously objects are strings* - Not quite true... they're objects that aren't of a numpy type... they could be (although unlikely in most cases) be *anything* rather than string - hence applying `str.join` to 'em will break. (Mind you - if you're storing silly stuff in a DF/array, then that's a whole different problem in itself :p) – Jon Clements May 25 '19 at 07:29

score 1 · Answer 2 · answered May 25 '19 at 07:10

1

Simply add space between that,

df['alltext'] = df['txt1']  + ' ' + df['txt2'] + ' ' + df['txt3']

answered May 25 '19 at 07:10

shaik moeed

5,300
1
18
54

How to do column string concatenation including space separator in Pandas dataframe?

2 Answers2