4

I am a Pandas DataFrame as follows:

df = pd.DataFrame({
    'id': [1,2 ,3],
    'txt1': ['Hello there1', 'Hello there2', 'Hello there3'],
    'txt2': ['Hello there4', 'Hello there5', 'Hello there6'],
    'txt3': ['Hello there7', 'Hello there8', 'Hello there9']
})
df

id  txt1            txt2            txt3
1   Hello   there1  Hello there4    Hello there7
2   Hello   there2  Hello there5    Hello there8
3   Hello   there3  Hello there6    Hello there9

I want to concatenate column txt1, txt2, and txt3. So far I am able to achieve it as follows:

df['alltext'] = df['txt1']  + df['txt2'] + df['txt3']
df

id  txt1            txt2            txt3            alltext
1   Hello there1    Hello there4    Hello there7    Hello there1Hello there4Hello there7
2   Hello there2    Hello there5    Hello there8    Hello there2Hello there5Hello there8
3   Hello there3    Hello there6    Hello there9    Hello there3Hello there6Hello there9

but how to introduce space character between the two column strings while concatenating in Pandas?

I have just started learning Pandas.

halfer
  • 19,824
  • 17
  • 99
  • 186
user3243499
  • 2,953
  • 6
  • 33
  • 75

2 Answers2

8

You can also add separator between columns:

df['alltext'] = df['txt1']  + ' ' + df['txt2'] + ' ' + df['txt3']

Or filter by DataFrame.filter only columns with txt in column name and use join per rows with apply:

df['alltext'] = df.filter(like='txt').apply(' '.join, axis=1)

Or filter only object columns by DataFrame.select_dtypes - most times a Series with a dtype of object is going to be a string - but it could be any Python object:

df['alltext'] = df.select_dtypes('object').apply(' '.join, axis=1)

Or select columns by positions - all columns without first by DataFrame.iloc:

df['alltext'] = df.iloc[:, 1:].apply(' '.join, axis=1)

Thank you, @Jon Clements for solution for better matching columns names with txt and numeric:

df['alltext'] = df.filter(regex=r'^txt\d+$').apply(' '.join, axis=1)
Janosh
  • 3,392
  • 2
  • 27
  • 35
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    One might want to consider `.filter(regex=r'^txt\d+$')` just to be explicit about exactly what columns are wanted rather than the off-chance of `like='txt'` picking up something not wanted... (although - it'd be rather improbable that'd be case here) – Jon Clements May 25 '19 at 07:18
  • Slight nitpick... *obviously objects are strings* - Not quite true... they're objects that aren't of a numpy type... they could be (although unlikely in most cases) be *anything* rather than string - hence applying `str.join` to 'em will break. (Mind you - if you're storing silly stuff in a DF/array, then that's a whole different problem in itself :p) – Jon Clements May 25 '19 at 07:29
1

Simply add space between that,

df['alltext'] = df['txt1']  + ' ' + df['txt2'] + ' ' + df['txt3']
shaik moeed
  • 5,300
  • 1
  • 18
  • 54