
I don't fully understand how the apply function works. Here's my code, which works fine:

import pandas as pd

dftest = pd.DataFrame({'a': ['A BERTHOU'], 'b': ['BERTHOU']})

def test2(a, b):
    return a + b

dftest['concat'] = dftest.apply(lambda row: test2(row['a'], row['b']), axis=1)
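To see what apply(..., axis=1) actually hands to the function, here is a small sketch. apply calls the function once per row and passes that row as a pandas Series whose index is the column names. The helper show_row and the list seen are hypothetical, used only to record what each call receives.

```python
import pandas as pd

dftest = pd.DataFrame({'a': ['A BERTHOU'], 'b': ['BERTHOU']})

seen = []  # hypothetical: records the type and index of each row passed in

def show_row(row):
    # With axis=1, each call receives one row as a Series indexed by column name
    seen.append((type(row).__name__, list(row.index)))
    return row['a'] + row['b']

dftest['concat'] = dftest.apply(show_row, axis=1)
print(seen)
print(dftest['concat'].tolist())
```

This is why the working version needs the lambda: apply only ever passes a single row object, so something has to unpack row['a'] and row['b'] into the two arguments of test2.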

But I want to do the same without using a lambda function. I tried this:

dftest['concat'] = dftest.apply(test2(dftest['a'], dftest['b']), axis=1)

and this:

dftest['concat'] = dftest.apply(test2(dftest['a'].str, dftest['b'].str), axis=1)

But neither works.

Can you tell me how to use my function without a lambda function?

To clarify: I want to use a complex function, so

df['sum'] = df.col1 + df.col2

or

dftest['concat'] = dftest[['a', 'b']].sum(axis=1)

won't work.

I already know this solution:

def test2(row):
    return row.a + row.b

dftest['concat'] = dftest.apply(test2, axis=1)

but I don't like it: it's impossible to understand what is applied without looking at the function (no parameters appear in the apply line), and the function is ugly: it is not generic and is tied to row.a and row.b.

Conclusion: for the moment, the best solution seems to be

dftest['concat'] = dftest.apply(lambda row: test2(row['a'], row['b']), axis=1)

and it seems impossible to apply a complex function without a lambda while following good coding practices.

JE_Muc
loic_midy

3 Answers


I think you need to do this instead:

def test2(row):
    return row.a + row.b

dftest['concat'] = dftest.apply(test2, axis=1)

This applies test2 to each row: apply calls it once per row and passes that row in as a Series.
Sudhir Bastakoti

Try the following:

dftest['concat'] = dftest[['a', 'b']].sum(axis=1)

This uses pandas' built-in, vectorized column/row-oriented methods. I strongly recommend these, or plain column arithmetic such as dftest['a'] + dftest['b'], over apply, since they run on pandas' optimized, vectorized backend instead of calling Python code once per row.
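For string columns specifically, here is a sketch of two other ways to build the same column without apply: the + operator on whole columns, and Series.str.cat, which concatenates element-wise with another Series.

```python
import pandas as pd

dftest = pd.DataFrame({'a': ['A BERTHOU'], 'b': ['BERTHOU']})

# Element-wise string concatenation of two whole columns
plus = dftest['a'] + dftest['b']

# Series.str.cat joins element-wise with another Series, aligned on the index
cat = dftest['a'].str.cat(dftest['b'])

print(plus.tolist(), cat.tolist())
```

str.cat also accepts a sep argument (for example sep=' ') if the parts should be separated.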

Furthermore, try to avoid apply and lambda at all costs. On larger DataFrames, apply can be orders of magnitude slower than the built-in vectorized operations. As for lambda, PEP 8 advises against assigning a lambda expression to a name and recommends a def statement instead (even though imho this is mainly a matter of personal preference...).
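To check the speed difference on your own machine, a rough timing sketch follows. The row count is an arbitrary assumption, and the actual numbers depend on hardware and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: n rows of short strings
n = 20_000
df = pd.DataFrame({'a': ['A BERTHOU'] * n, 'b': ['BERTHOU'] * n})

def with_apply():
    # Calls the Python lambda once per row, building a Series for each row
    return df.apply(lambda row: row['a'] + row['b'], axis=1)

def vectorized():
    # One operation over the whole columns
    return df['a'] + df['b']

# Sanity check: both approaches produce the same values
assert with_apply().tolist() == vectorized().tolist()

for name, fn in [('apply', with_apply), ('vectorized', vectorized)]:
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{name}: {seconds:.4f} s per run')
```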

If you really want to use apply and lambda for some special reason, you can write your statement like this:

dftest['concat'] = dftest.apply(lambda row: row['a'] + row['b'], axis=1)

or this:

dftest['concat'] = dftest.apply(lambda row: row.sum(), axis=1)
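If the function really is too complex to express with column operations, one way to call a plain two-argument function without apply or a lambda is a list comprehension over zip of the columns. This is a sketch; combine is a hypothetical stand-in for such a "complex" function.

```python
import pandas as pd

def combine(a, b):
    # Hypothetical "complex" function: this branch expects scalar strings,
    # so it would not work if whole columns were passed in
    if b in a:
        return a
    return a + ' ' + b

dftest = pd.DataFrame({'a': ['A BERTHOU', 'X'], 'b': ['BERTHOU', 'Y']})

# zip yields one (a, b) pair of scalars per row; no apply and no lambda needed
dftest['concat'] = [combine(a, b) for a, b in zip(dftest['a'], dftest['b'])]
print(dftest['concat'].tolist())
```

This still calls Python code once per row, but it skips building a Series for every row, and the call site shows exactly which columns feed which arguments.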
JE_Muc
  • You are welcome. But I recommend accepting an answer and opening a follow-up question for the *more complex function* that you added to your question. Otherwise it is highly unlikely that you'll get an answer to this more complex question. – JE_Muc Jan 25 '19 at 13:50

To apply a function with pandas:

df['new_col'] = function(df['old_col'])

>>> import pandas as pd 
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> print(df)
   col1  col2
0     1     3
1     2     4

>>> df['sum'] = df.col1 + df.col2
>>> print(df)
   col1  col2  sum
0     1     3    4
1     2     4    6
SimbaPK