
Problem description

When given a function that returns a list of numerical values of different dtypes, DataFrame's apply up-converts all the returned values to a common type. For example, in the code below the elements of the second column, the integers 3, are converted by apply() to the complex number (3.0+0.0j).

import pandas as pd

df = pd.DataFrame([1, 2, 3])
df.apply(lambda row: [1+5j, 3], axis='columns', result_type='expand')

          0         1
0  1.0+5.0j  3.0+0.0j
1  1.0+5.0j  3.0+0.0j
2  1.0+5.0j  3.0+0.0j

This behavior is inherited from NumPy's type determination; the np.array documentation says:

If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.
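
This promotion is easy to reproduce in NumPy directly; a minimal sketch:

import numpy as np

# NumPy picks the minimum common type that can hold every element,
# so the plain integer 3 is promoted to complex as well.
arr = np.array([1+5j, 3])
print(arr.dtype)  # complex128
print(arr)        # [1.+5.j 3.+0.j]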

Is there any way to provide a dtype parameter to DataFrame's apply?

Expected Output
          0  1
0  1.0+5.0j  3
1  1.0+5.0j  3
2  1.0+5.0j  3
  • Looks like it's the `expand` option that's doing this. Without it, the result is one column with list elements. – hpaulj Mar 08 '21 at 12:09

1 Answer


While it's possible to specify mixed dtypes within a NumPy array, it seems the items have to be defined as a tuple:

import numpy as np

np.array((1+5j, 3), dtype='complex, int')
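
For instance (a sketch, not part of the question's code), a DataFrame built from such a structured array keeps one dtype per field, since each field becomes its own column:

import numpy as np
import pandas as pd

# One-record structured array: field f0 is complex, field f1 is int.
arr = np.array([(1+5j, 3)], dtype='complex, int')
print(pd.DataFrame(arr).dtypes)
# f0    complex128
# f1         int64  (integer width is platform-dependent)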

So potential solutions include:

  1. using .astype({1: 'int'}) after the expand to downcast column 1 back to integer (see the sketch after this list).

  2. splitting each value into its real and imaginary parts and then recombining them as needed:

# Expand each value into a (real, imag) pair, giving four numeric columns.
df = df.apply(lambda row: [i for x in [1+5j, 3] for i in [x.real, x.imag]], axis='columns', result_type='expand')
# Drop the all-zero columns (here, the imaginary part of the integer 3).
df = df[df.columns[df.sum(axis=0) != 0]]
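
A minimal sketch of option 1, starting from the example in the question (NumPy may emit a ComplexWarning when the cast discards the imaginary part):

import pandas as pd

df = pd.DataFrame([1, 2, 3])
out = df.apply(lambda row: [1+5j, 3], axis='columns', result_type='expand')
# Downcast column 1 back to int; its (all-zero) imaginary part is dropped.
out = out.astype({1: 'int'})
print(out.dtypes)
# 0    complex128
# 1         int64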