
Problem description

When given a function that returns a list of numerical values of different dtypes, DataFrame's apply up-converts all the returned values to a common type. For example, in the code below the elements of the second column, the integers 3, are converted by apply() to the complex number (3.0+0.0j).

import pandas as pd

df = pd.DataFrame([1, 2, 3])
df.apply(lambda row: [1+5j, 3], axis='columns', result_type='expand')

          0         1
0  1.0+5.0j  3.0+0.0j
1  1.0+5.0j  3.0+0.0j
2  1.0+5.0j  3.0+0.0j

This behavior is inherited from NumPy's type determination; the np.array documentation says:

If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.
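
This promotion is easy to reproduce in NumPy directly; a minimal sketch:

import numpy as np

# NumPy picks the minimum common type that can hold every element,
# so the plain integer 3 is promoted to complex as well.
arr = np.array([1+5j, 3])
print(arr.dtype)  # complex128
print(arr)        # [1.+5.j 3.+0.j]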

Is there any way to provide a dtype parameter to DataFrame's apply?

Expected Output
          0  1
0  1.0+5.0j  3
1  1.0+5.0j  3
2  1.0+5.0j  3
  • Looks like it's the `expand` option that's doing this. Without it, the result is one column with list elements. – hpaulj Mar 08 '21 at 12:09

1 Answer


While it's possible to specify mixed dtypes within a NumPy array, it seems the items have to be defined as a tuple:

import numpy as np

np.array((1+5j, 3), dtype='complex, int')
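
For instance (a sketch, not part of the question's code), a DataFrame built from such a structured array keeps one dtype per field, since each field becomes its own column:

import numpy as np
import pandas as pd

# One-record structured array: field f0 is complex, field f1 is int.
arr = np.array([(1+5j, 3)], dtype='complex, int')
print(pd.DataFrame(arr).dtypes)
# f0    complex128
# f1         int64  (integer width is platform-dependent)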

So potential solutions include:

  1. using .astype({1: 'int'}) after the expand to downcast column 1 back to integer (see the sketch after this list).

  2. splitting each value into its real and imaginary parts and then recombining them as needed:

# Expand each value into a (real, imag) pair, giving four numeric columns.
df = df.apply(lambda row: [i for x in [1+5j, 3] for i in [x.real, x.imag]], axis='columns', result_type='expand')
# Drop the all-zero columns (here, the imaginary part of the integer 3).
df = df[df.columns[df.sum(axis=0) != 0]]
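
A minimal sketch of option 1, starting from the example in the question (NumPy may emit a ComplexWarning when the cast discards the imaginary part):

import pandas as pd

df = pd.DataFrame([1, 2, 3])
out = df.apply(lambda row: [1+5j, 3], axis='columns', result_type='expand')
# Downcast column 1 back to int; its (all-zero) imaginary part is dropped.
out = out.astype({1: 'int'})
print(out.dtypes)
# 0    complex128
# 1         int64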