Why do I get different results for pandas Series.apply and DataFrame.applymap?

Question

I'd like to check if all values have the same types as in the first row. Somehow df.applymap and series.apply don't behave like I would have assumed.

The dataset is from the IMDb sentiment analysis competition on Kaggle.

print(df.head())

         id  sentiment                                             review
0  "5814_8"          1  "With all this stuff going down at the moment ...
1  "2381_9"          1  "\"The Classic War of the Worlds\" by Timothy ...
2  "7759_3"          0  "The film starts with a manager (Nicholas Bell...
3  "3630_4"          0  "It must be assumed that those who praised thi...
4  "9495_8"          1  "Superbly trashy and wondrously unpretentious ...

Each row seems to be str,int,str. So everything seems to be fine.

print(df.applymap(type))

              id      sentiment         review
0  <class 'str'>  <class 'int'>  <class 'str'>
1  <class 'str'>  <class 'int'>  <class 'str'>
2  <class 'str'>  <class 'int'>  <class 'str'>
3  <class 'str'>  <class 'int'>  <class 'str'>
4  <class 'str'>  <class 'int'>  <class 'str'>

Calling apply on a single row (a Series) looks a little different. The sentiment is numpy.int64 instead of int.

print(df.iloc[0].apply(type))

id                   <class 'str'>
sentiment    <class 'numpy.int64'>
review               <class 'str'>
Name: 0, dtype: object

Maybe it's the same anyway, so I compared the types.

print(df.applymap(type) == df.iloc[0].apply(type))

     id  sentiment  review
0  True      False    True
1  True      False    True
2  True      False    True
3  True      False    True
4  True      False    True

The result is unexpected: at least the first row should be True, True, True. applymap on a DataFrame should work element-wise, and apply on a Series should also work element-wise. So why are the results not equal?

Related: [What is the difference between native int type and the numpy.int types?](https://stackoverflow.com/questions/38155039/what-is-the-difference-between-native-int-type-and-the-numpy-int-types) — jpp, Oct 03 '18 at 11:11

score 0 · Accepted Answer · answered Oct 03 '18 at 18:14

It took me a while to understand jpp's comment. But I think I'm able to answer my own question now.

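df.iloc[0] returns a pandas Series. Because the columns have different dtypes, that row Series has dtype object, and the sentiment value in it is a NumPy scalar, numpy.int64. Series.apply hands each element to the function unchanged, so type reports numpy.int64.

The sentiment column of the DataFrame is stored as int64 as well, but as far as I can tell applymap works column by column and converts the values to native Python objects before calling the function, so type reports int there. The class int is not equal to the class numpy.int64, which is why the comparison gives False.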
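To make the two access paths visible, here is a minimal sketch. The two-row frame is a hypothetical stand-in for the Kaggle data, and the explicit int64 dtype is an assumption to keep the result the same across platforms. Mapping over the column is used in place of applymap on the assumption that applymap maps each column in the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Kaggle data; only the column layout matters.
df = pd.DataFrame(
    {
        "id": ["5814_8", "2381_9"],
        "sentiment": np.array([1, 1], dtype="int64"),
        "review": ["first review", "second review"],
    }
)

# The column is stored as a NumPy integer array, not as Python ints.
print(df["sentiment"].dtype)

# Column path: the mapped function receives native Python ints.
column_types = df["sentiment"].map(type)
print(column_types.tolist())

# Row path: mixed columns give an object Series that holds NumPy scalars.
row = df.iloc[0]
row_types = row.apply(type)
print(row.dtype, row_types["sentiment"])

# Both values are integers; only their concrete classes differ.
print(isinstance(row["sentiment"], np.integer))
```

So the data is the same in both cases, and only the class reported by type differs depending on whether the value was reached through the column or through the row.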

The comparison I originally attempted should look like this instead, so that both sides go through the same element-wise path:

df.applymap(type) == df.head(1).applymap(type).iloc[0]
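As a hedged sketch of the full check (do all rows have the same types as the first row?), the snippet below builds the type frame once and compares every row against its first row. The frame and the names types, matches and bad_rows are made up for illustration, and the last sentiment value is deliberately a string so that one row fails. Mapping each column with Series.map stands in for applymap here, which newer pandas versions deprecate.

```python
import pandas as pd

# Hypothetical frame; the last "sentiment" value is deliberately a string.
df = pd.DataFrame(
    {
        "id": ["5814_8", "2381_9", "7759_3"],
        "sentiment": [1, 1, "0"],
        "review": ["a", "b", "c"],
    }
)

# Map every column element-wise to the Python type of each value.
types = df.apply(lambda col: col.map(type))

# Compare each row with the first row of the same mapped frame,
# aligning the first row's labels with the column names.
matches = types.eq(types.iloc[0], axis="columns")

# Index labels of rows whose types deviate from the first row.
bad_rows = matches.index[~matches.all(axis="columns")].tolist()
print(bad_rows)
```

Because both sides of the comparison come from the same mapped frame, a mismatch between int and numpy.int64 cannot occur; only genuine type differences between rows show up.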
