I'd like to check whether every value has the same type as the corresponding value in the first row. Somehow df.applymap and Series.apply don't behave the way I assumed.

The dataset is from the IMDb sentiment analysis on Kaggle.

print(df.head())

         id  sentiment                                             review
0  "5814_8"          1  "With all this stuff going down at the moment ...
1  "2381_9"          1  "\"The Classic War of the Worlds\" by Timothy ...
2  "7759_3"          0  "The film starts with a manager (Nicholas Bell...
3  "3630_4"          0  "It must be assumed that those who praised thi...
4  "9495_8"          1  "Superbly trashy and wondrously unpretentious ...

Each row seems to be str,int,str. So everything seems to be fine.

print(df.applymap(type))

              id      sentiment         review
0  <class 'str'>  <class 'int'>  <class 'str'>
1  <class 'str'>  <class 'int'>  <class 'str'>
2  <class 'str'>  <class 'int'>  <class 'str'>
3  <class 'str'>  <class 'int'>  <class 'str'>
4  <class 'str'>  <class 'int'>  <class 'str'>

Calling apply on the series looks a little bit different. The sentiment is int64 instead of int.

print(df.iloc[0].apply(type))

id                   <class 'str'>
sentiment    <class 'numpy.int64'>
review               <class 'str'>
Name: 0, dtype: object

Maybe it's the same anyway, so I compared the types.

print(df.applymap(type) == df.iloc[0].apply(type))

    id  sentiment   review
0   True    False   True
1   True    False   True
2   True    False   True
3   True    False   True
4   True    False   True

The result is unexpected. At least the first row should be True, True, True. I use applymap on a DataFrame, which should be element-wise. The second apply is on a Series, which should also be element-wise. So why are the results not equal?

Yelve Yakut
    Related: [What is the difference between native int type and the numpy.int types?](https://stackoverflow.com/questions/38155039/what-is-the-difference-between-native-int-type-and-the-numpy-int-types) – jpp Oct 03 '18 at 11:11

1 Answer

It took me a while to understand jpp's comment. But I think I'm able to answer my own question now.

df.iloc[0] returns a pandas Series. Because the row spans columns of different dtypes, that Series has dtype object, and the number from the sentiment column sits in it as a NumPy scalar, numpy.int64, rather than as a Python int.

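With applymap, on the other hand, the values seem to reach the function as native Python types, so type reports int for the sentiment column. The class int is not the same as numpy.int64, so comparing the two gives False.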
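To make the distinction concrete, here is a minimal sketch using NumPy alone (Python 3 assumed). The variable names are only for illustration; the point is that a numpy.int64 and a built-in int can hold the same number while being different classes.

```python
import numpy as np

numpy_value = np.int64(1)          # the kind of scalar the row Series holds
python_value = numpy_value.item()  # .item() converts it to a built-in int

# Comparing the numbers and comparing the classes are two different things.
values_equal = numpy_value == python_value
types_equal = type(numpy_value) is type(python_value)

print(type(numpy_value), type(python_value))
print(values_equal, types_equal)
```

So a check based on type(...) == type(...) fails even though the underlying numbers are equal.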

The comparison I originally attempted should look like this instead:

df.applymap(type) == df.head(1).applymap(type).iloc[0]
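
The same idea as a self-contained sketch, in case it helps. The helper name same_types_as_first_row and the sample data are made up for illustration. It maps each column with Series.map instead of applymap, since applymap is deprecated in recent pandas versions in favor of DataFrame.map, and the per-column form should work in both old and new versions.

```python
import pandas as pd


def same_types_as_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame: True where a cell's type matches row 0's type."""
    # Map every column element-wise so each cell goes through the same
    # conversion path before type() is called on it.
    types = df.apply(lambda column: column.map(type))
    # Compare against the first row of the *same* type frame, rather than
    # against types taken from the row Series df.iloc[0].
    return types == types.iloc[0]


# Made-up sample data in the shape of the question's frame.
sample = pd.DataFrame(
    {
        "id": ["5814_8", "2381_9"],
        "sentiment": [1, 0],
        "review": ["first review", "second review"],
    }
)
matches = same_types_as_first_row(sample)
print(matches)
print(matches.all().all())  # a single bool summarising the whole frame
```

Taking both sides of the comparison from the same element-wise mapping is the design choice that matters here: it keeps the NumPy scalar types of the row Series out of the picture entirely.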