results_table is a pd.DataFrame

When I run

print(type(results_table.loc[0,'Mean recall score']))

it returns

<class 'numpy.float64'>

Every item is a float.

But when I run

print(results_table['Mean recall score'].dtype)

it returns

object

Why is there such behavior?

jpp
SIRIUS
    There are some scenarios where every item in a series is a float but the `dtype` is `object`. For example, some error when reading from file that was coerced; or when you had mixed types (e.g. floats and strings) and substituted the strings with other floats at a later time; etc. Just use `pd.to_numeric(df['score'])` or `.astype(float)` directly – rafaelc Nov 16 '18 at 00:41

2 Answers

First note df.loc[0, x] only considers the value in row label 0 and column label x, not your entire dataframe. Now let's consider an example:

import pandas as pd

df = pd.DataFrame({'A': [1.5, 'hello', 'test', 2]}, dtype=object)

print(type(df.loc[0, 'A']))  # type of single element in series

# <class 'float'>

print(df['A'].dtype)         # type of series

# object

As you can see, an object dtype series can hold arbitrary Python objects. You can even, if you wish, extract the type of each element of your series:

print(df['A'].map(type))

# 0    <class 'float'>
# 1      <class 'str'>
# 2      <class 'str'>
# 3      <class 'int'>
# Name: A, dtype: object

An object dtype series is simply a collection of pointers to Python objects that live at arbitrary locations in memory, whereas a numeric series stores its values directly in one contiguous memory block. An object series is therefore comparable to a Python list, which explains why performance is poor when you work with object instead of numeric series.
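
To make the storage difference concrete, here is a minimal sketch that builds the same three values as an object series and as a float64 series and compares them. The names `obj_series` and `num_series` and the sample values are made up for illustration; the exact byte counts depend on your platform and pandas version, so only the comparison is shown.

```python
import pandas as pd

values = [0.5, 0.7, 0.9]  # illustrative values, not from the question

obj_series = pd.Series(values, dtype=object)  # array of pointers to Python floats
num_series = pd.Series(values, dtype=float)   # values stored directly as float64

print(obj_series.to_numpy().dtype)  # object
print(num_series.to_numpy().dtype)  # float64

# deep=True also counts the Python objects that the pointers refer to,
# so the object series reports more memory than the numeric one.
obj_bytes = obj_series.memory_usage(deep=True)
num_bytes = num_series.memory_usage(deep=True)
print(obj_bytes > num_bytes)  # True
```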

See also this answer for a visual representation of the above.

jpp

In the first print statement you are slicing out a single element from your dataframe. That single item is a float.

In the second print statement you are pulling out a pandas series (i.e. the whole column) and printing its dtype.

The series has the object dtype, even though each entry in it is a float. That is why you get the results you did.
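
One way a column can end up like this is sketched below, following the scenario described in the comment on the question. The series `scores` and its values are hypothetical: a string placeholder forces the object dtype, and replacing it with a float later does not change the dtype back. `pd.to_numeric` (or `.astype(float)`) then gives a proper float64 column.

```python
import pandas as pd

# Hypothetical column: one string placeholder makes the whole series object dtype
scores = pd.Series([0.5, "n/a", 0.9], dtype=object)
scores.iloc[1] = 0.7  # replace the placeholder with a float afterwards

print(scores.dtype)                               # object
print(all(isinstance(v, float) for v in scores))  # True

# Convert explicitly once every element really is numeric
numeric_scores = pd.to_numeric(scores)
print(numeric_scores.dtype)                       # float64
```

If the column still contains values that cannot be parsed, `pd.to_numeric` raises by default, which is a useful check that every item really is numeric.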

James Fulton