
I have a DataFrame with no NaN (or any other kind of missing value), and every value in one column is numeric. When I check the type of each element in that column I get float, but the dtype of the column as a whole is object.

I have looked at other similar problems but did not get any clear answer.

Steps to reproduce (looking at the first 10 rows):

A = df['value'].iloc[0:10]
for ii in A:
  print(type(ii))

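<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>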

However:

print(A.dtype)
object

Can someone explain why and when this happens?

user59419
  • Can you try `A = df['value'].iloc[0:10].astype(float)` ? – jezrael May 19 '23 at 05:58
  • How was this dataframe created? You've got python `float` objects and not the more efficient numpy floats. So how it was built or transformed is where you should look. You could fix it with `df['value'] = df['value'].astype(np.float64)`. – tdelaney May 19 '23 at 06:07

2 Answers


It's difficult to say why without details on how you created the DataFrame, but this situation is entirely possible. For example, the column can be created with an explicit object dtype:

import pandas as pd

df = pd.DataFrame({'value': [1.1,2.2,3.3]},
                  dtype=object)

df['value'].dtype
# dtype('O')

This might mean that you performed a non-vectorized operation, or that you sliced a DataFrame that initially contained other kinds of objects:

df = pd.DataFrame({'value': ['X',1.1,2.2,3.3]}).iloc[1:]

df['value'].dtype
# dtype('O')

You can easily convert the column with astype:

df['value'] = df['value'].astype(float)

Or with pandas.to_numeric, which is a bit more flexible:

df['value'] = pd.to_numeric(df['value'])

If the column contains objects that cannot be converted, you can avoid raising an error by coercing them to NaN:

df['value'] = pd.to_numeric(df['value'], errors='coerce')
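Before converting, it can help to check which Python types the column actually holds, since looking at only the first few rows may hide stray values further down. The following is a minimal sketch on made-up data (the DataFrame contents are an assumption for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical column: mostly floats, with two stray strings further down.
df = pd.DataFrame({'value': [1.1, 2.2, 3.3, 'text', 'sd', 5.7]})

# Map every element to the name of its Python type, then count each name.
type_counts = df['value'].map(lambda v: type(v).__name__).value_counts()

print(df['value'].dtype)  # object, because the column mixes floats and strings
print(type_counts)
```

If the counts show only float, the column is safe to convert with astype(float); any other type name points to the rows that need cleaning first.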
mozway

I found this comment:

There are some scenarios where every item in a series is a float but the dtype is object. For example, some error when reading from file that was coerced; or when you had mixed types (e.g. floats and strings) and substituted the strings with other floats at a later time; etc. Just use pd.to_numeric(df['score']) or .astype(float) directly.

df = pd.DataFrame({'value': [1,5.4,8,8.9]}).astype(object)

A = df['value'].iloc[0:10]

print (df['value'].dtype)
object

for ii in A:
  print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>

So you can convert the column to a numeric dtype with Series.astype or to_numeric:

df = pd.DataFrame({'value': [1,5.4,8,8.9]}).astype(object)

df['value'] = df['value'].astype(float)

# alternative
df['value'] = pd.to_numeric(df['value'])

A = df['value'].iloc[0:10]

print (df['value'].dtype)
float64

for ii in A:
  print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>

Another possible reason is a column that mixes numbers with strings, where the first 10 values happen to be numeric:

df = pd.DataFrame({'value': [1.,5.4,8.,8.9,4.,5.,9.,8.,3.,2.4, 'text','sd',5.7]})


A = df['value'].iloc[0:10]

print (df['value'].dtype)
object

for ii in A:
  print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>

You can find the non-numeric values:

print (df[pd.to_numeric(df['value'], errors='coerce').isna()])
   value
10  text
11    sd

... and then convert them to missing values:

df['value'] = pd.to_numeric(df['value'], errors='coerce')
print (df)
    value
0     1.0
1     5.4
2     8.0
3     8.9
4     4.0
5     5.0
6     9.0
7     8.0
8     3.0
9     2.4
10    NaN
11    NaN
12    5.7

A = df['value'].iloc[0:10]

print (df['value'].dtype)
float64

for ii in A:
  print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
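Note that on a mixed column like this one, astype(float) is expected to fail rather than coerce. A minimal sketch of the difference, using a shortened made-up column (the values and the astype_failed flag are assumptions for illustration):

```python
import pandas as pd

# Hypothetical mixed column: floats plus one string that cannot be parsed.
df = pd.DataFrame({'value': [1.0, 5.4, 'text', 5.7]})

# astype(float) raises on the first value it cannot convert.
try:
    df['value'].astype(float)
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print(f"astype(float) failed: {exc}")

# to_numeric with errors='coerce' replaces unparseable values with NaN instead.
cleaned = pd.to_numeric(df['value'], errors='coerce')
print(cleaned.dtype)
print(cleaned.isna().sum())  # number of values that were coerced to NaN
```

So astype(float) fits when the column is known to hold only numbers, while to_numeric with errors='coerce' is the more forgiving choice when stray values may be present.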
jezrael