I found this comment:
There are some scenarios where every item in a series is a float but the dtype is object. For example, an error while reading from a file that was coerced, or mixed types (e.g. floats and strings) where the strings were later replaced with floats. Just use pd.to_numeric(df['score']) or .astype(float) directly.
import pandas as pd

df = pd.DataFrame({'value': [1, 5.4, 8, 8.9]}).astype(object)
A = df['value'].iloc[0:10]
print(df['value'].dtype)
object
for ii in A:
    print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
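The second scenario from the comment (mixed types whose strings are later replaced with floats) can be reproduced with a minimal sketch. The sample values and the name `s` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: one string among floats forces the object dtype.
s = pd.Series([1.5, "n/a", 3.0])
print(s.dtype)

# Overwrite the string with a float. Every element is now a Python float...
s.iloc[1] = 2.0
print(all(isinstance(v, float) for v in s))

# ...but assigning into an object Series does not change its dtype.
print(s.dtype)
```

After the assignment the column still reports object, which matches the situation shown above.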
So you can convert the column to numeric with Series.astype or to_numeric:
df = pd.DataFrame({'value': [1,5.4,8,8.9]}).astype(object)
df['value'] = df['value'].astype(float)
# alternative
df['value'] = pd.to_numeric(df['value'])
A = df['value'].iloc[0:10]
print(df['value'].dtype)
float64
for ii in A:
    print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
Another possible reason is a column that mixes numbers with strings, where the first 10 values happen to be numeric:
df = pd.DataFrame({'value': [1., 5.4, 8., 8.9, 4., 5., 9., 8., 3., 2.4, 'text', 'sd', 5.7]})
A = df['value'].iloc[0:10]
print(df['value'].dtype)
object
for ii in A:
    print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
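Because checking only the first 10 values hides the strings further down, it can help to count the element types across the whole column. This is a small sketch, not part of the original answer; the name `type_counts` is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'value': [1., 5.4, 8., 8.9, 4., 5., 9., 8., 3., 2.4, 'text', 'sd', 5.7]})

# Map every element to the name of its Python type, then count each name.
# For this sample that gives 11 floats and 2 strs.
type_counts = df['value'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

Any type other than float in the result explains why the column is stored as object.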
You can find the non-numeric values:
print(df[pd.to_numeric(df['value'], errors='coerce').isna()])
   value
10  text
11    sd
... and then convert them to missing values:
df['value'] = pd.to_numeric(df['value'], errors='coerce')
print(df)
    value
0     1.0
1     5.4
2     8.0
3     8.9
4     4.0
5     5.0
6     9.0
7     8.0
8     3.0
9     2.4
10    NaN
11    NaN
12    5.7
A = df['value'].iloc[0:10]
print(df['value'].dtype)
float64
for ii in A:
    print(type(ii))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
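As a side note on the coercion step, errors='coerce' matters here because a plain astype(float) raises when the column still contains text. A minimal sketch with made-up sample values (the names `s`, `failed` and `converted` are illustrative):

```python
import pandas as pd

s = pd.Series([1.0, 'text', 5.7])

# astype(float) cannot parse 'text', so it raises ValueError.
failed = False
try:
    s.astype(float)
except ValueError as exc:
    failed = True
    print('astype(float) failed:', exc)

# to_numeric with errors='coerce' turns unparseable values into NaN instead.
converted = pd.to_numeric(s, errors='coerce')
print(converted.dtype)
print(converted.isna().sum())
```

Which of the two fits depends on whether silently turning bad values into NaN is acceptable for your data.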