I ran into a problem where, when I use pandas to read a MySQL table, some columns (see 'to_nlc') that used to be integers become floats (a `.0` is automatically appended). Can anyone explain why, or offer a guess? Thanks very much!
4 Answers
The problem is that your data contains `NaN` values, so `int` is automatically cast to `float`.
I think you can check the docs on NA type promotions:
When introducing NAs into an existing Series or DataFrame via reindex or some other means, boolean and integer types will be promoted to a different dtype in order to store the NAs. These are summarized by this table:
| Typeclass | Promotion dtype for storing NAs |
|-----------|---------------------------------|
| floating  | no change                       |
| object    | no change                       |
| integer   | cast to float64                 |
| boolean   | cast to object                  |
While this may seem like a heavy trade-off, in practice I have found very few cases where this is actually an issue. Some explanation of the motivation is in the next section.
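The promotion described in the table is easy to reproduce; a minimal sketch where reindexing introduces a missing value:

```python
import pandas as pd

s = pd.Series([1, 2, 3])        # starts as int64
print(s.dtype)                  # int64

# Reindexing adds a label with no value, so NaN appears
# and the integer dtype is promoted to float64.
s2 = s.reindex([0, 1, 2, 3])
print(s2.dtype)                 # float64
```

The same promotion happens when `read_sql_table` pulls in a NULL: pandas has to store it as `NaN`, and the whole column becomes float.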

- How can this be avoided? The integers are used as identifiers, so casting to float causes precision errors. – Hanan Shteingart Dec 21 '16 at 10:50
- @HananShteingart - is it possible to replace `NaN` with `0`? e.g. `df['colname'] = df['colname'].fillna(0).astype(int)` or `df = df.fillna(0).astype(int)`? – jezrael Dec 21 '16 at 11:20
- I already get it as a float from the SQL. What I did was cast the column to a string in the SQL query, e.g. `CAST(bigint_column AS VARCHAR)`, so I get it as a string in the dataframe. I didn't mind that it's no longer numeric, because the column represents an ID anyway. – Hanan Shteingart Dec 25 '16 at 14:55
- Is there an update for this answer? I've seen https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html – Martin Thoma Dec 09 '20 at 09:18
- @MartinThoma - Honestly, no idea; if using the nullable integer dtype raises an error, then no. – jezrael Dec 09 '20 at 09:21
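Following up on the nullable-integer link in the comments: pandas 0.24+ ships an extension dtype, `Int64` (capital "I"), that can hold missing values without promoting to float. A minimal sketch:

```python
import pandas as pd

# The nullable "Int64" dtype stores missing values as pd.NA,
# so the remaining values stay integers.
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)   # Int64
print(s)         # 1, 2, <NA>
```

An existing float column produced by `read_sql_table` can be converted the same way with `df['col'].astype('Int64')`, provided its non-missing values are whole numbers.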
As already said, the problem is that pandas' integer dtype cannot hold NULL/NA values.
You can replace `read_sql_table` with `read_sql` and convert NULL to some integer value (for example 0 or -1, something that has a NULL sense in your setting):
df = pandas.read_sql("SELECT col1, col2, IFNULL(col3, 0) FROM table", engine)
Here `col3` can be NULL in MySQL; `IFNULL` returns 0 if it is NULL, or the `col3` value otherwise.
Or the same thing with a little helper function:
import pandas

def read_sql_table_with_nullcast(table_name, engine, null_cast=None):
    """
    table_name - table name
    engine     - SQL engine
    null_cast  - dictionary of columns in which to replace NULL:
                 column name as key, value to replace with as value;
                 for example {'col3': 0} will set all NULLs in col3 to 0
    """
    null_cast = null_cast or {}
    cols = pandas.read_sql("SHOW COLUMNS FROM " + table_name, engine)
    cols_call = [c if c not in null_cast
                 else "IFNULL(%s,%d) AS %s" % (c, null_cast[c], c)
                 for c in cols['Field']]
    sel = ",".join(cols_call)
    return pandas.read_sql("SELECT " + sel + " FROM " + table_name, engine)

read_sql_table_with_nullcast("table", engine, {'col3': 0})

You can use the parameter `coerce_float=False`:
df = pd.read_sql(sql, con=conn, coerce_float=False)
coerce_float : bool, default True — Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point; useful for SQL result sets.
https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html
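A minimal runnable sketch, using an in-memory SQLite database as a stand-in for the MySQL connection in the question. Note that with `coerce_float=False` the dtype of a NULL-containing integer column can vary between pandas versions (object vs. float64), so only the values are checked here:

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# coerce_float=False asks pandas not to force non-numeric
# objects from the driver into floating point.
df = pd.read_sql("SELECT id FROM t", con=conn, coerce_float=False)
print(df["id"].dtype, df["id"].tolist())
```

This mainly helps with drivers that return `decimal.Decimal` objects; it is not guaranteed to prevent the NULL-driven int-to-float promotion on its own.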

Another possibility is to exclude `NULL` values in the `WHERE` clause of your SQL query, if you're not expecting them and they correspond to unusable rows.
This won't be suitable in all circumstances, but it is a clean and simple option when it does apply.
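A runnable sketch of this approach, again using in-memory SQLite as a stand-in for MySQL and borrowing the `to_nlc` column name from the question:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (to_nlc INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# Filtering NULLs out server-side means pandas never sees a
# missing value, so the column keeps its integer dtype.
df = pd.read_sql("SELECT to_nlc FROM t WHERE to_nlc IS NOT NULL", conn)
print(df["to_nlc"].dtype)    # int64
print(df["to_nlc"].tolist()) # [1, 3]
```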
