Python df.to_excel() storing numbers as text in excel. How to store as Value?

Question

I am scraping table data from google finance through pd.read_html and then saving that data to excel through df.to_excel() as seen below:

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter(output.xlsx, engine='xlsxwriter')

    for i, df in enumerate(dfs):
        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.save()

However, the numbers that are saved to excel are stored as text with the little green triangle in the corner of the cell. When moving over this data to excel, how do I store them as actual values and not text?

How are those columns coming in with pandas: `print(df.dtypes)`? Might need to run conversions. — Parfait, Dec 11 '16 at 00:13
It looks like they are all listed as 'objects'. What do I need to do to run conversions to 'float' or 'int'? I can't find anything about converting to a value before hitting excel. — gluc7, Dec 11 '16 at 01:04
Thx for the upvote @gluc7. Re-reading through your question, I think it would be a good idea to edit its title (if possible, not sure, I'm still newish to SO) because the problem wasn't really with the df.to_excel method... And the solutions don't deal with anything specific to exporting to Excel. — bluu, Aug 10 '17 at 01:39

jmcnamara · Answer 1 · 2021-09-17T12:06:09.603

33

In addition to the other solutions where the string data is converted to numbers when creating or using the dataframe it is also possible to do it using options to the xlsxwriter engine:

# Versions of Pandas >= 1.3.0:
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_numbers': True}})

# Versions of Pandas < 1.3.0:
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        options={'strings_to_numbers': True})

From the docs:

strings_to_numbers: Enable the worksheet.write() method to convert strings to numbers, where possible, using float() in order to avoid an Excel warning about "Numbers Stored as Text".

edited Sep 17 '21 at 12:06

answered Aug 09 '17 at 08:55

jmcnamara

38,196
6
90
108

3

This is the actual Solution. Thanks a lot. My dataframe indeed has mixed String and float, so the column was obviously Object. But I still wanted Excel to display the number as numbers – Laurent T May 14 '20 at 04:21
1

Answer from the creator of the package himself – Akhil Mathew Jun 30 '22 at 04:52

Parfait · Accepted Answer · 2016-12-11T04:42:27.437

Consider converting numeric columns to floats since the pd.read_html reads web data as string types (i.e., objects). But before converting to floats, you need to replace hyphens to NaNs:

import pandas as pd
import numpy as np

dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
                   '&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
workbook = xlWriter.book

for i, df in enumerate(dfs):
    for col in df.columns[1:]:                  # UPDATE ONLY NUMERIC COLS 
        df.loc[df[col] == '-', col] = np.nan    # REPLACE HYPHEN WITH NaNs
        df[col] = df[col].astype(float)         # CONVERT TO FLOAT   

    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))

xlWriter.save()

Gonçalo Peres · Answer 3 · 2020-08-10T15:38:46.987

That is probably because the Data Types of those columns where the warning is showing are objects and not Numeric Types, such as int or float.

In order to check the Data Types of each column of the DataFrame, use dtypes, such as

print(df.dtypes)

In my case, the column that was stored as object instead of a numeric value, was PRECO_ES

As, in my particular case, the decimal numbers are relevant, I have converted it, using astype, to float, as following

df['PRECO_ES'] = df['PRECO_ES'].astype(float)

If we check again the Data Types, we get the following

Then, all you have to do is export the DataFrame to Excel

#Export the DataFRame (df) to XLS
xlsFile = "Preco20102019.xls"
df.to_excel(xlsFile)

#Export the DataFRame (df) to CSV
csvFile = "Preco20102019.csv"
df.to_csv(csvFile)

If I then open the Excel file, I can see that the warning is not showing anymore, as the values are stored as numeric and not as text

For those using strings call: df_employees[col].astype(pd.StringDtype()) — GilbertS, Oct 21 '20 at 09:47

score 1 · Answer 4 · answered Dec 10 '16 at 22:43

1

Did you verify that the columns that you're exporting are actually numbers in python (int or float)?

Alternatively, you can convert the text fields into numbers in excel using the =VALUE() function.

answered Dec 10 '16 at 22:43

Felix

61
6

Sorry, I'm still new to python. How do you check if they are actually numbers? I tried the 'type' function but I don't believe it works on variables. I would also like to try to do most of the work within in python so that the excel workbook is already converted when opened. – gluc7 Dec 10 '16 at 23:33
1

Try `df.dtypes`. If you need to convert, use `pd.to_numeric()`. – Felix Dec 18 '16 at 05:36

score 1 · Answer 5 · answered Aug 09 '17 at 05:40

Since pandas 0.19, you can supply the argument na_values to pd.read_html which will allow pandas to correctly automatically infer the float type to your price columns...

Here's how that would look like:

dfs = pd.read_html(
    'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
    flavor='html5lib',
    index_col='\nIn Millions of USD (except for per share items)\n',
    na_values='-'
)

xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
for i, df in enumerate(dfs):
    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.save()

Alternatively (if you don't have pandas 0.19 yet), I'd use a simpler version of @Parfait's solution:

dfs = pd.read_html(
    'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
    flavor='html5lib',
    index_col='\nIn Millions of USD (except for per share items)\n'
)

xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
for i, df in enumerate(dfs):
    df.mask(df == '-').astype(float).to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.save()

This second solution only works if you correctly define your index column (in the .read_html), it will fail miserably with a ValueError if one of the (data) columns contains anything that is not convertible to a float...

score 0 · Answer 6 · answered Oct 21 '20 at 09:48

0

If you want your excel sheet to have string data type do like so:

for col in original_columns:
    df_employees[col] = df_employees[col].astype(pd.StringDtype())

answered Oct 21 '20 at 09:48

GilbertS

591
8
12

AttributeError: module 'pandas' has no attribute 'StringDtype' for Version 0.23.4 – NikoTumi Jan 26 '21 at 15:25
@NikoTumi Hey, were you able to solve your problem. I didnt get time to respond in time but I can help if your code still has an issue – GilbertS Feb 01 '21 at 18:17

Python df.to_excel() storing numbers as text in excel. How to store as Value?

6 Answers6

Linked