2

I am trying to create an excel file using python from a list of dictionaries. Initially I was getting an error of improper encoding. So I decoded my data to 'utf-8' format. Now after the creation of excel, when I checked the values in each field, their format has been changed to text only. Below are the stpes I used while performing this activity with a snippet of code.

1.>I got error of improper encoding while creating excel file as my data had some 'ascii' values in it. Error snippet:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

2.>To remove the error of improper encoding, I inserted a decode() function while reading my input csv file. Snippet of code while decoding to 'utf-8':

data = []
with open(datafile, "r") as f:
    header = f.readline().split(',')
    counter = 0
    for line in f:
        line = line.decode('utf-8')

        fields = line.split(',')
        entry = {}
        for i,value in enumerate(fields):
            entry[header[i].strip()] = value.strip()
        data.append(entry)
        counter += 1

return data

3.>After inserting decode() funtion, I created my excel file using below code:

ordered_list= dicts[0].keys() 

wb=Workbook("New File.xlsx")
ws=wb.add_worksheet("Unique")

first_row=0
for header in ordered_list:
   col=ordered_list.index(header)
   ws.write(first_row,col,header) 

row=1
for trans in dicts:
   for _key,_value in trans.items():
       col=ordered_list.index(_key)
       ws.write(row,col,_value)
   row+=1 #enter the next row
wb.close()

But after creation of excel, all the values in each field of excel is coming with text format and not their original format (some datetime values, decimal values etc.). How do I make sure to get that the data format does not change from the input data format I read using input csv file?

Mohit Sharma
  • 179
  • 1
  • 2
  • 11
  • Please reduce your failing program to a short, complete program that demonstrates the error. Please post that short, complete program in its entirety, along with a description of the result you observe and the result you expect. See [ask] and [mcve] for more information. – Robᵩ Jan 18 '16 at 16:09
  • @Robᵩ I have edited and made my question in a structured manner. If you think its still not clear, do let me know about the issue. I'll try to provide as much detail as possible. – Mohit Sharma Jan 18 '16 at 16:22
  • Python 2.x or Python 3.x? Which Excel library are you using? Openpyxl? – Alastair McCormack Jan 19 '16 at 08:54
  • @AlastairMcCormack I am using Python 2.x. Initially I was using Xlsxwriter for writing into excel workbook. Then I used Xlrd library for both reading and writing the excel after converting my input csv file to xlsx sheet. Thanks for your help! – Mohit Sharma Jan 19 '16 at 11:15
  • xlrd is for old style .xls and your code shows that you're reading from a CSV. So which one are you using now? – Alastair McCormack Jan 19 '16 at 11:22
  • @AlastairMcCormack I was using csv file initially, but now I changed the format from csv to xlsx and using the xlsx version now. Xlrd library is working with xlsx files as well. Only issue left is with datetime strings on which I am working now with the suggestion you had provided earlier. – Mohit Sharma Jan 19 '16 at 11:26

1 Answers1

1

When reading text files you should pass the encoding to open() so that it's automatically decoded for you.

Python 2.x:

with io.open(datafile, "r", encoding="utf-8") as f:

Python 3.x:

with open(datafile, "r", encoding="utf-8") as f:

Now each line read will be a Unicode string.

As you're reading a CSV file, you may want to consider the CSV module, which understands CSV dialects. It will automatically return dictionaries per row, keyed by the header. In Python 3, it's just the csv module. In Python, the CSV module is broken with non-ASCII. Use https://pypi.python.org/pypi/unicodecsv/0.13.0

Once you have clean Unicode strings, you can proceed to store the data.

The Excel format requires that you tell it what kind of data you're storing. If you put a timestamp string into a cell, it will think it's just a string. The same applies if you insert a string of an integer.

Therefore, you need to convert the value type in Python before adding to the workbook.

Convert to decimal string to float:

 my_decimal = float(row["height"])
 ws.write(row,col,my_decimal)

Create datetime field from string. Assuming string is "Jun 1 2005 1:33PM":

date_object = datetime.strptime(my_date, '%b %d %Y %I:%M%p')
ws.write(row,col,date_object)
Alastair McCormack
  • 26,573
  • 8
  • 77
  • 100
  • Do I need to use encoding='utf-8' or decode('utf-8') while reading input file? I already tried using [encoding='utf-8'] and it was throwing syntax error to me. – Mohit Sharma Jan 19 '16 at 11:17
  • If you open the file with `encoding="utf-8"`, you no longer need to call `decode()` on each line. – Alastair McCormack Jan 19 '16 at 11:18