0

I'm have an Excel file where the first column is:

enter image description here

(where TRUE/FALSE are Excel's boolean type or whatever Excel calls them)

I'm reading this file with python 3.6, pandas 1.0.3 and xlrd 1.2.0:

dat = pandas.read_excel('simple.xlsx', engine= 'xlrd', header= None)

The TRUE/FALSEs seem to be converted in an inconsistent way:

print(list(dat.iloc[:,0]))

['col1', 1, 2, 3, 6, 1, False, 1]

TRUE has been converted to 1 and FALSE to False. This means I cannot tell whether 1 is actually the number 1 or True.

Is there a way to tell pandas.read_excel to consistently convert TRUE to True and FALSE to False?

(NB I'm using xlrd rather then openpyxl since xlrd is supported by more versions of pandas - I'm happy to switch to openpyxl if that helps here...)

dariober
  • 8,240
  • 3
  • 30
  • 47
  • Seems like you don't want `header=None` since `'col1'` is now considered a value. Either way you can add `dtype=object` as an argument to `read_excel` or if you have the headers better to do `dtype={'col1': object}` so it only casts that column to object. – ALollz Mar 22 '20 at 16:44
  • 1
    I actually just tried: `dtype={'col1': 'object'}, true_values=['TRUE'], false_values=['FALSE']`, doesn't seem to work, still casts `TRUE` to `1`, weird. – Erfan Mar 22 '20 at 16:46
  • Weird, maybe they are strings then? When I made a sample FALSE was converted to 0, unlike the behavior they are seeing, and then `dtype=object` worked as expected. – ALollz Mar 22 '20 at 16:47
  • yea, try it without the true or false values options. read it in with ALollz suggestion of just dtype and see if it works – sammywemmy Mar 22 '20 at 16:48
  • For the record, the example I posted is a dummy file with nothing else than the data in the screenshot. Created on Ubuntu 16 with OO Calc – dariober Mar 22 '20 at 16:49
  • Can you try with exactly the same excel file as OP? @ALollz – Erfan Mar 22 '20 at 16:50
  • Perhaps my excel is outdated, but when I fill a cell with TRUE, it is centered, not right aligned in the cell. – ALollz Mar 22 '20 at 16:51
  • Seemingly related: https://stackoverflow.com/questions/51661980/pandas-read-excel-sometimes-incorrectly-reads-boolean-values-as-1s-0s. – AMC Mar 22 '20 at 18:02

2 Answers2

0

Not really answering the question but that's how I resolved it. Basically, I needed an Excel to csv exporter and I hoped to get away with a few lines of pandas.

I ended up writing my own excelToCsv with Java and the POI library which seems much more sophisticated than xlrd, openpyxl etc...

dariober
  • 8,240
  • 3
  • 30
  • 47
0

Indeed, replacing TRUE into True in excel doesn't work, also problems with date formats (but that is easier to handle in pandas).

Save the file it as .csv, then replace the duty in a text-editor.

I just did the replacing in pandas with the apply method, but I think it would have been much easier to just do it in a text-editor.

ML Rozendale
  • 135
  • 1
  • 1
  • 7