5

I am trying to read an excel file using xlrd in python3, but I am not getting the results from reading the cells as I do see them in excel.

It seems the cells I am trying to access have some external source set, from which they are refreshed, but since that source is unavailable to me, excel displays some cached values. These values reside at some other place in the same excel file, so while in the sheet you see e.g. "2.65111" the actual contents of the cell shows up as "=BC12" in this edit bar in excel.

When I try to read this via xlrd like:

wb=xlrd.open_workbook("filename.xls",formatting_info=True)
wb.sheet_by_name("sheetname").cell(12,9)

The cell contents shows up as:

'text:'[XXXX]' (XF:95)'

(where XXXX is most likely the name of a column of this external .csv file)

I would either expect xlrd to return the "2.65111" I see in excel, or at least the "=BC12" as a reference to the other cell, but I can not use in any way the "[XXXXX]" thing.

Is there a way to get xlrd to show any of these two?

I am really confused as to where it has this "[XXXXX]" from, and why this is so different to what I see in excel.

PS: I can not just use the data from the BC12 cell, as this should be an automated processing of files, and only the "original" cells have a fixed position, and these others can be just at another random position the next day.

Jon Clements
  • 138,671
  • 33
  • 247
  • 280
PlasmaHH
  • 15,673
  • 5
  • 44
  • 57
  • 1
    You say "where XXXX is most likely the name of a column of this external .csv file)" ... WHAT external csv file? What leads you to draw such a conclusion? What is the formula in the BC12 cell? Can you make a copy of the xls file available for analysis? – John Machin Feb 27 '13 at 19:37
  • What happens when you do: `wb.sheet_by_name("sheetname").cell(12,9).value`? – eazar001 Feb 28 '13 at 06:47
  • @eazar001: It returns '[XXXX]' – PlasmaHH Feb 28 '13 at 08:58
  • @JohnMachin: The BC12 cell does not contain a formula but a string "2.6511". I am not familiar with what the english excel calls these things, but when you click on the "data" tab on the "connections" button, then it shows external files there. Also when you right click on the 12,9 cell, you can select something that translates to "properties of data area". The file can not be made publically available. – PlasmaHH Feb 28 '13 at 09:01
  • @PlasmaHH, What language are you using in your excel document? Japanese, Spanish, English....? – eazar001 Mar 01 '13 at 01:18
  • @PlasmaHH: (a) Please answer my first 2 questions (b) Please consider making one or more of the daily-issued files available to me (author of `xlrd`) for analysis, under an NDA if you wish. (c) Python 3 support by xlrd is so new the paint is dripping off it; have you used the released version or have you got the latest from github? Have you tried using Python 2.7 (if only for temporary diagnostic purposes)? (d) XLS file or XLSX or XLSM or ?? – John Machin Mar 01 '13 at 19:29
  • @JohnMachin: (a) 1st q: I can not tell anything more about the csv file, when I am at work again I could give you the name if it helps, but that file is not available to me. 2nd q: we seem to have a different understanding of "formula" here, can you please explain what you mean? As said, the BC12 cell contains some text. (b) I can not, as I receive them myself under a contract that disallows this. (c) I can try 2.7 on monday, but would need it in the final for 3. I am using the latest released version that I could find (0.9.0). (d) .xls, I would assume saved by excel 2007. – PlasmaHH Mar 02 '13 at 14:27
  • @PlasmaHH: My 2nd question was "What leads you to draw such a conclusion?". – John Machin Mar 02 '13 at 19:43
  • @JohnMachin: Ah, I misread this part. As I explained above, the excel file contains some external references to some .csv files, and the return of xlrd (the [XXXXXX]) is something I would expect to be a possible column name of the .csv files, given their names. – PlasmaHH Mar 04 '13 at 09:41
  • @JohnMachin: Just tried python 2.7 and git version of xlrd, all with the same result as before. – PlasmaHH Mar 04 '13 at 10:35
  • @PlasmaHH have you tried something like `wb.sheet_by_name("sheetname").cell(12,9).formula.text`? I am not a Py guy but an Excel guy. You SHOULD somehow be able to access at least the cell formula in order to see the `=BC12` which you said you may be able to use. – David Zemens Mar 06 '13 at 15:42
  • @DavidZemens: I was wondering about this too, but didn't get very far. Cells only have ctype,value and xf_index attributes. The later one somehow relates to some formatting object, but I don't think the formula is within there. I have tried looking around at various places but am not familiar enough with xlrd to find the formula, if its accessible. – PlasmaHH Mar 06 '13 at 15:56
  • @PlasmaHH this may be useful, not sure if you've seen it but there appear to be some options, in particular check the second answer posted at this thread: http://stackoverflow.com/questions/4690423/get-formula-from-excel-cell-with-python-xlrd – David Zemens Mar 06 '13 at 16:04
  • @DavidZemens: If you are referring to the win32com answer, I have to run this with python3 on linux... – PlasmaHH Mar 07 '13 at 09:15

3 Answers3

1

One solution might be to make a copy of your excel sheet that ONLY contains values, no formulas or external sources.

You can do that by highlighting and copying everything, then 'paste special' into a new sheet, and select 'values only' or something like that.

Hoff
  • 38,776
  • 17
  • 74
  • 99
  • Oh, I was assuming that it was clear that I am trying to write a script for automated processing of that excel file (which will be shipped every day with updated data to me). – PlasmaHH Feb 28 '13 at 22:25
0

"the 12,9 cell" is known to Excel as J13 (NOT J12). Are you sure that you are comparing like with like? Is it expected that J13 would contain a reference to BC12? Can you deduce anything by examining cells around J13 and BC12?

John Machin
  • 81,303
  • 11
  • 141
  • 189
  • I am not sure what brought J12 in here, the cell is the one I would like to access. Actually, there are many more cells, all behaving the same way (just with different possible .csv header names) – PlasmaHH Mar 04 '13 at 10:15
0

It may be that your only option is to preprocess the file with the actual Excel through Excel COM API. You can do something simple in .NET which will load the excel file, get the data and save it in a similary structured excel file but with no external references. Loading an excel file with Excel takes a long time (a few seconds) and will certainly fail if you overload it on a server, but if you have a few files a day, it will work. Of course it won't work if you are not running on Windows.

Alon Catz
  • 2,417
  • 1
  • 19
  • 23
  • Unfortunately (or not, depends) I need to run all this on linux – PlasmaHH Mar 06 '13 at 14:33
  • The best library I know to work with Excel (and Office in general) is Aspose.Cells. I only used its .NET version but there is also a Java version which will work on Linux. The have a trial, so you can test if does the trick. – Alon Catz Mar 07 '13 at 03:18