14

When I use the Excel "Document Inspector" on a particular .xls file to check for "hidden properties or personal information" it says:

The following document information was found:
* Absolute path to the workbook

How can I obtain the absolute path of the workbook from the file? If it needs to be done programmatically, I could use Java (e.g. Apache POI) or VBA.

I know where the file is currently saved, but what I want to extract is the absolute path to the workbook which is saved in the file I have. This is so I can know where it was saved by the author.

Here's what has happened to the file:

  1. Someone authored it, saving it at some absolute filepath unknown to me
  2. They uploaded it to a website
  3. I downloaded it from the website

Excel indicates that the document contains the absolute path from step 1. I'm after this path, not the place I saved it at step 3 since I know that.

ashleedawg
  • 20,365
  • 9
  • 72
  • 105
WW.
  • 23,793
  • 13
  • 94
  • 121
  • 1
    perhaps `=CELL("filename")` works for you? – CallumDA Dec 14 '16 at 00:06
  • @CallumDA This returns the current spreadsheet location and workbook name. – WW. Dec 14 '16 at 00:13
  • What's the difference then? I'm not sure what you're looking for. Doesn't `=Cell("filename")` return something like `C:\Users\Myself\blah\[VariousData.xlsx]SummarySheet`? What's the "abosulte path" you're looking for? – BruceWayne Dec 14 '16 at 00:24
  • @BruceWayne I've some steps to the question in an attempt to clarify. I want to know the absolute path to the document which is saved in the spreadsheet itself, rather than where it is now. – WW. Dec 14 '16 at 00:35

2 Answers2

11

I can reproduced that warning message by simply creating an empty Excel file, added a formula, saved it as BIFF8 (.xls). The Document Inspector will then warn about the absolute path. ... but in my case, there was no filename inside the file.

A simple way to verify this, is to open the file in a hex-editor and search for a well-known save location (i.e. the location where a dummy/test file was stored) - this is either stored as ASCII or as 16-bit string, i.e. every odd byte is a character.

If you want to use the POI developer tools, you can use the following:

To list all Excel records:
java -cp poi-3.16-beta1.jar org.apache.poi.hssf.dev.BiffViewer file.xls

To list the document and summary properties:
java -cp poi-3.16-beta1.jar org.apache.poi.hpsf.extractor.HPSFPropertiesExtractor file.xls

To list any embedded objects beside the usual suspects SummaryInformation, DocumentSummaryInformation and Workbook:
java -cp poi-3.16-beta1.jar org.apache.poi.poifs.dev.POIFSLister file.xls

So after running the tools and recording the output, you can then remove the properties via the Excel Document Inspector and execute the tools again. The output can be diffed and you might find the culprit.

kiwiwings
  • 3,386
  • 1
  • 21
  • 57
  • I tried a hex editor on the file I have and found nothing before posting this question. I thought perhaps Excel used some esoteric encoding. I've ran the commands you listed, which all worked but none produced the full path. So potentially it's being reported incorrectly by Excel in the first place. – WW. Dec 20 '16 at 02:44
9

Assuming it is an .xlsx file rather than an older-style .xls file, you can

  1. Rename the workbook as a .zip file
  2. Look at the xl\workbook.xml "file" within the .zip file

and you will find the absolute path when last saved from Excel.

This is why it is not a good idea to share work-related spreadsheets with other people, unless you first clear out this sort of information.


I'm not sure how to find it in the binary format files.

YowE3K
  • 23,852
  • 7
  • 26
  • 40
  • "This is why it is not a good idea to share work-related spreadsheets with other people, unless you first clear out this sort of information" - how would one make sure this info. isn't there, when sharing worksheets? – BruceWayne Dec 14 '16 at 06:54
  • 1
    @BruceWayne - that would be the same way that the OP did - by using the Document Inspector. In Excel 2010 that can be found from File / Info / Check for Issues. Other versions will have slightly different ways to get to it. – YowE3K Dec 14 '16 at 07:33
  • Ah! Sorry - didn't know that was what that does...I thought it just *checked* for issues, not also helps you remove such info. Thanks! – BruceWayne Dec 14 '16 at 07:44
  • 1
    @BruceWayne - to be honest, I've never actually used it myself, and I only found out it even existed a couple of months ago - it's not exactly the sort of thing one expects to find, or goes looking for unless someone mentions it exists. – YowE3K Dec 14 '16 at 07:46
  • Thanks for this answer, but unfortunately I have an .xls file. – WW. Dec 15 '16 at 01:58
  • @WW. - is it 'real' xls file or an xlsx where someone then did a 'save as' as xls ? – Robin Mackenzie Dec 21 '16 at 10:14
  • I think it's a real xls. How would I tell? – WW. Dec 21 '16 at 10:46
  • @WW the [WorkbookFactory](http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/WorkbookFactory.html) parses the file magic and not the extension ... if you receive a HSSFWorkbook its in BIFF8 format otherwise its in OOXML format – kiwiwings Dec 22 '16 at 06:33