1

I have to import an .xls file which is saved as .*htm, .*html. I attached a link which provides a sample file of that format.

The Actual first row of the table starts from row number 5. But there are data above it.

The file looks as below,

Sample File

The sample file.

But please make sure to include some rows on top of it with some test values and make it look like the screenshot above.

if there are no rows above it, then the following M Code provided by Alexis Olson works

let
    Source = Folder.Files("C:\Users\aolson\Downloads\example-html.xls"),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xls")),
    #"C:\Users\aolson\Downloads\example-html xls\_example-html xls" = #"Filtered Rows"{[#"Folder Path"="C:\Users\aolson\Downloads\example-html.xls\",Name="example-html.xls"]}[Content],
    #"Imported Excel" = Web.Page(#"C:\Users\aolson\Downloads\example-html xls\_example-html xls"){0}[Data]
in
    #"Imported Excel"

When I add rows on top of the sample and click on save in excel - it gives me a warning whether, I want to continue with the same format then I click on "YES".

I tried to play with the children table on the Query Editor. But it is not taking me anywhere.

Source = Table cannot be found inside it at all.

James Z
  • 12,209
  • 10
  • 24
  • 44
  • When I added those extra rows, the `Excel.Workbook` (instead of `Web.Page`) worked for me until I tried adding text to cell B3 too, in which case it broke. – Alexis Olson Oct 05 '18 at 13:39
  • Oh God..Excel is so weird. Don't add any rows. don't add new data. just open the file and click on save and click on "yes" to continue with the same format. It fails with Web.Page function. @AlexisOlson – Student of the Digital World Oct 05 '18 at 13:44
  • Yeah. It really is weird. What I was saying though is if I add that extra bit neither option works and it's probably replicating the behavior of your actual non-sample files. – Alexis Olson Oct 05 '18 at 13:46
  • @AlexisOlson Yes I am thinking the same. That's why, those files didn't work for me when I tried to read it as HTML. So, fixing this should fix that. – Student of the Digital World Oct 05 '18 at 13:47
  • I've also noticed that if I open it up in Notepad, I can't find any of the cell data anywhere in the text. It's just a bunch of functions and variables. – Alexis Olson Oct 05 '18 at 13:47
  • @AlexisOlson I don't know but in my actual file I am seeing the cell data. Give me 5 minutes and let me post the column data that I opened as a text here. – Student of the Digital World Oct 05 '18 at 13:53
  • @AlexisOlson Can you please open the sample file right after the download in Power BI as a text file. I am able to open my actual file as a text. Able to see all the td's,tr's inside it with the row and column values. Is it possible to transform this using a function or something in Power Query ? if yes, can you please show me with the sample file then I will be able to revert it to my actual file. – Student of the Digital World Oct 05 '18 at 14:33
  • I can see the text from the starting file. When I added the extra rows and re-saved it moved the data to a folder with several components, so the xls file I was opening in text didn't have the data itself but was instead referencing the folder. – Alexis Olson Oct 05 '18 at 14:40
  • @AlexisOlson Yes, I think - if we can transform the texts from starting file, then it will fulfill my requirement of the actual file. Because, I see the texts in both the files. I was adding in rows to make sure that the files look the same. But it doesn't have to, if there is a way to transform that single column of text into an actual table. – Student of the Digital World Oct 05 '18 at 14:46
  • @AlexisOlson Sorry to bother you again. Please help me with this - https://stackoverflow.com/questions/52708114/power-query-data-transformation-from-a-single-column-to-a-whole-table – Student of the Digital World Oct 09 '18 at 12:18
  • @AlexisOlson Sorry but another problem in Power Query - Would you be able to share some knowledge on this - https://stackoverflow.com/questions/53780389/power-query-read-each-url-and-append-the-output-to-a-table – Student of the Digital World Dec 14 '18 at 13:43

1 Answers1

1

For whatever reason, the HTML in the sample file has unmatched tags that the XML parser doesn't like. You can get at the data though with some work if you load it as text and remove or fix any parts that the parser has trouble with.

Consider this M code:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\aolson\Downloads\example-html.xls\example-html.xls"))}),
    #"Kept Range of Rows" = Table.Range(Source,60,22),
    Column1 = Text.Combine(#"Kept Range of Rows"[Column1]),
    #"Parsed XML" = Xml.Tables(Column1),
    Table = #"Parsed XML"{0}[Table],
    #"Expanded td" = Table.ExpandTableColumn(Table, "td", {"i", "b", "span", "Element:Text"}, {"td.i", "td.b", "td.span", "td.Element:Text"}),
    #"Expanded td.span" = Table.ExpandTableColumn(#"Expanded td", "td.span", {"Element:Text", "Attribute:style"}, {"td.span.Element:Text", "td.span.Attribute:style"})
in
    #"Expanded td.span"

The steps here are roughly:

  1. Load the file as text
  2. Select just the <tbody> section.
  3. Concatenate those rows into a single text value.
  4. Parse that text as XML.
  5. Expand any tables that are found.

When I initially did this I noticed the <b> tag wasn't closed so I added a </b> in my source file.

Resulting Table

The results are a bit ugly, but I suspect if your actual data files don't include much formatting or inconsistent table structure, then you can get something along these lines working passably well, especially if you only have a single column to deal with.

Alexis Olson
  • 38,724
  • 7
  • 42
  • 64