0

While importing data from the web in Excel 2019 choosing Data>Get Data>From Other Sources>From Web, the last (trailing) zeros of numbers are being truncated resulting in the following 'Import' column:

EU
Import | Desired
968,8  |  968800
891,01 |  891010
413,47 |  413470
410,3  |  410300
43,25  |   43250
17,8   |   17800
15,05  |   15050
3,61   |    3610
6,05   |    6050
4,9    |    4900

US
Import | Desired
968.8  |  968800
891.01 |  891010
413.47 |  413470
410.3  |  410300
43.25  |   43250
17.8   |   17800
15.05  |   15050
3.61   |    3610
6.05   |    6050
4.9    |    4900

I would like to convert the data which is text (commas, periods are remaining thousands separators), to numbers like in the Desired column.

I've overdone the following working VBA function:

Option Explicit

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function UnTruncate(SourceVariant As Variant, _
  Optional TruncateString As String = "0", _
  Optional SplitSeparator As String = ",", _
  Optional NumberOfDigits As Long = 3) As Long

    Dim vnt As Variant        ' String Array (0-based, 1-dimensional)
    Dim strSource As String   ' Source String
    Dim strResult As String   ' Resulting String
    Dim strUB As String       ' Upper Bound String
    Dim i As Long             ' String Array Elements Counter

    ' Convert SourceVariant to a string (Source String (strSource)).
    strSource = CStr(SourceVariant)

    ' Check if Source String (strSource) is "" (UnTruncate = 0, by default).
    If strSource = "" Then Exit Function

    ' Split Source String (strSource) by SplitSeparator.
    vnt = Split(strSource, SplitSeparator)
    ' Assign the value of the last element in String Array (vnt)
    ' to Upper Bound String (strUB).
    strUB = vnt(UBound(vnt))

    ' Check if there is only one element in String Array (vnt). If so,
    ' write its value (strUB) to Resulting String (strResult) and go to
    ' ProcedureSuccess.
    If UBound(vnt) = 0 Then strResult = strUB: GoTo ProcedureSuccess

    ' Check if the length of Upper Bound String (strUB) is greater than
    ' NumberOfDigits. (UnTruncate = 0, by default)
    If Len(strUB) > NumberOfDigits Then Exit Function

    ' Add the needed number of TruncateStrings to Upper Bound String.
    strUB = strUB & String(NumberOfDigits - Len(strUB), TruncateString)

    ' Loop through the elements of String Array (vnt), from beginning
    ' to the element before the last, and concatenate them one after another
    ' to the Resulting String (strResult).
    For i = 0 To UBound(vnt) - 1: strResult = strResult & vnt(i): Next
    ' Add Upper Bound String (strUB) to the end of Resulting String (strResult).
    strResult = strResult & strUB

ProcedureSuccess:
    ' Convert Resulting String (strResult) to the resulting value of UnTruncate.
    UnTruncate = Val(strResult)

End Function
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

But I have a feeling I'm missing some important points.

I'm looking for other solutions: an improvement of my function, an Excel formula, a Power Query Solution, ... possibly when the data in Import column could be numbers or text.

VBasic2008
  • 44,888
  • 5
  • 17
  • 28
  • It may be enough to only change your decimal separator from "," to "."... – FaneDuru Mar 28 '20 at 14:45
  • The solution to the problem seems to depend on the type of data. Why don't we just remove the comma and period and multiply by 1000? – Dy.Lee Mar 28 '20 at 14:58
  • @Dy.Lee: My sample data is making it not obvious, sorry. There could be millions, billions then there would be more commas. But looks like a good solution for an Excel formula. Now just include a solution if this wasn't text but a number. I'm in Europe and I downloaded the data from a US website. So for me it could be a decimal number, but only below 1,000,000. – VBasic2008 Mar 28 '20 at 15:19
  • Perhaps this would be best handled at the import stage, rather than after. – Ron Rosenfeld Mar 28 '20 at 15:31
  • @RonRosenfeld: Can you suggest another way of importing where I could choose the formatting because Excel 'WebQuery' doesn't allow any changes in this manner? I've looked in Data Range Properties and Edit Query. – VBasic2008 Mar 28 '20 at 15:39
  • you might be able to scrape the page with either xhr or browser automation. Is there an url? – QHarr Mar 28 '20 at 15:45
  • 1
    Usually there will be a Transform option where you can specify a data type. – Ron Rosenfeld Mar 28 '20 at 16:34
  • @QHarr: For example https://kworb.net/youtube/, but I have no idea what both of these are, if you could elaborate. – VBasic2008 Mar 28 '20 at 17:20
  • ^ is there a number on that page that gets truncated for you? And if so, how does it currently appear on the page for you? For the description: see [this](https://en.wikipedia.org/wiki/Web_scraping) and [this](https://codingislove.com/http-requests-excel-vba/) – QHarr Mar 28 '20 at 17:22
  • Using your URL, if I use `Data -->Get & Transform --> From Web` and select `Table0`, it loads with the `Views` and `Likes` column properly downloaded as numbers. I did not even need to do any special formatting. What are you doing differently? – Ron Rosenfeld Mar 28 '20 at 17:51
  • @RonRosenfeld & QHarr: I was unintentionally using Data>Get Data>Legacy Wizards>From Web (Legacy) which is the very old version. After using the real (new) version, I only had to remove the commas (my thousands separator is ".") in the last 2 columns and change the type to Number. By the way, all this showed me that the conversion afterwards is not possible because how would one know if it was 2 or 2000, 20 or 20000 etc. previously. I initially thought the truncation was maximally 2 zeros. Quite a lesson for me and a waste of time for you. Sorry. Thanks for the help. – VBasic2008 Mar 28 '20 at 18:22
  • @RonRosenfeld & QHarr: If you would post a suitable answer, I would gladly accept it. If you think the question is rather useless now I can delete it. Please, advise. – VBasic2008 Mar 28 '20 at 18:26

2 Answers2

2

This is an example of issuing an xhr to the url you mentioned and using clipboard to copy table to sheet. Numbers appear as on page. You do need to have some familiarity with html or at least know how to right click inspect element (opens elements tab); right click copy selector in dev tools elements tab - you can then paste that selector into html.querySelector("selector goes here").outerHTML; assuming selecting a table.

Public Sub GetVideoInfo()
    Dim xhr As Object, clipboard As Object, html As MSHTML.HTMLDocument 'required VBE > Tools > References > Microsoft HTML Object Library

    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument

    With xhr
        .Open "GET", "https://kworb.net/youtube/", False
        .send
        html.body.innerHTML = .responseText
    End With
    clipboard.SetText html.querySelector("#youtuberealtime").outerHTML
    clipboard.PutInClipboard
    ActiveSheet.Cells(1, 1).PasteSpecial
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Using your solution, I'm again getting the undesired results of the 'old' Web Query (Import column) probably due to locale trouble (EU-US). I've enabled the HTML... in references, found the query selector on the web page. Where did this number in GetObject come from? Since this is probably the way to go, I would additionally ask you how to extract a 'single' element like 'pagetitle' or a single column from the table with values converted to text How is that even possible? I've seen truncated leading zeros, but 'trailing' ones!? Feel free to ignore all this since you've done a lot already. – VBasic2008 Mar 28 '20 at 21:45
  • The GetObject is using the GUID for the ms forms clipboard. It is late bound but the library for early bound can be added via project references (https://excel-macro.tutorialhorizon.com/vba-excel-reference-libraries-in-excel-workbook/) or by adding a form to your project. With respect to the output you are getting, can you share a link to an image of your output please? – QHarr Mar 28 '20 at 22:00
  • For page title html.querySelector("title").innerText should work. For a single column you can apply formatting to sheet after paste or use querySelectorAll to gather a nodeList of the elements within a specific column. Is a bit more fiddly due to limited syntax in vba html parser implementations but perfectly doable. Happy to provide examples upon request. – QHarr Mar 28 '20 at 22:04
  • [HTTP (yours)]:https://drive.google.com/open?id=15Iq_yHy9BWHLHYaYJUc1udoeCbKR81D3 [Old Web Query]:https://drive.google.com/open?id=1AoYK0ibj2uhtIFZSSz_S_u2kP7Wvv7hS [Powerquery] https://drive.google.com/open?id=12Sag9crDFhbBaVEsvYWrRSq3rvFbXiBK – VBasic2008 Mar 28 '20 at 22:32
1

It seems you were using the Legacy Wizard rather than Power Query.

If you use Power Query, after selecting the Table, select Transform.

Then, if the number column has been imported as text, and is showing the digits separator of the comma, don't remove the commas. Rather:

  • Right Click on the Column Header
  • From the Right-Click dropdown menu:
    • Select Change Type --> Using Locale
    • Data type: Whole Number

enter image description here

That should take care of things.

EDIT:

With regard to retaining hyperlinks from a web table using Power Query, it is not as straightforward as with the Legacy Wizard, but here is a method that seems to work with your source.

It requires three queries and a function. And you will need to edit the table after the download to format the numbers, and possible the hyperlinks.

  • Query "Table 0" Download the web table without links
  • Query "getLinks" Download the links associated with the Videos
  • Query "Merge1" Merge the two queries above
  • Query fx"ExcelTrim" Replicate Excel's trim so as to be able to match the video names in the first two queries, by eliminating excess spaces between words in the video title.

ExcelTrim

Enter the code below into the Advanced Editor of a Blank Query

let ExcelTrim = (TextToTrim) =>
    let
        ReplacedText = Text.Replace(TextToTrim, "  ", " "),
        Result = if not(Text.Contains(ReplacedText, "  "))
            then ReplacedText
                else @ExcelTrim(ReplacedText)
    in
        Text.Trim(Result)
in
    ExcelTrim

Table 0

Note I used the Changed Type with Locale feature which should eliminate your dropped zero's problem.

let
    Source = Web.Page(Web.Contents("https://kworb.net/youtube/")),
    Data = Source{0}[Data],
    #"Changed Type with Locale" = Table.TransformColumnTypes(Data, {{"Views", Int64.Type}, {"Likes", Int64.Type}}, "en-US"),
    #"Added Custom" = Table.AddColumn(#"Changed Type with Locale", "trimmedVideo", each ExcelTrim([Video]))
in
    #"Added Custom"

getLinks

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://kworb.net/youtube/"))}),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "href")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "<div><a href=")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Link", each Text.BetweenDelimiters([Column1],"<a href=""","</a>")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Added Custom", "Link", Splitter.SplitTextByEachDelimiter({""">"}, QuoteStyle.None, false), {"Link.1", "Link.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Link.1", type text}, {"Link.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Column1"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns", "trimmedVideo", each ExcelTrim([Link.2])),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "normLinks", each if not Text.StartsWith([Link.1],"http") then 
    "https://kworb.net/youtube/" & [Link.1] else 
    [Link.1])
in
    #"Added Custom2"

Merge1

Returns the links in a separate column from the Videos

let
    Source = Table.NestedJoin(#"Table 0", {"trimmedVideo"}, getLinks, {"trimmedVideo"}, "getLinks", JoinKind.LeftOuter),
    #"Added Custom" = Table.AddColumn(Source, "Links", each Table.Column([getLinks],"normLinks")),
    #"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Links", each Text.Combine(List.Transform(_, Text.From)), type text}),
    #"Removed Columns" = Table.RemoveColumns(#"Extracted Values",{"trimmedVideo", "getLinks"})
in
    #"Removed Columns"

Alternatively you can use:

Merge1 (2)

Returns a HYPERLINK formula to the table which provides a clickable link with a friendly name.

let
    Source = Table.NestedJoin(#"Table 0", {"trimmedVideo"}, getLinks, {"trimmedVideo"}, "getLinks", JoinKind.LeftOuter),
    #"Added Custom" = Table.AddColumn(Source, "Links", each Table.Column([getLinks],"normLinks")),
    #"Replaced Value" = Table.ReplaceValue(#"Added Custom","""","""""",Replacer.ReplaceText,{"Video"}),
    #"Extracted Values" = Table.TransformColumns(#"Replaced Value", {"Links", each Text.Combine(List.Transform(_, Text.From)), type text}),
    #"Removed Columns" = Table.RemoveColumns(#"Extracted Values",{"trimmedVideo", "getLinks"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns", "Linked Videos", each "=HYPERLINK(""" & [Links] & """," & """" &[Video] & """)"),
    #"Changed Type" = Table.TransformColumnTypes(#"Added Custom1",{{"Linked Videos", type text}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Changed Type",{"Video", "Links"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"", "2", "Linked Videos", "Views", "Likes"})
in
    #"Reordered Columns"

If you use Merge1 (2) to get the hyperlinks, after saving, you will need to select the Linked Video column, and do a Find/Replace or = with = in order to turn the formula from a text string into a formula. If you refresh the query, you will need to repeat this process.

You may also want to format the Views and Likes columns to show your thousands separators.

Here is an example using `Merge1 (2) with the hyperlinks and my thousands separators.

enter image description here

Community
  • 1
  • 1
Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • I did this with Locale but I didn't notice the Data Type in this window and used the context menu to pick Whole Number and it didn't work. Now, when you look at the result, it's nice but pretty useless without the hyperlinks. With the hyperlinks I will create 2 columns with hyperlinks, one for kworb (stats) and one for youtube (the video). Is there a way to get the hyperlinks? The old Web Query did get them. I mean, it's ridiculous to use the old Web Query which is slow when retrieving the full html and then the new one just for the 2 columns containing the numbers. Isn't it? – VBasic2008 Mar 28 '20 at 21:21
  • @VBasic2008 Did you go back and try using the Locale window to set both the data type and country? And did it work? I did not realize you also wanted the hyperlink. It is doable but complicated in Power Query. Another option, when using the Legacy Wizard, would be to change your Windows Regional Settings to the `English - US` before doing the import, and then change them back afterwards. This requires windows API calls (or registry changes) and there are examples of how to do that on this forum. – Ron Rosenfeld Mar 29 '20 at 01:48
  • @VBasic2008 BTW, what are your native thousands and decimal separator (what country)? If I change things, I just get a text string, with no dropped digits, on import. – Ron Rosenfeld Mar 29 '20 at 01:48
  • My thousands separator is a period ("."). My conclusion so far is, that it is Excel related, since when I copy one of those critical numbers and paste them in Excel it automatically recognizes them as a decimal number and truncates the 'trailing' zeros. On the other hand if I paste them into Notepad or Word, the output is as is. So there might be a setting in Excel (maybe Error Checking) that could be the reason behind all of this. This case is the same as if you would paste some data with a period as the thousands separator into your Excel and it would recognize it as a decimal number. – VBasic2008 Mar 29 '20 at 09:46
  • @VBasic2008 It's not an Excel setting. It's your Windows Regional Setting that determines how the values from the legacy Web import wizard are interpreted. And there's no native method of which I am aware, in Power Query, to retain the hyperlinks during import. AFAIK you have to retrieve the hyperlinks separately, and then merge them with the table. Folks have written a variety of methods to do that, but I'm not advanced enough in PQ to be able to apply them to your specific problem. – Ron Rosenfeld Mar 29 '20 at 11:16
  • @VBasic2008 I have added code which enables the hyperlinks, either in a separate column, or as a single column with a "friendly" name. Maybe someone has a simpler method, but this was the best I could do. – Ron Rosenfeld Mar 29 '20 at 13:28
  • Impressive, but rather complicated. I went through the whole thing and it works, but it's slow like LegacyQuery, which I got to work. I changed the 'separator' settings with VBA before the query and reverted them after the query. The same way I got QHarr's version to work.Here's a link to my [Workbook]https://drive.google.com/open?id=1_PjEQ_tGB0ESq1yKkAYAluwgmCUcqUdq so far. I'm definitely interested in Power Query now, is that some kind of SQL and can you provide a link where I could get familiar with it. Thanks very much. Next I'll have to improve QHarr's version with another question. – VBasic2008 Mar 29 '20 at 20:57
  • @VBasic2008 You could certainly merge the three queries into one, but it would still be complicated. Maybe someday they'll add the functionality to native MCode. So far as learning, I suggest you read the introduction at MSDN. Then look for tutorials on Lists, Records and Tables. And then for a tutorial on the `each` keyword. The MS documentation is not great, so a lot will be from practice, asking questions, and reading what others have blogged about. – Ron Rosenfeld Mar 30 '20 at 00:31