XPath text() returns an extra blank value

Question

I'm scraping this site, specifically the content of the tables inside the div tags whose class contains 'ranking-data'. For the first td, the XPath is: //div[contains(@class, 'ranking-data')]//tr[th//text()[contains(., 'TIN')]]/td[1]/text()

This works fine for all columns in all tables (with the needed modifications), except for a cell in column 2 that contains an i tag: in Google Sheets it adds an extra blank cell below the cell with the text itself. I first tried to scrape it with: //div[contains(@class, 'ranking-data')]//tr[th//text()[contains(., 'TIN')]]/td[2]/text()

Then I tried something like *[not(i[contains(@class,'info-circle')])]/text() after the td[2], and some other variants, but none of them worked.

How can I avoid this i tag?

Yeah, that `td` works; but not the next one: `//div[contains(@class, 'ranking-data')]//tr[th//text()[contains(., 'TIN')]]/td[2]/text()`. Sorry, I should have explained myself better and/or posted this one. Let me edit it. — curropar, Nov 16 '22 at 18:12
The cell selected is the problem: https://i.imgur.com/iYQGyfv.png — curropar, Nov 16 '22 at 18:18

score 1 · Accepted Answer · answered Nov 16 '22 at 18:18

try:

=QUERY(IMPORTXML(A1, "//div[contains(@class, 'ranking-data')]//tr[th//text()[contains(., 'TIN')]]/td[2]/text()"), "where Col1 <>' '", )
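
A rough sketch of what the QUERY wrapper does, written in Python for illustration. It assumes IMPORTXML returned two rows for the cell, the text itself and a single space; the sample values are made up and not taken from the site.

```python
# Hypothetical result of IMPORTXML for the problem cell:
# one row per text node, the second being just a space.
rows = ["12.5", " "]

# Rough equivalent of QUERY(..., "where Col1 <> ' '"):
# keep every row whose value is not a single space.
kept = [value for value in rows if value != " "]

print(kept)
```

If the blank value were something other than exactly one space, this comparison would not catch it; testing `value.strip()` instead would drop any whitespace-only row.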

player0


That was it. I was hitting my head against a wall with the XPath, and I forgot to try things outside the `IMPORTXML`. Well, thanks!! – curropar Nov 16 '22 at 18:26

score 1 · Answer 2 · answered Nov 17 '22 at 08:49

The answer given by @player0 works for my case, and since it was the first answer I won't remove the "accepted" mark from it; but I'm stubborn and I've found an alternative with just XPath (which may be useful for other cases). It was as simple as adding a [1] at the end of my first query:

//div[contains(@class, 'ranking-data')]//tr[th//text()[contains(., 'TIN')]]/td[2]/text()[1]
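
A likely explanation of why the [1] helps: the td has two text nodes, one before the i tag and a whitespace one after it, so text() selects both and text()[1] keeps only the first. Here is a minimal Python sketch using the standard library. The markup is hypothetical (I have not copied the real HTML), and ElementTree does not support text() in its XPath subset, so the text nodes are collected by hand through `.text` and `.tail`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real cell is assumed to look roughly like this.
cell = ET.fromstring('<td>12.5 <i class="info-circle"></i> </td>')

# ElementTree keeps the text before the first child in .text and the text
# following each child in that child's .tail; together these correspond to
# the text nodes that td/text() would select.
text_nodes = [cell.text] + [child.tail for child in cell]
text_nodes = [t for t in text_nodes if t is not None]

print(text_nodes)      # every text node, like td[2]/text()
print(text_nodes[:1])  # only the first one, like td[2]/text()[1]
```

Note that [1] only helps when the wanted text comes before the i tag; if it came after, a different position would be needed.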
