White space and selectors

Question

I tried to use a selector in the Scrapy shell to extract information from a web page, but it didn't work properly. I believe this happened because there is white space in the class name. Any idea what's going wrong?

I tried different syntaxes like:

response.xpath('//p[@class="text-nnowrap hidden-xs"]').getall()

response.xpath('//p[@class="text-nnowrap hidden-xs"]/text()').get()

# what I type into my scrapy shell
response.css('div.offer-item-details').xpath('//p[@class="text-nowrap hidden-xs"]/text()').get()

# html code that I need to extract:
<p class="text-nowrap hidden-xs">Apartamento para arrendar: Olivais, Lisboa</p>

expected result: Apartamento para arrendar: Olivais, Lisboa

actual result: []

There isn't really whitespace in the class name. In HTML you can give an element multiple classes by separating them with whitespace in the class attribute. This means the <p> element has two classes: text-nowrap and hidden-xs. That might help you debug the problem further. A quick search led me to the following solution, which I haven't tested myself: https://stackoverflow.com/a/3881148/6511985 — Stephan Schrijver, May 16 '19 at 17:20
First, check whether the page uses JavaScript to add elements to the HTML. Scrapy can't run JavaScript, so you may be getting different HTML than you expect. — furas, May 16 '19 at 17:20
Thanks @StephanSchrijver for your help. That's the point: the class name doesn't have white space. Now I need to know how to use the 'response.css()' selector to select an element whose class attribute contains whitespace. I'll do my research on it. Thanks! — Elsior Moreira Alves Junior, May 18 '19 at 07:55

score 2 · Answer 1 · answered May 16 '19 at 17:21

The whitespace in the class attribute means that there are multiple classes: the "text-nowrap" class and the "hidden-xs" class. To select by XPath on multiple classes, you can use the following format:

"//element[contains(@class, 'class1') and contains(@class, 'class2')]"

(grabbed this from How to get html elements with multiple css classes)

So in your example, I believe this would work:

response.xpath("//p[contains(@class, 'text-nowrap') and contains(@class, 'hidden-xs')]").getall()
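One caveat with contains(@class, ...) is that it matches substrings, so 'text-nowrap' would also match a class such as 'text-nowrap-lg'. A common XPath 1.0 idiom pads the normalized class attribute with spaces and searches for the padded name. The helper below is only a sketch of that idiom; xpath_with_classes is a hypothetical name, not part of Scrapy or parsel, and it just builds the expression string.

```python
def xpath_with_classes(tag, *class_names):
    """Build an XPath 1.0 expression for <tag> elements carrying every given class.

    Hypothetical helper for illustration; it only builds a string.
    """
    if not class_names:
        return f"//{tag}"
    # Pad the normalized class attribute with spaces so that each class name
    # is matched as a whole token rather than as a substring.
    padded = "concat(' ', normalize-space(@class), ' ')"
    conditions = " and ".join(
        f"contains({padded}, ' {name} ')" for name in class_names
    )
    return f"//{tag}[{conditions}]"


expression = xpath_with_classes("p", "text-nowrap", "hidden-xs")
print(expression)
```

In the Scrapy shell the resulting string could then be passed along as response.xpath(expression).getall().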

score 1 · Accepted Answer · answered May 16 '19 at 21:12

For these cases I prefer CSS selectors because of their minimal syntax:
response.css("p.text-nowrap.hidden-xs::text").get()

Also, Google Chrome's developer tools display CSS selectors while you inspect the HTML, which makes scraper development much easier.
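The selector p.text-nowrap.hidden-xs matches any <p> whose class list includes both names, in any order and alongside any other classes. The snippet below is a rough standard-library sketch of that matching rule, written only to illustrate it; ClassMatcher is a hypothetical class and this is not how Scrapy evaluates selectors.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Collect the text of <tag> elements whose class list includes all required classes."""

    def __init__(self, tag, required_classes):
        super().__init__()
        self.tag = tag
        self.required = set(required_classes)
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute is a whitespace-separated list of class names.
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self._capturing = True
            self.matches.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a matching element.
        if self._capturing:
            self.matches[-1] += data

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False


html = '<p class="text-nowrap hidden-xs">Apartamento para arrendar: Olivais, Lisboa</p>'
matcher = ClassMatcher("p", ["text-nowrap", "hidden-xs"])
matcher.feed(html)
print(matcher.matches)
```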

Georgiy

Perfect @Georgiy. That's the answer I was looking for. Thanks! – Elsior Moreira Alves Junior May 18 '19 at 07:57
