Get content of a header tag searching by header tag name

Question

I'm scraping a page and I have to get the number of employees from this format:

<h5>Number of Employees</h5>
<p>
            20
</p>

I need to get the number "20". The problem is that this number isn't always under the same header: sometimes it is in an "h4", and there are other "h5" headers on the page. So I need to find the header whose text is "Number of Employees" and then extract the number from the paragraph that follows it.

This is the link to the page:

http://www.bbb.org/chicago/business-reviews/paving-contractors/lester-s-material-service-inc-in-grayslake-il-72000434/

score 1 · Accepted Answer · answered Nov 29 '15 at 23:37

Well, the easiest way is to find the element that contains the text "Number of Employees" and then take the paragraph right after it, assuming the paragraph always follows the header directly.

Here's a quick and dirty piece of code that does this, and prints the number out:

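# Assumes `soup` is a BeautifulSoup object already built from the page HTML.
parent = soup.find("div", id="business-additional-info-text")
for child in parent.children:
    # Match the child (h4, h5, ...) whose text is "Number of Employees"
    if "Number of Employees" in child:
        # The number is the text of the first <p> that follows the header
        print(child.find_next("p").contents[0].strip())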
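
For reference, here is a self-contained sketch of the same idea that does not depend on the header level. The sample HTML and the function name find_value_after_header are made up for illustration, and it assumes the paragraph is a sibling of the header, as in the snippet from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the markup in the question.
SAMPLE_HTML = """
<div id="business-additional-info-text">
  <h5>Some Other Field</h5>
  <p>placeholder</p>
  <h4>Number of Employees</h4>
  <p>
            20
  </p>
</div>
"""


def find_value_after_header(html, label, header_tags=("h4", "h5")):
    """Return the stripped text of the first <p> sibling after the header containing `label`."""
    soup = BeautifulSoup(html, "html.parser")
    # Look for any of the allowed header tags whose text contains the label.
    header = soup.find(
        lambda tag: tag.name in header_tags and label in tag.get_text()
    )
    if header is None:
        return None
    # Skip whitespace nodes and take the next <p> at the same level.
    paragraph = header.find_next_sibling("p")
    return paragraph.get_text(strip=True) if paragraph else None


print(find_value_after_header(SAMPLE_HTML, "Number of Employees"))
```

Returning None when the header or paragraph is missing lets the caller decide what to do with pages that don't list the field.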

eLRuLL · Answer 2 · 2015-11-30T03:38:36.450

0

'normalize-space(//*[self::h4 or self::h5][contains(., "Number of Employees")]/following-sibling::p[1]/text())'
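
As a rough sketch of how this expression could be evaluated, here it is run with lxml. The answer does not say which library to use, so lxml, the sample HTML, and the function name employee_count are assumptions for illustration only:

```python
from lxml import html

# The XPath from above, split across lines for readability.
XPATH = (
    'normalize-space(//*[self::h4 or self::h5]'
    '[contains(., "Number of Employees")]'
    '/following-sibling::p[1]/text())'
)

# Hypothetical sample that mirrors the markup in the question.
SAMPLE_HTML = """
<div>
  <h5>Some Other Field</h5>
  <p>placeholder</p>
  <h4>Number of Employees</h4>
  <p>
            20
  </p>
</div>
"""


def employee_count(page_source):
    """Evaluate the XPath against the page; returns '' when nothing matches."""
    tree = html.fromstring(page_source)
    # normalize-space() makes xpath() return a single string, not a node list.
    return tree.xpath(XPATH)


print(employee_count(SAMPLE_HTML))
```

Because of normalize-space(), the result is already stripped of the surrounding whitespace, so no extra .strip() call is needed.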

edited Nov 30 '15 at 03:38

answered Nov 29 '15 at 23:40


1

Is that the XPath? I tried it and got quite a long response, not just the number. – Luis Ramon Ramirez Rodriguez Nov 29 '15 at 23:59
Sorry, I was just giving you an idea; please check the corrected XPath. – eLRuLL Nov 30 '15 at 03:38
