2

How can I get to the "phone" and "fax" values using Python in combination with a CSS selector? I managed to select the "name", but I got stuck on "phone" and "fax". Any help will be highly appreciated. Thanks in advance.

I tried with:

name = div.contact-details p     #It works
phone = div.contact-details p    #Can't think beyond
fax = div.contact-details p      #Can't think beyond

The elements containing the items are:

<div class="contact-details block dark">
<h3>Contact Details</h3><p>Company Name: PIMS Group Pty Ltd<br>Phone: +61 7 
4969 3900<br>Fax: +61 7 4969 3999<br>Email: <a 
href="mailto:admin@pims.net.au">admin@pims.net.au</a><br>Web: <a 
target="_blank" href="http://www.pims.net.au">http://www.pims.net.au</a></p>
<h4>Address</h4><p>43 Evans Avenue<br>North Mackay<br>QLD<br>4740</p>
<h4>Contact</h4><p></p>
</div>
SIM
  • You have your whole data inside the `p` tag separated by `<br>`. You could get the content of the `p` tag and parse it with regexes to get specific pieces of information
    – Andrew Che Jul 29 '17 at 19:52

2 Answers

1

You can try the XPath expressions below to get the required data:

# For Fax
substring-after(//div[@class="contact-details block dark"]/p/text()[starts-with(., "Fax:")], "Fax: ")
# For Phone
substring-after(//div[@class="contact-details block dark"]/p/text()[starts-with(., "Phone:")], "Phone: ")
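
For context, here is a minimal sketch of how these expressions might be run from Python with `lxml`. The sample markup is copied from the question; the helper name `contact_field` and the `normalize-space()` wrapper (added to collapse the line break inside the phone number) are assumptions for illustration, not part of the expressions above.

```python
from lxml import html

# Sample markup from the question (trimmed to the relevant parts).
SAMPLE = """
<div class="contact-details block dark">
<h3>Contact Details</h3><p>Company Name: PIMS Group Pty Ltd<br>Phone: +61 7 
4969 3900<br>Fax: +61 7 4969 3999<br>Email: <a 
href="mailto:admin@pims.net.au">admin@pims.net.au</a></p>
<h4>Address</h4><p>43 Evans Avenue<br>North Mackay<br>QLD<br>4740</p>
</div>
"""

tree = html.fromstring(SAMPLE)


def contact_field(tree, label):
    """Hypothetical helper: return the text that follows '<label>:' in the first <p>."""
    expr = (
        'normalize-space(substring-after('
        '//div[@class="contact-details block dark"]/p/text()'
        '[starts-with(., $prefix)], $prefix))'
    )
    # lxml passes keyword arguments to the expression as XPath variables.
    return tree.xpath(expr, prefix=label + ":")


phone = contact_field(tree, "Phone")
fax = contact_field(tree, "Fax")
print(phone, fax)
```

Because `substring-after()` is a string function, `xpath()` returns a plain string here rather than a list, so no `[0]` indexing is needed.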
Andersson
  • Thanks, sir Andersson, for your answer. Your solution never fails. However, I can't get it to work. If I understood correctly, you meant as a whole: xpath('//div[@class="contact-details block dark"]//p/text()[starts-with(., "Fax:")], "Phone: ")[0]. There is something wrong with the closing bracket in the expression. I hope you will take a look. Thanks. – SIM Jul 30 '17 at 09:04
  • No, I mean something like `html.xpath('substring-after(//div/p/text()[starts-with(., "Fax:")], "Fax: ")')` – Andersson Jul 30 '17 at 09:07
  • No way. The best solution I've ever come across. You did the splitting with this expression as well. Now I get just the number. One last request, sir: what would this expression look like if I want to get to the "Address", given that there is no label there? Thanks in advance. – SIM Jul 30 '17 at 09:33
  • I think something like this should work `//h4[.="Address"]/following-sibling::p[1]/text()` – Andersson Jul 30 '17 at 09:35
  • Before this moment, i thought that i knew a little about xpath but you cleared my confusion. I'm damn sure that I know nothing about it. thanks again sir. – SIM Jul 30 '17 at 09:39
  • If you have time, please take a look at this sir, "https://stackoverflow.com/questions/45398658/trouble-clicking-on-the-button-for-the-next-page?noredirect=1#comment77757942_45398658" – SIM Jul 30 '17 at 10:58
0

See: Get the inner HTML of a element in lxml

Since the key/value pairs are unstructured, this will not be reliable, but it might be possible to do this:

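def iter_pairs(inner_html):
    # Split the <p> contents on <br> and yield (key, value) tuples.
    for x in inner_html.split('<br>'):
        if ':' in x:
            # Split on the first colon only, so URLs in the value stay intact.
            key, value = x.split(':', 1)
            yield key.strip(), value.strip()
        else:
            yield 'unknown', x.strip()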

or something similar, but then you'll have to add some logic to order the key values. I'm not sure regexes are appropriate: the logic will be brittle since there is no guarantee on the structure of the data, but some hacks might work here.
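
As a rough illustration of the regex route mentioned above, the following sketch pulls a labelled value out of the inner HTML of the `p` tag. The function name `find_field` and the pattern are assumptions made for this example, and it is just as brittle as noted: it only handles values that are plain text up to the next tag.

```python
import re

# Inner HTML of the first <p>, as it might be obtained from the page.
INNER_HTML = (
    'Company Name: PIMS Group Pty Ltd<br>Phone: +61 7 4969 3900<br>'
    'Fax: +61 7 4969 3999<br>Email: '
    '<a href="mailto:admin@pims.net.au">admin@pims.net.au</a>'
)


def find_field(inner_html, label):
    """Hypothetical helper: return the plain text after '<label>:' or None."""
    # Capture everything after the label up to the next tag.
    pattern = re.escape(label) + r":\s*([^<]+)"
    match = re.search(pattern, inner_html)
    return match.group(1).strip() if match else None


print(find_field(INNER_HTML, "Phone"))
print(find_field(INNER_HTML, "Fax"))
```

Values wrapped in their own tags, such as the email link, would need separate handling.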

To give it a bit more structure you might be able to use xpath selection like:

//div.contact-details/descendant-or-self::h4[text()='Address']//p
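
Note that `div.contact-details` is CSS syntax rather than XPath, and the address `p` is a sibling of the `h4`, not a descendant, so the expression above will not run as written. A sketch of a working variant with `lxml`, using the `following-sibling` expression suggested in the comments on the other answer (the sample markup is taken from the question; the variable names are illustrative):

```python
from lxml import html

# Address part of the markup from the question.
SAMPLE = """
<div class="contact-details block dark">
<h3>Contact Details</h3><p>Company Name: PIMS Group Pty Ltd</p>
<h4>Address</h4><p>43 Evans Avenue<br>North Mackay<br>QLD<br>4740</p>
<h4>Contact</h4><p></p>
</div>
"""

tree = html.fromstring(SAMPLE)

# Take the first <p> that follows the "Address" heading and read its text nodes.
address_lines = tree.xpath('//h4[.="Address"]/following-sibling::p[1]/text()')

# Join the <br>-separated pieces into one string.
address = ", ".join(line.strip() for line in address_lines)
print(address)
```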
jmunsch
  • Thanks, jmunsch, for your answer. I wasn't looking for alternatives to the selector; the value itself is not important to me. I wanted to know how to get to the results using a selector. I've already located the phone number, fax, etc. using XPath. For phone this expression suffices: ("//div[contains(@class,'contact-details')]//p/text()")[1]. One more thing: your XPath returns an error. – SIM Jul 29 '17 at 20:35
  • @SMth80 no problem. I didn't test it. It was more to give you the idea of how to get it to work. – jmunsch Jul 30 '17 at 06:30