I have the following HTML, taken from the page source of a web page:
<a target="_blank" rel="nofollow" href="http://www.facebook.com/014media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#facebook"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.linkedin.com/company/014-media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#linkedin"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.youtube.com/014media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#youtube"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.twitter.com/014media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#twitter"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.014media.com?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#website"></use></svg></div>
</a>
Using the XPath expression below, I am trying to extract the LinkedIn URL, but I have not been able to:
from lxml import html, etree
asd = """<a target="_blank" rel="nofollow" href="http://www.facebook.com/014media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#facebook"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.linkedin.com/company/014-media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#linkedin"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.youtube.com/014media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#youtube"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.twitter.com/014media?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#twitter"></use></svg></div>
</a><a target="_blank" rel="nofollow" href="http://www.014media.com?utm_source=Thalamus.co&utm_medium=AdVendorPage&utm_content=https://www.thalamus.co/buyers/014-media"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#website"></use></svg></div>
</a>"""
html.fromstring(asd.replace("xlink:href","xlinkhref")).xpath('(//a//div//svg//use[contains(@xlinkhref,"linkedin")])//@href')
The output is:
[]
Using "xlink:href" directly in the XPath raised lxml.etree.XPathEvalError: Undefined namespace prefix, so I replaced "xlink:href" with "xlinkhref" (dropping the ":") before parsing. The error is gone, but I still can't see what I am doing wrong. Any suggestions would be highly appreciated.
Using re I am able to extract what I need, but I still couldn't find a solution with lxml:
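import re
# Capture everything from each href value up to the closing </a>, keep the chunk whose sprite is "linkedin", then strip the query string
[each.split('"')[0] for each in re.findall('<a target="_blank" rel="nofollow" href="(.+?)</a>',asd,re.DOTALL) if '/sprite.svg#linkedin' in each][0].split('?')[0]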
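One possible reason the XPath returns an empty list: after the predicate selects the use element, the trailing //@href looks for an href attribute on use and its descendants, but the href sits on the enclosing a element, which is an ancestor. A minimal sketch of an alternative is below. It keeps the same "xlink:href" to "xlinkhref" replacement, selects the a element that contains the matching use, and reads the href from there. The variable names (sample, tree, hrefs, linkedin_url) and the trimmed two-link sample are illustrative assumptions, not the full markup from the page.

```python
from lxml import html

# Trimmed-down sample with the same nesting as the markup above
# (a > div > svg > use); the shortened hrefs are for illustration only.
sample = """
<a target="_blank" rel="nofollow" href="http://www.facebook.com/014media?utm_source=Thalamus.co"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#facebook"></use></svg></div></a>
<a target="_blank" rel="nofollow" href="http://www.linkedin.com/company/014-media?utm_source=Thalamus.co"><div class="icon--rounded icon"><svg xmlns="https://www.w3.org/2000/svg"><use xlink:href="/sprite.svg#linkedin"></use></svg></div></a>
"""

# Same workaround as in the question: drop the ":" so the attribute name
# is not read as a namespace prefix inside the XPath expression.
tree = html.fromstring(sample.replace("xlink:href", "xlinkhref"))

# Select the <a> that has a matching <use> somewhere below it,
# then take the href of that <a> (not of <use>).
hrefs = tree.xpath('//a[.//use[contains(@xlinkhref, "linkedin")]]/@href')

# Strip the tracking query string, as the regex version does.
linkedin_url = hrefs[0].split("?")[0] if hrefs else None
print(linkedin_url)
```

An equivalent form walks up from the use element instead: //use[contains(@xlinkhref, "linkedin")]/ancestor::a/@href. Either way, the point is that the attribute has to be read from the a element rather than searched for beneath use.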