How to extract href from an element using lxml CSSSelector?

Question

import lxml.html
from lxml.cssselect import CSSSelector

def extract_page_data(html):
    tree = lxml.html.fromstring(html)
    item_sel = CSSSelector('.my-item')
    text_sel = CSSSelector('.my-text-content')
    time_sel = CSSSelector('.time')
    author_sel = CSSSelector('.author-text')
    a_tag = CSSSelector('.a')

    for item in item_sel(tree):
        yield {'href': a_tag(item)[0].text_content(),
               'my pagetext': text_sel(item)[0].text_content(),
               'time': time_sel(item)[0].text_content().strip(),
               'author': author_sel(item)[0].text_content()}

I want to extract the href, but this code does not return it.

Along with the solution that sir Andersson has already provided, you need to modify your selector call like `.cssselect()` not `.CSSSelector()`. — SIM, Feb 27 '18 at 18:52
Sorry, I misunderstood you. It seems you did things differently. — SIM, Feb 27 '18 at 19:27

Andersson · Accepted Answer · 2018-02-27T18:22:05.983

6

Try extracting the href attribute with

'href': a_tag(item)[0].attrib['href']

or

'href': a_tag(item)[0].get('href')

Alternatively, you can use XPath

tree.xpath(".//a/@href")
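
Called on `tree`, that expression returns every href in the document. Inside the loop from the question it would probably be called on `item` instead, so that each result belongs to its own item. A rough sketch of the difference, using made-up sample HTML (the markup and variable names are assumptions for illustration):

```python
import lxml.html

# Hypothetical sample markup, not taken from the question.
SAMPLE_HTML = """
<div>
  <a href="/outside">Outside any item</a>
  <div class="my-item"><a href="/first">First</a></div>
  <div class="my-item"><a href="/second">Second</a></div>
</div>
"""

tree = lxml.html.fromstring(SAMPLE_HTML)

# On the whole tree: every href in the document, in document order.
all_hrefs = tree.xpath(".//a/@href")

# Per item: the leading "." keeps the search relative to that item.
per_item = [
    item.xpath(".//a/@href")[0].strip()
    for item in tree.xpath('.//div[@class="my-item"]')
]

print(all_hrefs)
print(per_item)
```

Note that `[0]` raises `IndexError` if an item contains no link, so a length check may be needed on real pages.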

edited Feb 27 '18 at 18:22

answered Feb 27 '18 at 18:16

`item.xpath(".//a/@href")[0].strip()` worked. Thank you, sir :) – elrich bachman Feb 27 '18 at 21:46
