Extracting author from the article

Question

Just as the title says: I've been working on crawling the article, and all that's left is the author.

Below is my code, which uses pyquery to collect the paragraphs and the author. Only the author comes back blank.

Target site: http://business.transworld.net/153984/news/surfrider-foundation-names-chad-nelsen-new-ceo/

from pyquery import PyQuery as pq

def extract_text_pyquery(html):
    p = pq(html)
    article_whole = p.find(".entry")
    p_tag = article_whole('p')
    print len(p_tag)
    print p_tag
    # Print the text of each paragraph in the article body
    for i in range(len(p_tag)):
        text = p_tag.eq(i).text()
        print text
    entire = p.find("#main")
    # This is the part that comes back empty
    author = entire.find('a').filter('.author')
    print 'By:', author

score 0 · Answer 1 · answered Oct 01 '14 at 00:17


The class isn't author; the rel attribute is. A period selects a class. You should instead filter for '[rel="author"]': square brackets let you filter based on attributes, including non-standard ones.


ragingSloth


Thank you! Almost had it. I guess I should've been more specific: I want to obtain the name without the tags/functions attached. Currently, it shows the line copied from the page source, then the name alone. I've entered it as you suggested, then added the "for i in range" loop, and that was the result. – fsbinesh Oct 01 '14 at 06:01
That's going to be specific to pyquery, but there should be a way to access an individual tag's value. – ragingSloth Oct 01 '14 at 15:10
