As the title says, I've been crawling this article and have everything working except the author.
Below is my code. It uses pyquery to extract the paragraphs and the author; the paragraphs come through fine, but the author comes back blank.
Target page: http://business.transworld.net/153984/news/surfrider-foundation-names-chad-nelsen-new-ceo/
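from __future__ import print_function  # keeps print() working under Python 2
from pyquery import PyQuery as pq

def extract_text_pyquery(html):
    p = pq(html)
    # Article body: every <p> inside the .entry container
    article_whole = p.find(".entry")
    p_tag = article_whole('p')
    print(len(p_tag))
    print(p_tag)
    for para in p_tag.items():
        print(para.text())
    # Author: <a> elements inside #main that themselves have class "author"
    entire = p.find("#main")
    author = entire.find('a').filter('.author')
    # Printing the PyQuery object outputs the matched elements' HTML
    # (an empty string if nothing matched); .text() gives only the link text
    print('By:', author.text())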