My code only works sometimes. It opens a CSV file containing a list of links, each pointing to a resume, and prints the text of each resume, its basic sentiment, and the corresponding link.
Originally I thought the code didn't work at all, but in fact it does: I have gotten a printout a couple of times so far, though only rarely.
The file 'test.csv' contains just three links and is very small, yet the notebook does not seem to do any processing.
I'm using a Jupyter notebook in GraphLab Create, and I have gotten a printout on both my Windows installation and my OS X installation. Right now I am on my Mac, and when I press Shift+Enter nothing happens: the cell shows [*] for a moment and then changes to [num] without producing any output.
I have tried splitting the code into three separate cells and executing them all at once. It has only ever worked after the code was split into three cells.
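One thing that may be worth checking in the split-cell setup (this is an assumption, not a confirmed diagnosis): a Python file object can only be iterated once. If the cell that calls open() is not re-run, the loop cell sees an already exhausted file, the loop body never executes, and the cell finishes with no output and no error. A minimal sketch, using a hypothetical temporary file and placeholder example.com links in place of the real test.csv:

```python
import os
import tempfile

# Hypothetical stand-in for test.csv: three placeholder links, one per line.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w") as f:
    f.write("http://example.com/a\nhttp://example.com/b\nhttp://example.com/c\n")

# "Cell 1": open the file once.
all_links = open(path, "r")

# "Cell 2", first run: the loop sees all three lines.
first_pass = [line for line in all_links]

# "Cell 2", run again without re-running "cell 1": the file object is
# already at end-of-file, so the loop yields nothing.
second_pass = [line for line in all_links]

all_links.close()

print("first pass: %d lines, second pass: %d lines"
      % (len(first_pass), len(second_pass)))
```

If this is what is happening, re-running the cell that opens the file (or opening the file in the same cell as the loop) should make the output reappear.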
Has anyone had this problem before? Any advice would be greatly appreciated.
Python 2.7
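import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob

# Open the CSV of resume links; the with-block closes the file afterwards.
with open('test.csv', 'r') as all_links:
    for links in all_links:
        links = links.strip()  # drop the trailing newline
        if not links:
            continue  # skip blank lines
        html = urllib.urlopen(links).read()
        soup = BeautifulSoup(html, "lxml")
        # Remove script and style elements so only visible text remains.
        for script in soup(["script", "style"]):
            script.extract()
        text = soup.get_text()
        # Strip each line, split it on spaces, and drop empty chunks.
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        text = '\n'.join(chunk for chunk in chunks if chunk)
        words = text.encode('utf-8')
        sent = words.decode('utf-8')
        dec_sent = TextBlob(sent)
        print links, words, dec_sent.sentiment.polarity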
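Lines read from a file keep their trailing newline, and a blank line in the file comes through as an empty string once stripped. A small helper can isolate that cleanup so it can be checked without any network access. The sketch below is only an illustration: the name read_links, the temporary file, and the example.com links are all hypothetical, not part of the original code.

```python
import os
import tempfile


def read_links(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    links = []
    with open(path, "r") as f:
        for line in f:
            link = line.strip()  # remove the newline and stray spaces
            if link:             # ignore blank lines
                links.append(link)
    return links


# Hypothetical sample file with a blank line, padding, and a trailing blank line.
sample_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(sample_path, "w") as f:
    f.write("http://example.com/a\n\n  http://example.com/b  \n\n")

links = read_links(sample_path)
print(links)
```

Because the helper opens and closes the file on every call, it returns the same list each time it is run, regardless of which notebook cells were executed before it.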