My code, which opens a CSV file containing a list of links (each pointing to a resume) and prints the text of each resume, its basic sentiment, and the corresponding link, only works sometimes.

Originally I thought the code didn't work, but in fact it does: I have gotten a printout a couple of times so far, though it is rare.

The file 'test.csv' contains just three links and is very small. The notebook does not seem to be doing any work.

I'm using a Jupyter notebook in GraphLab Create, and I have gotten a printout on both my Windows installation and my OS X installation. Right now I am on my Mac, and when I press Shift+Enter nothing happens. The cell shows [*] momentarily, then changes to [num] without producing any output.

I have tried splitting the code into three separate cells and executing them all at once. It has only ever worked after the code was split into three cells.

Has anyone had this problem before? Any advice would be greatly appreciated.

Python 2.7

import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob

all_links = open('test.csv', 'r')

for links in all_links:
    html = urllib.urlopen(links).read()
    soup = BeautifulSoup(html, "lxml")

    for script in soup(["script", "style"]):
        script.extract()

        text = soup.get_text()

        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = '\n'.join(chunk for chunk in chunks if chunk)

        words = text.encode('utf-8')
        sent = words.decode('utf-8')

        dec_sent = TextBlob(sent)

        print links, words, dec_sent.sentiment.polarity
Ty Batten

1 Answer

Your problem is likely due to the fact that you define words and dec_sent only inside the for loop, and links not at all, but reuse them from previous cells.

If the html is empty, the loop never runs, so those variables are never defined and you should get an error (or you are printing the values from previous runs).
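To illustrate the point about loop-only variables, here is a minimal sketch. The function name last_cleaned_text and the tags argument are made up for this example and only stand in for the inner loop in the question. It avoids the print statement so it runs on both Python 2.7 and Python 3.

```python
def last_cleaned_text(tags):
    """Hypothetical stand-in for the question's inner loop.

    `text` is assigned only inside the loop body, as in the question.
    """
    for tag in tags:
        text = "text after removing <%s>" % tag
    try:
        return text
    except NameError:
        # With zero iterations `text` was never assigned. Inside a function
        # this raises UnboundLocalError, which is a subclass of NameError.
        return None


no_tags_result = last_cleaned_text([])           # loop body never runs
one_tag_result = last_cleaned_text(["script"])   # loop body runs once
```

In a notebook the same code at cell level would not necessarily fail: if an earlier run already defined the name, the stale value is silently reused.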

Restart your Jupyter kernel and re-execute all cells. It may also help to print locals() to see which variables are defined.
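Scanning the full output of locals() can be noisy, so a small helper that checks only the names you care about may be easier to read. This is a sketch: report_defined and fake_namespace are hypothetical names, and the dictionary below merely simulates a notebook namespace.

```python
def report_defined(names, namespace):
    """Map each name to True or False depending on whether it is defined."""
    return {name: name in namespace for name in names}


# Simulated namespace for illustration only; in a notebook cell you would
# pass globals() (at cell level, locals() is the same dictionary).
fake_namespace = {"links": "http://example.com/resume\n", "words": "some text"}

status = report_defined(["links", "words", "dec_sent"], fake_namespace)
```

Running report_defined(["links", "words", "dec_sent"], globals()) right after a kernel restart, and again after the loop cell, should show whether the loop body ever executed.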

Jörn Hees
  • Thank you so much for your reply. If I restart the kernel and run all cells, I get the same result: it shows [*] for a couple of seconds and prints nothing. If I add for links in all_links: print links below the open line, still nothing prints. I would expect to at least get the contents of the csv printed, but I get nothing. – Ty Batten Apr 08 '17 at 14:19
  • Also, my apologies: I had copied the above code over wrong. I have fixed it to match the way I actually have it running now. – Ty Batten Apr 08 '17 at 15:01
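One further thing that may be worth ruling out, offered as a possibility rather than a diagnosis: a file object can only be iterated once. If the open call lives in one cell and the loop in another, re-running only the loop cell iterates over an already exhausted handle and prints nothing. The sketch below uses io.StringIO as a stand-in for test.csv (the URLs are placeholders, and read_links is a hypothetical helper); a real file handle behaves the same way when iterated.

```python
import io


def read_links(handle):
    """Return the non-empty lines of a file-like object, newlines stripped."""
    return [line.strip() for line in handle if line.strip()]


# Stand-in for open('test.csv', 'r'); the URLs are placeholders.
fake_csv = io.StringIO(u"http://example.com/a\nhttp://example.com/b\n")

first_pass = read_links(fake_csv)    # consumes the handle
second_pass = read_links(fake_csv)   # handle is exhausted, nothing left

fake_csv.seek(0)                     # rewind to the start
third_pass = read_links(fake_csv)    # lines are available again
```

Reading the links into a list once, or opening the file in the same cell as the loop, sidesteps this. Stripping each line also removes the trailing newline that would otherwise be passed along as part of the URL.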