
I have a bunch of URLs whose text I want to download for further analysis. I am a Python newbie. I have two problems: (1) I get a really weird `TypeError`, and (2) the results are not being written to the data frame. My code is as follows:

```python
smallURL = [
    'http://www.walesonline.co.uk/business/business-news/more-70-jobs-created-bio-12836127',
    'http://economictimes.indiatimes.com/articleshow/61006825.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst',
    'http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/',
    'http://13wham.com/news/local/urmc-opens-newest-urgent-care-facility',
]

import pandas
import datetime


f = open('myfile', 'w')

#lista= ['http://www.walesonline.co.uk/business/business-news/more-70-jobs-created-bio-12836127','http://economictimes.indiatimes.com/articleshow/61006825.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst','http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/','http://13wham.com/news/local/urmc-opens-newest-urgent-care-facility']

df = pandas.DataFrame(columns=('d', 'datetime', 'title', 'text', 'keywords', 'url'))

from newspaper import Article

for index in range(len(smallURL)):

    #url = "https://www.bloomberg.com/news/articles/2017-11-10/microsoft-and-google-turn-to-ai-to-catch-amazon-in-the-cloud"
    article = Article(smallURL[index])
    #1. Download the article
    #try:
    article.download()
    #f.write('article.title+\n')
    #except:
    #pass
    #2. Parse the article
    try:
        article.parse()
        f.write('article.title+\n')
    except:
        pass
    #Print article title
    #print(article.title)
    article.title
    #3. Fetch Author Name(s)
    print(article.authors)
    #4. Fetch Publication Date
    if article.publish_date is None:
        d = datetime.datetime.now().date()
    else:
        d = article.publish_date
    #5. Print article text
    print(article.text)
    #6. Natural Language Processing on Article to fetch Keywords
    #article.nlp()
    #Print Keywords
    print(article.keywords)
    #7. Generate Summary of the article
    #print(article.url)
    print(article.url)
    df.loc[index] = [d, datetime.datetime.now().date(), article.title, article.text, article.keywords, article.url]
```
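The loop above assigns one row at a time with `df.loc[index] = [...]`. Below is a minimal, network-free sketch of an alternative for steps #4 and #7, assuming (not confirmed in this thread) that the `tzutc` error comes from a timezone-aware `article.publish_date`; the traceback names `tzutc`, and today's date from `datetime.datetime.now().date()` carries no timezone. `COLUMNS`, `publish_day`, and `make_row` are hypothetical helpers, not part of `newspaper` or `pandas`. The idea is to store a plain `datetime.date`, collect rows in a list, and build the frame once at the end.

```python
import datetime

# Hypothetical column order, mirroring the DataFrame in the question.
COLUMNS = ("d", "datetime", "title", "text", "keywords", "url")


def publish_day(publish_date, today):
    """Return a plain datetime.date, which has no tzinfo attached.

    Falls back to `today` when no publication date was found (None).
    """
    if publish_date is None:
        return today
    # Check datetime first: datetime.datetime is a subclass of datetime.date.
    if isinstance(publish_date, datetime.datetime):
        # .date() keeps the calendar day and drops the time and tzinfo.
        return publish_date.date()
    return publish_date


def make_row(title, text, keywords, url, publish_date, today):
    """Build one row as a dict keyed by the column names above."""
    values = (publish_day(publish_date, today), today, title, text, list(keywords), url)
    return dict(zip(COLUMNS, values))


if __name__ == "__main__":
    today = datetime.date(2018, 1, 12)
    rows = []
    # Stand-in values; in the real loop they would come from the parsed Article.
    aware = datetime.datetime(2017, 11, 10, 8, 30, tzinfo=datetime.timezone.utc)
    rows.append(make_row("A title", "Body", [], "http://example.com/a", aware, today))
    rows.append(make_row("B title", "Body", ["bio"], "http://example.com/b", None, today))
    for row in rows:
        print(row["d"], row["url"])
    # With pandas installed, build the frame once:
    # df = pandas.DataFrame(rows, columns=COLUMNS)
```

In the real loop one would call `make_row(article.title, article.text, article.keywords, article.url, article.publish_date, datetime.datetime.now().date())` after `article.parse()`, and append the result to `rows`. Note that `article.keywords` stays empty while `article.nlp()` is commented out.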

My output includes:

```
[]
http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/
Traceback (most recent call last):

  File "", line 1, in
    runfile('C:/Users/theiman/Desktop/untitled7.py', wdir='C:/Users/theiman/Desktop')

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/theiman/Desktop/untitled7.py", line 57, in
    df.loc[index] = [d, datetime.datetime.now().date(), article.title, article.text,article.keywords,article.url]

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 179, in __setitem__
    self._setitem_with_indexer(indexer, value)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 425, in _setitem_with_indexer
    self.obj._data = self.obj.append(value)._data

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 4533, in append
    other = other._convert(datetime=True, timedelta=True)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 3472, in _convert
    copy=copy)).__finalize__(self)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 3227, in convert
    return self.apply('convert', **kwargs)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 3091, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 1892, in convert
    values = fn(values.ravel(), **fn_kwargs)

  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 740, in soft_convert_objects
    values = lib.maybe_convert_objects(values, convert_datetime=datetime)

  File "pandas/_libs/src\inference.pyx", line 1204, in pandas._libs.lib.maybe_convert_objects

TypeError: unhashable type: 'tzutc'
```
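For context on the last line: Python 3 raises `TypeError: unhashable type: ...` when `hash()` is called on an object whose class has `__hash__` set to `None`, which happens automatically when a class defines `__eq__` without also defining `__hash__`. Whether that is why `tzutc` is unhashable in the installed `dateutil` version is an assumption, not something this thread confirms. `EqOnlyTz` below is a hypothetical class that reproduces the same kind of error:

```python
class EqOnlyTz:
    """Hypothetical tz-like class that defines __eq__ but not __hash__."""

    def __eq__(self, other):
        return isinstance(other, EqOnlyTz)


# Python 3 sets __hash__ to None when a class defines __eq__ alone.
print(EqOnlyTz.__hash__ is None)

try:
    hash(EqOnlyTz())
except TypeError as exc:
    # Same kind of message as the traceback, with a different type name.
    print("TypeError:", exc)
```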

Any idea what is going wrong and how I can fix it? Thank you!

tom
  • What version of Python are you using? Also, have you used the `dateutil` module anywhere? Maybe in your newspaper module? – kingJulian Jan 12 '18 at 23:47
  • Also, since the problem seems to be rooted in the date-processing part of your code, may I suggest you add `print(hash(d))` in the #4 part of your algorithm (the if-else block)? If this fails to print at some point, then the `d` object in question is unhashable. – kingJulian Jan 13 '18 at 00:00
  • I am using the most recent version of Anaconda. – tom Jan 13 '18 at 01:40
  • Have you tried printing the output of the hash function? – kingJulian Jan 13 '18 at 01:42
  • I tried printing the output of the hash function, but I am not sure what I am looking for. This output looks relevant: `File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 740, in soft_convert_objects values = lib.maybe_convert_objects(values, convert_datetime=datetime)` `File "pandas/_libs/src\inference.pyx", line 1204, in pandas._libs.lib.maybe_convert_objects` `TypeError: unhashable type: 'tzutc'` – tom Jan 13 '18 at 02:31
  • The [hash](https://docs.python.org/3.3/library/functions.html#hash) function returns an integer if the object is hashable. If you don't see a value being printed, it means that this specific `d` instance is not hashable. – kingJulian Jan 13 '18 at 11:48
  • If I get rid of `d`, it still doesn't add anything to my data frame. Do you have any idea what I did wrong? Thank you! – tom Jan 13 '18 at 13:51
  • That's really strange... Have you tried setting a breakpoint at step #7, where you add them to your data frame? – kingJulian Jan 14 '18 at 13:23
  • I did and still got the same result. – tom Jan 14 '18 at 15:49
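The hash check suggested in the comments can be wrapped in a small helper. `is_hashable` and `check_date` are hypothetical names used only for illustration. The sketch also checks `d.tzinfo` on its own: in CPython, a timezone-aware datetime's hash is computed from its UTC offset rather than by hashing the tzinfo object, so `hash(d)` may succeed even when `hash(d.tzinfo)` would raise.

```python
import datetime


def is_hashable(obj):
    """True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def check_date(d):
    """Report hashability of d and, for datetimes, of its tzinfo."""
    report = {"d": is_hashable(d)}
    if isinstance(d, datetime.datetime):
        report["tzinfo"] = is_hashable(d.tzinfo)
    return report


if __name__ == "__main__":
    aware = datetime.datetime(2017, 11, 10, tzinfo=datetime.timezone.utc)
    print(check_date(aware))
    print(check_date(datetime.date(2018, 1, 13)))
```

In step #4 of the question's loop, `print(check_date(d))` right after `d` is assigned would show which URL produces a date whose `tzinfo` cannot be hashed.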

0 Answers