
I made a script to scrape some data from a web site, but it only runs for a few pages and then stops with this message: "'NoneType' object has no attribute 'a'". Another error that appears sometimes is this:

    File "scrappy3.py", line 31, in <module>
      f.writerow(doc_details)
    File "C:\python\lib\encodings\cp1252.py", line 19, in encode
      return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u015f' in position 251: character maps to <undefined>

Can you please give me advice on how to resolve these errors? This is my script:

    import requests
    import csv
    from bs4 import BeautifulSoup
    import re
    import time

    start_time = time.time()
    page = 1
    f = csv.writer(open("./doctors.csv", "w", newline=''))
    while page <= 5153:
        url = "http://www.sfatulmedicului.ro/medici/n_s0_c0_h_s0_e0_h0_pagina" + str(page)
        data = requests.get(url)
        print('scraping page ' + str(page))
        soup = BeautifulSoup(data.text, "html.parser")
        for liste in soup.find_all('li', {'class': 'clearfix'}):
            doc_details = []
            url_doc = liste.find('a').get('href')
            for a in liste.find_all('a'):
                if a.has_attr('name'):
                    doc_details.append(a['name'])
            data2 = requests.get(url_doc)
            soup = BeautifulSoup(data2.text, "html.parser")
            a_tel = soup.find('div', {'class': 'contact_doc add_comment'}).a
            tel_tag = a_tel['onclick']
            tel = tel_tag[tel_tag.find("$(this).html("):tel_tag.find(");")].lstrip("$(this).html(")
            doc_details.append(tel)
        f.writerow(doc_details)

        page += 1
    print("--- %s seconds ---" % (time.time() - start_time))
florin
  • Which line are you getting this? Maybe you can post the whole error message with the stack trace. – RedX Dec 21 '17 at 12:58
  • `soup.find('div',{'class':'contact_doc add_comment'})` does not find anything, returns `None`, so the `.a` fails. – deceze Dec 21 '17 at 13:00
  • @deceze What is curious is that the program stops at a random page, and I checked on that page whether that div is there, and it is. So I guess I need to implement a function that retries fetching and parsing that URL until it finds the div. Can you help me with my second error too? – florin Dec 21 '17 at 15:30

2 Answers


Your error is here:

   a_tel = soup.find('div',{'class':'contact_doc add_comment'}).a

`soup.find` is obviously not finding the div with the sought class. The return value is `None`, which by definition has no attribute `.a`.

You should check the result and decide whether to continue with further queries in the loop or bail out. For example:

   div_contact = soup.find('div',{'class':'contact_doc add_comment'})
   if div_contact is None:
       continue

   a_tel = div_contact.a

You could also use a `try .. except` block to cover more cases (such as the div not actually containing what you expect):

   div_contact = soup.find('div',{'class':'contact_doc add_comment'})
   try:
       a_tel = div_contact.a
   except AttributeError:
       continue

which is in theory more Pythonic. Your choice in any case.

Continuous error checking is a normal part of any scraping program.
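As for the second error in the question (the `UnicodeEncodeError`), the usual fix is to open the output file with an explicit encoding rather than relying on the platform default (`cp1252` on Windows, which cannot encode `'\u015f'`). A minimal sketch, assuming the same `doctors.csv` output file and a made-up example row:

```python
import csv

# Open the CSV with an explicit UTF-8 encoding so characters such as
# '\u015f' (ş) can be written; the Windows default cp1252 cannot encode them.
with open("doctors.csv", "w", newline="", encoding="utf-8") as out:
    f = csv.writer(out)
    f.writerow(["Dr. Mu\u015fat", "0123456789"])  # example row with a ş character
```

The `newline=''` argument stays the same as in your script; only the `encoding` is added to the `open()` call.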

mementum
  • What is curious is that the program stops at a random page, and I checked on that page whether that div is there, and it is. So I guess I need to implement a function that retries fetching and parsing that URL until it finds the div. Can you help me with my second error too? – florin Dec 21 '17 at 15:29
  • You will probably run into the same error. Even if it visually seems to you to be the *same* div with the same class ... it's obviously not and that means you will have to account for extra cases. – mementum Dec 21 '17 at 22:10
  • With regards to encoding ... you have to decide what to do with the error, but you can have a look at this other answer for `write` which covers Unicode encoding when using `write`: https://stackoverflow.com/questions/22392377/error-writing-a-file-with-file-write-in-python-unicodeencodeerror – mementum Dec 21 '17 at 22:15
  • Regarding the encoding, I solved it by adding encoding='utf-8' when opening the CSV file. – florin Dec 22 '17 at 10:03
    resp_find = soup.find('div',{'class':'contact_doc add_comment'})
    if resp_find is not None:
        a_tel = resp_find.a

You can check whether the return value of `soup.find()` is `None`; if it is not, you can safely access `.a`.

Alternatively, make sure `soup.find()` never returns `None` in the first place, which means investigating why the div is sometimes missing from the page.
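Since the comments mention wanting to retry a page until the div appears, that idea could be sketched as a small helper. This is only a sketch: the name `fetch_contact_div` is made up, and it takes a `get_html` callable (e.g. a wrapper around `requests.get(url).text`) so that transient bad responses can be retried:

```python
import time
from bs4 import BeautifulSoup

def fetch_contact_div(get_html, url, retries=3, delay=2.0):
    """Look for the contact div in url's HTML, retrying a few times.

    get_html is a callable taking the URL and returning the page HTML.
    Returns the div tag, or None if it never appears after all retries.
    """
    for attempt in range(retries):
        soup = BeautifulSoup(get_html(url), "html.parser")
        div = soup.find('div', {'class': 'contact_doc add_comment'})
        if div is not None:
            return div
        time.sleep(delay)  # wait before re-requesting the page
    return None
```

In the scraping loop this would replace the bare `soup.find(...)` call, with a `continue` when the result is still `None`.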

bierschi
  • What is curious is that the program stops at a random page, and I checked on that page whether that div is there, and it is. So I guess I need to implement a function that retries fetching and parsing that URL until it finds the div. Can you help me with my second error too? – florin Dec 21 '17 at 15:29