I am getting a strange IndexError, but only some of the time, with the following code:

# coding=utf-8
from bs4 import BeautifulSoup as bs
from urllib import urlopen
import json
import csv

link = 'https://familysearch.org/pal:/MM9.1.1/KH21-F11'
Soup = bs(urlopen(link).read())

# Process into Json, plus index error control
rawJ = Soup.find_all('script')
J = str(rawJ[10])
J1 = J.split('var person = ')
J2 = J1[1].rsplit('var record =')
J3 = J2[0].rsplit(';', 1)

JsonText = J3[0].decode('utf-8')
s = json.loads(JsonText)

# Declare
name = s["personBestName"]

About 2-4 out of 10 runs of this same script on the same link (seemingly at random) fail with the following error:

Traceback (most recent call last):
  File "C:\Users\User\Desktop\_test8.py", line 16, in <module>
    J2 = J1[1].rsplit('var record =')
IndexError: list index out of range
[Finished in 2.8s with exit code 1]
KubiK888
  • Clearly you found some ` – Martijn Pieters Aug 16 '15 at 00:33
  • No, I am doing it with the same link 10 times and I know 'var person = ' is in the extracted text. It is giving me this error while I know the code is working. – KubiK888 Aug 16 '15 at 00:35
  • The error only occurs because that `'var person = '` text is **not** there. The server is not obliged to serve you the exact same text each time. – Martijn Pieters Aug 16 '15 at 00:39
  • For all you know the server is giving you a *please stop requesting the page so often* message with enough ` – Martijn Pieters Aug 16 '15 at 00:41
  • Oh, are you serious? I never knew that. So the same page can come back as different versions of the HTML? Maybe I can add error control with a wait-and-retry loop? – KubiK888 Aug 16 '15 at 00:41
  • Inspect the response that has a problem first. See what is being returned, perhaps you can code around it some other way. – Martijn Pieters Aug 16 '15 at 00:43
  • Thanks for the suggestion. I did just that and figured out that the Soup object can differ between requests, which makes `J = str(rawJ[10])` unstable, and there is actually no reason to find all – KubiK888 Aug 16 '15 at 02:38
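
The advice in the comments (do not rely on a fixed script index, inspect the bad response, and retry) could be sketched roughly as follows. This is only an illustration under assumptions, not the asker's final code: it is written for Python 3, it searches the raw HTML for the `'var person = '` marker instead of using BeautifulSoup, and the names `extract_person`, `fetch_person`, `fetch`, `retries`, and `delay` are hypothetical.

```python
import json
import time

MARKER = 'var person = '  # marker string taken from the question


def extract_person(html):
    """Return the parsed `person` object, or None if it cannot be found."""
    start = html.find(MARKER)
    if start == -1:
        # The server sent a page without the expected script.
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores what follows it (the trailing ';' and 'var record = ...').
        person, _end = json.JSONDecoder().raw_decode(html, start + len(MARKER))
    except ValueError:
        return None
    return person


def fetch_person(fetch, retries=3, delay=2.0):
    """Call `fetch()` until the page contains the marker.

    `fetch` is a hypothetical callable that returns the page HTML as a str.
    """
    html = ''
    for _attempt in range(retries):
        html = fetch()
        person = extract_person(html)
        if person is not None:
            return person
        time.sleep(delay)  # wait before asking the server again
    # Surface the start of the unexpected response so it can be inspected.
    raise RuntimeError('marker not found; last response began: %r' % html[:200])
```

With the question's setup, `fetch` might be something like `lambda: urlopen(link).read().decode('utf-8')`, assuming `urlopen` is imported from `urllib.request` and the page is UTF-8. Searching for the marker avoids depending on the script being at index 10, which the last comment identifies as the unstable part.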
