
I'm getting the above error with the code below. The error occurs at the last line. Please excuse the subject matter; I'm just practicing my Python skills. =)

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
from pprint import pprint
from pickle import dump

moves = dict()
moves0 = set()
url = 'http://www.marriland.com/pokedex/1-bulbasaur'
print(url)
# Open url
with urlopen(url) as usock:
    # Get url data source
    data = usock.read().decode("latin-1")
    # Soupify
    soup = BeautifulSoup(data)
    # Find move tables
    for div_class1 in soup.find_all('div', {'class': 'listing-container listing-container-table'}):
        div_class2 = div_class1.find_all('div', {'class': 'listing-header'})
        if len(div_class2) > 1:
            header = div_class2[0].find_all(text=True)[1]
            # Take only moves from Level Up, TM / HM, and Tutor
            if header in ['Level Up', 'TM / HM', 'Tutor']:
                # Get rows
                for row in div_class1.find_all('tbody')[0].find_all('tr'):
                    # Get cells
                    cells = row.find_all('td')
                    # Get move name
                    move = cells[1].find_all(text=True)[0]
                    # If move is new
                    if move not in moves:
                        # Get type
                        typ = cells[2].find_all(text=True)[0]
                        # Get category
                        cat = cells[3].find_all(text=True)[0]
                        # Get power if not Status or Support
                        power = '--'
                        if cat != 'Status or Support':
                            try:
                                # not STAB
                                power = int(cells[4].find_all(text=True)[1].strip(' \t\r\n'))
                            except ValueError:
                                try:
                                    # STAB
                                    power = int(cells[4].find_all(text=True)[-2])
                                except ValueError:
                                    # Moves like Return, Frustration, etc.
                                    power = cells[4].find_all(text=True)[-2]
                        # Get accuracy
                        acc = cells[5].find_all(text=True)[0]
                        # Get pp
                        pp = cells[6].find_all(text=True)[0]
                        # Add move to dict
                        moves[move] = {'type': typ,
                                       'cat': cat,
                                       'power': power,
                                       'acc': acc,
                                       'pp': pp}
                    # Add move to pokemon's move set
                    moves0.add(move)

    pprint(moves)
    with open('pkmn_moves.dump', 'wb') as f:
        dump(moves, f)
```

I have reduced the code as much as possible while still reproducing the error. The fault may be simple, but I just can't find it. In the meantime, I've worked around it by raising the recursion limit to 10000.

zang3tsu
  • For infinite recursion, we need to see the stack trace. – ApproachingDarknessFish Jan 28 '13 at 06:32
  • @ValekHalfHeart How do I get the stack trace? – zang3tsu Jan 28 '13 at 07:21
  • I managed to simplify the code further and found the cause. It was the variable `move`: it's a `NavigableString` from `BeautifulSoup`. Casting it to a plain string with `str()` solved my problem. I'm not that well-versed in BeautifulSoup, so I'm quite surprised by these results. Anyway, it's definitely a lesson learned. – zang3tsu Jan 28 '13 at 14:03
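A minimal sketch of the fix from that comment (the `<td>` snippet and the move data are invented for illustration): a `NavigableString` is a `str` subclass that keeps a reference back to the whole parse tree, which is what pickle ends up trying to serialize. Converting it with `str()` keeps only the characters.

```python
from pickle import dumps, loads
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td><a href="#">Tackle</a></td>', 'html.parser')
move = soup.find('td').find_all(text=True)[0]  # a NavigableString

# A NavigableString holds a reference back to the entire parse tree,
# so pickling it pulls in the whole soup. A plain str does not.
move = str(move)

# Now the dict pickles and unpickles cleanly.
data = loads(dumps({move: {'power': 35}}))
```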

1 Answer


Just want to contribute an answer for anyone else who may have this issue. Specifically, I ran into it while caching BeautifulSoup objects, fetched from a remote API, in a Django session.

The short answer is that pickling BeautifulSoup nodes is not supported. Instead, I opted to store the original string data on my object and expose an accessor method that parses it on the fly, so that only the original string data is pickled.
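The pattern above can be sketched like this (class and attribute names are illustrative, not from any particular library): keep only the raw markup on the object so pickle sees a plain string, and rebuild the soup lazily on access.

```python
import pickle
from bs4 import BeautifulSoup

class CachedPage:
    """Holds raw HTML; the parsed tree is rebuilt on demand and never pickled."""

    def __init__(self, html):
        self.html = html    # plain str: safe to pickle
        self._soup = None   # parsed tree: excluded from pickling

    @property
    def soup(self):
        # Parse lazily; after unpickling, the first access re-parses.
        if self._soup is None:
            self._soup = BeautifulSoup(self.html, 'html.parser')
        return self._soup

    def __getstate__(self):
        # Drop the soup so only the string data is serialized.
        return {'html': self.html, '_soup': None}
```

A round trip through pickle then just works, at the cost of re-parsing once per unpickled object:

```python
page = pickle.loads(pickle.dumps(CachedPage('<p>hello</p>')))
page.soup.p.get_text()  # re-parses here
```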

Dan