I'm working on some HTML parsing, and I'm having a hard time deciding how to address the information being extracted.
For example, consider a page like this: http://www.the-numbers.com/movies/1999/FIGHT.php. I want to address every piece of content, such as The Numbers Rating, Rotten Tomatoes, Production Budget, Theatrical Release, and others, so that I can store the value each "key" may take.
The extraction itself is solved; what I'm not sure about is the proper way to store these contents. As I said, they work like "keys", so a dictionary is the most direct answer. Still, I'm tempted to add a member for each of these "keys" to the class I'm building.
The question is which approach will work out better, both when writing the code and when accessing these contents, and whether these are the best approaches to this issue at all.
I would have, for the first case, something like:
    class Data:
        def __init__(self):
            self.data = dict()

        def adding_data(self):
            self.data["key1"] = (val1, val2)
            self.data["key2"] = val3
            self.data["key3"] = [val4, val5, val6, ...]
And for the second one:
    class Data:
        def adding_data(self):
            self.key1 = (val1, val2)
            self.key2 = val3
            self.key3 = [val4, val5, val6, ...]
The reason I'm considering this is that I'm using the BeautifulSoup API, and I really like the way it lets you address each tag on the resulting "soup":
    soup = BeautifulSoup(data)
    soup.div
    soup.h2
    soup.b
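A possible middle ground I've been toying with is to keep the dictionary as the actual storage and forward attribute access to it through `__getattr__`, so both styles work. This is only a rough sketch: the `add` method, the `_normalize` helper, and the value used below are made up for illustration, not real data from the page.

```python
class Data:
    """Hypothetical container: values live in a dict but can be read as attributes."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _normalize(key):
        # Assumption: "Production Budget" should be reachable as production_budget.
        return key.strip().lower().replace(" ", "_")

    def add(self, key, value):
        self._data[self._normalize(key)] = value

    def __getitem__(self, key):
        # Dictionary-style access: data["Production Budget"]
        return self._data[self._normalize(key)]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real attributes
        # and methods are never shadowed by stored keys.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name)


data = Data()
data.add("Production Budget", "placeholder value")  # made-up value
budget_by_key = data["Production Budget"]
budget_by_attr = data.production_budget
```

With this, a key that was never added raises `AttributeError` on attribute access and `KeyError` on item access, which matches how plain attributes and dictionaries behave.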
Which way do you think is more user-friendly? Is there any better way to do this?