
I have a parser that subclasses the HTMLParser library class, like this:

class MyHTMLParser(HTMLParser):
    temp = ''
    def handle_data(self, data):
        MyHTMLParser.temp += data

I need the temp variable because I need to save the data somewhere so I can access it elsewhere.

The code that uses the class looks like this:

for val in enumerate(mylist):
    parser = HTMLParser()
    parser.feed(someHTMLHere)
    string = parser.temp.strip().split('\n')

The problem is that temp keeps whatever was stored in it before. It doesn't reset, even though I create a new instance of the parser on every iteration. How do I clear this variable? I don't want it to keep what was there from the previous iteration.

Graham Dumpleton
Anna

3 Answers


temp in your code is a class attribute. It is initialized only once, when the Python interpreter executes the class definition, so temp = '' runs only once, no matter how many instances you create.

So moving it into __init__, which makes it an instance attribute, is a good solution.
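To see the difference in isolation, here is a minimal sketch with two hypothetical classes (SharedLog and OwnLog are illustrative names, not from the question). One appends to a class attribute, the other to an attribute created in __init__:

```python
class SharedLog:
    temp = ''  # class attribute: created once, when the class body runs

    def add(self, data):
        SharedLog.temp += data  # always rebinds the attribute on the class


class OwnLog:
    def __init__(self):
        self.temp = ''  # instance attribute: created anew for every object

    def add(self, data):
        self.temp += data


first, second = SharedLog(), SharedLog()
first.add('a')
second.add('b')
shared_result = second.temp  # 'ab': both instances see the class-level string

third, fourth = OwnLog(), OwnLog()
third.add('a')
fourth.add('b')
own_result = fourth.temp  # 'b': only what was added to this instance
```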

But if you insist on keeping it as a class attribute, as you said in the comments:

Is there any way to declare a global variable that can be used inside the class and elsewhere?

By the way, this is not a global variable; it is a class attribute.

Then you have to reset it yourself. In your code, handle_data is a callback that feed may call multiple times, so you cannot reset temp inside handle_data; you have to do it outside the class.

For your code, that could look like the following (see lineA), just FYI:

class MyHTMLParser(HTMLParser):
    temp = ''
    def handle_data(self, data):
        MyHTMLParser.temp += data

for val in enumerate(mylist):
    parser = MyHTMLParser()
    MyHTMLParser.temp = '' # lineA
    parser.feed(someHTMLHere)
    string = parser.temp.strip().split('\n') # lineB

See lineA: it resets temp to an empty string before each parse, so the iterations no longer affect each other, even though temp is still declared at the top of the class as you wanted.

But pay attention: you should not replace lineA with parser.temp = '' or assign any other value to parser.temp. That creates a new instance attribute named temp, so parser.temp in lineB no longer refers to the class attribute, which defeats your purpose.
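The shadowing pitfall can be reproduced with a short sketch. The HTML string is a made-up input, and instance_value and class_value are names introduced here only to show where the data ends up:

```python
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    temp = ''

    def handle_data(self, data):
        MyHTMLParser.temp += data  # writes to the class attribute


parser = MyHTMLParser()
parser.temp = ''  # creates an instance attribute that shadows the class one
parser.feed('<p>hello</p>')

instance_value = parser.temp     # still '': the shadowing instance attribute
class_value = MyHTMLParser.temp  # 'hello': where handle_data actually wrote
```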

atline

Like others have stated, the problem is that you are adding the data to the class variable instead of an instance variable. This happens because of the line MyHTMLParser.temp += data.

If you change it to self.temp += data, each instance accumulates its own data rather than storing everything on the class.
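Why this works even though temp is still declared on the class: self.temp += data behaves like self.temp = self.temp + data. The first read falls back to the class attribute, but the write always lands on the instance. A minimal sketch, using a hypothetical Demo class:

```python
class Demo:
    temp = ''  # class-level default, never modified below

    def add(self, data):
        # Behaves like: self.temp = self.temp + data
        # The first read finds Demo.temp; the write stores the
        # result on the instance.
        self.temp += data


a, b = Demo(), Demo()
a.add('x')
b.add('y')

has_own_attr = 'temp' in vars(a)  # True: += created an instance attribute
class_default = Demo.temp         # '': the class attribute is untouched
```

Note that this relies on str being immutable. With a mutable class attribute such as a list, += would extend the shared object in place, and all instances would see the change.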

Here is a full working script:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    temp = ""

    """Personally, I would go this route"""
    #def __init__(self):
    #   self.temp = ""
    #   super().__init__()
    """Don't forget the super() or it will break"""

    def handle_data(self, data):
        self.temp += data # <---Only real line change

"""TEST VARIABLES"""
someHTMLHere = '<html><head><title>Test</title></head>\
<body><h1>Parse me!</h1></body></html>'
mylist = range(5)
""""""""""""""""""

for val in enumerate(mylist):
    parser = MyHTMLParser() #Corrected typo from HTML to MyHTML
    parser.feed(someHTMLHere)
    string = parser.temp.strip().split('\n')

    print(string) #To Test each iteration
Mr. Kelsey

This happens because MyHTMLParser.temp is a class attribute shared by every instance, so each new parser keeps appending to the same string.

What you need to do is add temp to the object itself. You do this in the constructor:

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()  # required: HTMLParser sets up its own state here
        self.temp = ''

    def handle_data(self, data):
        self.temp += data

    # use a getter
    def get_temp(self):
        return self.temp

Now, the temp variable belongs to the object itself. And if you have several MyHTMLParser objects, they will each have their own temp variable.
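As a rough end-to-end sketch of this approach applied to the loop from the question: the pages list is a hypothetical stand-in for the HTML being parsed, and results is a name introduced here to collect the output of each iteration.

```python
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()  # HTMLParser needs its own initialisation to run
        self.temp = ''

    def handle_data(self, data):
        self.temp += data


# Hypothetical inputs standing in for the HTML parsed in the question's loop.
pages = ['<p>first</p>', '<p>second\nline</p>']

results = []
for page in pages:
    parser = MyHTMLParser()  # fresh instance, fresh temp
    parser.feed(page)
    parser.close()           # flush any text still buffered by the parser
    results.append(parser.temp.strip().split('\n'))
```

Each iteration starts from an empty temp, so nothing carries over from the previous page.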

Damien
  • Is there any way to declare a global variable that can be used inside the class and elsewhere? I was trying to declare temp = '' at the beginning of the file and then use global temp inside the class, but that didn't work – Anna Oct 25 '18 at 23:05
  • You could pass a variable (`temp`) into your _object_ via the constructor; it could then be shared by all instances and/or referenced elsewhere: `parser = HTMLParser(temp)` – Damien Jun 03 '19 at 05:45