4

My first write to the file needs to overwrite it, then my next ones need to append to it. But There is no way to know what write will be first. My writes are in conditional statements. Here is what I have:

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.strict = False
        self.indent = " "
        self.pos = 0
        self.output_file = 'output_sass.txt'

    def handle_starttag(self, tag, attrs):
        if attrs != []:
            for attr in attrs:
                if ('id' in attr):
                    id = attr.index('id')
                    with open(self.output_file, 'w') as the_file:
                        the_file.writelines(self.indent * self.getpos()[1] + '#' + attr[id+1] + ' {' +'\n')
##                    print (self.indent * self.getpos()[1] + "#" + attr[id+1] + " {")
                    self.pos = self.getpos()[1]
                    break
                elif ('class' in attr):
                    clas = attr.index('class')
                    with open(self.output_file, 'w') as the_file:
                        the_file.writelines(self.indent * self.getpos()[1] + "." + attr[clas+1] + " {"+'\n')
##                    print (self.indent * self.getpos()[1] + "." + attr[clas+1] + " {")
                    self.pos = self.getpos()[1]
                    break
                else:
                    with open(self.output_file, 'w') as the_file:
                        the_file.writelines(self.indent * self.getpos()[1] + tag + " {"+'\n')
##                    print (self.indent * self.getpos()[1] + tag + " {")
                    self.pos = self.getpos()[1]
                    break
        else:
           with open(self.output_file, 'w') as the_file:
                the_file.writelines(self.indent * self.getpos()[1] + tag + " {"+'\n')
##            print (self.indent * self.getpos()[1] + tag + " {")
                self.pos = self.getpos()[1]

    def handle_endtag(self, tag):
        with open(self.output_file, 'w') as the_file:
            the_file.writelines(self.indent * self.pos + "}"+'\n')
##        print(self.indent * self.pos + "}")
user3164083
  • 1,053
  • 6
  • 18
  • 35
  • Then use some variable `first_write = True` and check it in all places. Then change it to `False` – furas Jul 12 '14 at 21:13
  • 1
    Or put data on some list and write only once at the end. – furas Jul 12 '14 at 21:14
  • Of course you can open file for writing once at the beginning (deleting previous file) and close it at the end. – furas Jul 12 '14 at 21:15
  • @furas Thanks, your list idea would work. How would you open once and then close at the end? Cheers. – user3164083 Jul 12 '14 at 21:18
  • in `__init__` do `self.the_file = open(self.output_file, 'w')` and you have open file and you can access it in all class. I don't only know when will be end of program to close file `self.the_file.close()`. Maybe `HTMLParser` has some function called at the end of data. – furas Jul 12 '14 at 21:20
  • See [HTMLParser.close()](https://docs.python.org/2/library/htmlparser.html#HTMLParser.HTMLParser.close) - it seems good place to close file. You will have to overwrite it and probably call `close()` from oryginal class `HTMLParser`. – furas Jul 12 '14 at 21:24
  • @furas I tried that earlier but never closed the file, but it never wrote anything to the file. Do you know what syntax to use to write to the file in this scenario? – user3164083 Jul 12 '14 at 21:28
  • Do you have some data for test - I do example but I have to test it. – furas Jul 12 '14 at 21:31
  • @Furas It works now. It must have been because I didn't close the file when I tried it earlier. Thank you. – user3164083 Jul 12 '14 at 21:35
  • So I add my comments as answer and you can mark it as accepted. – furas Jul 12 '14 at 21:39

2 Answers2

2

Add a class attribute that holds the changing flag:

import itertools

class MyHTMLParser(HTMLParser):
    def __init__(self, ...):
        ...
        self.modes = itertools.chain('w', itertools.cycle('a'))

    @property
    def mode(self):
        return next(self.modes)

    def handle_starttag:
        ...
        with open(filepath, self.mode) as the_file:  # self.mode is 'w' the first time and 'a' every time thereafter
            # write stuff
inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • Very interesting solution thank you. I managed to do it by opening the file in __init__() and closing the file at the end of the script. – user3164083 Jul 12 '14 at 21:38
1

use some variable first_write = True and check it in all places. Then change it to False.

Or put data on some list and write only once at the end.

Of course you can open file for writing once at the beginning (deleting previous file) and close it at the end.

In __init__ do self.the_file = open(self.output_file, 'w') and you have open file and you can access it in all class. I don't only know when will be end of program to close file self.the_file.close(). Maybe HTMLParser has some function called at the end of data.

See HTMLParser.close() - it seems good place to close file. You will have to overwrite it and probably call close() from oryginal class HTMLParser.

furas
  • 134,197
  • 12
  • 106
  • 148
  • Thanks. I did your option of opening the file in `__init__`. I gave the htmlparser this new function `def close_my_file(self): self.the_file.close()` and called it at the end of my script. Cheers – user3164083 Jul 12 '14 at 21:44
  • This is pretty straigtforward, but it adds a lot of branching. Also, the code becomes unmaintainable (think about writing in 20 different places and adding that check at each one) – inspectorG4dget Jul 12 '14 at 21:47
  • @inspectorG4dget Thanks. Should I have added a check? I didn't. Everytime I wrote to file I only did `self.the_file.writelines(self.indent * self.getpos()[1] + '#' + attr[id+1] + ' {' +'\n')` – user3164083 Jul 12 '14 at 21:50
  • If you are using the `first_write = True` method, then you should add checks every time you want to write something (unmaintainable design). If you are accumulating everything into a list and writing it out eventually, then that costs a lot of memory. You could open the file once in `__init__` and use the file handler to write everywhere, but that won't handle failures as gracefully as `with open(...) as ...:` would. Saving the order of file-open modes in an instance variable solves all of this, but costs memory; but with a little `itertools` magic, that memory footprint can be reduced. – inspectorG4dget Jul 12 '14 at 21:59