I wrote a script to scrape data from a site. It works when I run it with "python script.py", but when I chmod +x it and run it directly from the shell, it does not work properly (it does not overwrite the output file).

Here is my code (I'm just trying out HTMLParser):

#!/usr/bin/env python


from HTMLParser import HTMLParser
import urllib
import codecs
import string

FILE_NAME = 'kq.txt'
LINK = 'http://kqxs.vn/'

class MyHTMLParser(HTMLParser):
    """Parser get content in first table in site"""
    def __init__(self):
        self.reset()
        self.fed = []
        self.found = False
        self.done = False
    def handle_starttag(self, tag, attrs):
        # Closing tags such as </table> are reported to handle_endtag,
        # never to handle_starttag, so only the opening tag is checked here.
        if tag == "table":
            self.found = True

    def handle_endtag(self, tag):
        if tag == "table":
            self.done = True

    def handle_data(self, data):
        if self.found and not self.done:
            self.fed.append(data)

    def get_data(self):
        return self.fed

#read data from URL
response = urllib.urlopen(LINK)
#print response.headers['content-type']
html = response.read()

html = unicode(html, 'utf-8')

parser = MyHTMLParser()
parser.feed(html)

result = parser.get_data()
#write to file
fw = codecs.open(FILE_NAME, 'w', 'utf-8')
#line.strip() remove string contains only spaces
#[fw.write(line + '\n') for line in result if line.strip()]
fw.writelines(line + '\n' for line in result if line.strip())

fw.close()

print "Done! data printed to file %s" %(FILE_NAME)

Here is the result from my shell:

famihug@hvn:/home/famihug%~/bin/leecher.py; cat ~/bin/kq.txt                [0]
Done! data printed to file kq.txt
Giải đặc biệt
**92296** 


**(HERE I RUN IT FROM INSIDE VIM with !python %)**
famihug@hvn:/home/famihug/bin%vim leecher.py                                [0]

Done! data printed to file kq.txt

Press ENTER or type command to continue
zsh: suspended  vim leecher.py
famihug@hvn:/home/famihug/bin%cat kq.txt                                   [20]
Giải đặc biệt
****88705**** 

famihug@hvn:/home/famihug/bin%/usr/bin/env python                           [0]
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) 

famihug@hvn:/home/famihug/bin%python                                        [0]
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) 

The script still prints the last line, Done! data printed to file kq.txt, but it doesn't actually update the file. If I remove the kq.txt file, it works well. And if I change kq.txt a little (change a number), it works well too.

Can anyone explain why? Thanks.

HVNSweeting
  • That's one funny way of writing output to a file... using an unrelated list comprehension. Is there no more idiomatic way to iterate over every string in an iterator? – sarnold Jul 06 '12 at 02:08
  • @sarnold indeed - using a list comprehension for side effects is frowned upon. Better to write out the loop or do `fw.writelines(line + '\n' for line in result if line.split())`. – lvc Jul 06 '12 at 02:13
  • @lvc: Ah, yes, _that_ is more legible. (Even in a comment!) Thanks. – sarnold Jul 06 '12 at 02:16

2 Answers

1

I solved my problem!

I used a relative path for the filename, so when I ran:

famihug@hvn:/home/famihug%~/bin/leecher.py; cat ~/bin/kq.txt                [0]

it created a new kq.txt in /home/famihug/ (the shell's current directory), not in /home/famihug/bin/. That is why I kept getting the old result from cat ~/bin/kq.txt.
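The effect can be reproduced in isolation. The following is only a minimal sketch: the helper `resolved_in` is a hypothetical name, and two temporary directories stand in for /home/famihug and /home/famihug/bin. It shows that the same relative filename maps to a different absolute path depending on the current working directory.

```python
import os
import tempfile

def resolved_in(directory, filename):
    """Return the absolute path a relative filename maps to when the
    current working directory is `directory` (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(directory)
    try:
        # abspath() joins the filename onto the current working directory.
        return os.path.abspath(filename)
    finally:
        os.chdir(previous)

# Assumed layout: temporary directories play the roles of
# /home/famihug and /home/famihug/bin.
home_dir = os.path.realpath(tempfile.mkdtemp())
bin_dir = os.path.join(home_dir, 'bin')
os.mkdir(bin_dir)

from_home = resolved_in(home_dir, 'kq.txt')
from_bin = resolved_in(bin_dir, 'kq.txt')
print(from_home)  # ends with kq.txt directly under home_dir
print(from_bin)   # ends with bin/kq.txt
```

The location of the script never enters into it; only the directory the process was started from matters.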

The solution is to use an absolute path instead of a relative one:

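import os  # needed in addition to the imports already in the script

def fix_path(filename):
    # Build the path relative to the directory that contains this script,
    # not relative to the shell's current working directory.
    filepath = os.path.realpath(__file__)
    path = os.path.dirname(filepath)
    fixed = os.path.join(path, filename)
    return fixed

fw = codecs.open(fix_path(FILE_NAME), 'w', 'utf-8')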
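As a rough, self-contained variant of the same idea, the sketch below takes the script path as a parameter so it can be tried outside a real script. The names `path_beside` and `write_lines`, the temporary directory, and the sample data are all assumptions for illustration; `io.open` is used in place of `codecs.open` only so the file is closed by the `with` block.

```python
import io
import os
import tempfile

def path_beside(script_path, filename):
    """Join filename onto the directory that contains script_path."""
    script_dir = os.path.dirname(os.path.realpath(script_path))
    return os.path.join(script_dir, filename)

def write_lines(path, lines):
    """Write the non-blank lines to path as UTF-8, one per line."""
    with io.open(path, 'w', encoding='utf-8') as fw:
        fw.writelines(line + u'\n' for line in lines if line.strip())

# Hypothetical layout: a temporary directory plays the role of ~/bin.
script_dir = os.path.realpath(tempfile.mkdtemp())
fake_script = os.path.join(script_dir, 'leecher.py')
target = path_beside(fake_script, 'kq.txt')

previous = os.getcwd()
os.chdir(tempfile.gettempdir())  # simulate running from another directory
try:
    write_lines(target, [u'92296', u'   ', u'88705'])
finally:
    os.chdir(previous)
```

In the real script, `path_beside(__file__, FILE_NAME)` would play the role of `fix_path(FILE_NAME)`.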
HVNSweeting
-6

I have no clue, but try chmod 755 script_name. This is probably due to not having permission to overwrite the file. But really, I have no clue, and I can't test it because I'm not at my computer; I'm using a friend's computer. I will get back to the question when I get my computer back.

Thor Correia