I'm using multiprocessing and ghost.py to crawl some data from the web, but I get the following error:

2015-03-31T23:22:30 QT: QWaitCondition: Destroyed while threads are still waiting

This is some of my code:

    l.acquire()
    global ghost
    try:
        ghost = Ghost(wait_timeout=60)
        ghost.open(website) #download page
        ghost.wait_for_selector('#pagenum') #wait JS
        html = []
        #print u"\t\t the first page"
        html.append(ghost.content)
        pageSum = findPageSum(ghost.content)
        for i in xrange(pageSum-1): #crawl all pages
            #print u"\t\tthe"+ str(i+2) +"page"
            ghost.set_field_value('#pagenum', str(i+2)) 
            ghost.click('#page-go') 
            ghost.wait_for_text("<td>"+str(20*(i+1)+1)+"</td>")
            html.append(ghost.content)
        for i in html:
            souped(i)
        print  website, "\t\t OK!"
    except :
        pass
    l.release()
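
One thing that makes a failure here hard to see: the bare `except: pass` discards every exception, so whatever goes wrong inside the worker is never printed. A minimal sketch of the same structure that prints the traceback and still releases the lock is shown here. The `crawl` argument is a hypothetical stand-in for the ghost.py steps above (it is not part of ghost.py), and the third parameter is an assumption added only for illustration.

```python
import traceback


def download(lock, website, crawl):
    """Run crawl(website) while holding lock and report any failure.

    `crawl` is a hypothetical callable standing in for the ghost.py steps.
    Returns True on success and False if crawl() raised.
    """
    with lock:  # the lock is released even if crawl() raises
        try:
            crawl(website)
        except Exception:
            # Print the full traceback so the real error is visible.
            print("%s\t\t FAILED" % website)
            traceback.print_exc()
            return False
        print("%s\t\t OK!" % website)
        return True
```

With the traceback visible, it becomes possible to tell whether the Qt warning coincides with an exception in the worker or only appears when the process exits.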

The code that starts the processes:

    global _use_line
    q = Queue.Queue(0)
    for i in xrange(len(websitelist)):
        q.put((websitelist[i]))
    lock = Lock()

    while (not q.empty()):
        if (_use_line > 0):
            for i in range(_use_line):
                dl = q.get()
                _use_line -= 1
                print "_use_line: ", _use_line
                p = Process(target=download, args=(lock,dl))
                p.start()
        else:
            time.sleep(1)
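
Two things about the loop above are worth noting. `_use_line` is only ever decremented in the parent; each child process works on its own copy of the module globals, so any increment done inside `download` would not reach the parent's counter. Also, `q.get()` blocks by default, so if the queue runs out inside the inner `for` loop the parent waits forever. A minimal sketch of one alternative, capping concurrency with a `multiprocessing.Pool`, follows. `crawl_one` and `crawl_all` are hypothetical names, and the worker body is a placeholder rather than the real ghost.py code.

```python
import multiprocessing


def crawl_one(website):
    """Hypothetical worker standing in for download(); returns (website, ok)."""
    try:
        # ... the ghost.py steps for a single site would go here ...
        return (website, True)
    except Exception:
        return (website, False)


def crawl_all(websitelist, workers=4):
    """Crawl every site with at most `workers` processes alive at once."""
    pool = multiprocessing.Pool(processes=workers)
    try:
        # imap_unordered yields each result as soon as its worker finishes.
        return list(pool.imap_unordered(crawl_one, websitelist))
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit
```

On platforms that spawn new interpreters instead of forking (Windows, for example), `crawl_all(websitelist)` has to be called from under an `if __name__ == '__main__':` guard.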

ghost.py uses PyQt and PySide, and I think this issue is caused by an error involving some local variable, but I don't know how to track it down.

Sam Hanley
VictorLin
