I'm using the ghost package in a script to scrape a website. There are many pages to scrape: ghost is called about 30 times per page, and I might have hundreds of pages. When running the script, I noticed that after about 25 pages I start getting Ghost::Qt::Qthread errors. Even before that, ghost does not seem to behave consistently. Basically, ghost is used to extract a phone number from a simple page looking like this:
I suspect it's about overloading memory, or something like that, but I must admit that I'm new to Python and not very skilled in programming (I come from the hardware world).
Has anyone encountered this type of problem? I know ghost has a method called remove_page that should remove the "page" created, but I have tried using it and I think it's not working (or I'm missing something). Here is code where I call remove_page, and after removing the page I can still use the object:
from ghost import Ghost
gh = Ghost()
page, page_name = gh.create_page()
gh.remove_page(page)
After running this and typing page, I would expect not to have any page defined. How do I release resources, delete the page, or even delete the gh object I created?
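To show what I mean without involving ghost at all, here is a minimal plain-Python sketch of the same behaviour. Note that `Page` and `Registry` are hypothetical stand-ins I made up for illustration; they are not ghost's classes, and ghost's own remove_page may well do something different internally. The sketch only shows that removing an object from whatever tracks it does not unbind my own variable, and that the object seems to be freed only once the last reference is dropped with `del`.

```python
import gc
import weakref


class Page:
    """Hypothetical stand-in for a page object; not ghost's real class."""


class Registry:
    """Hypothetical stand-in for an object that tracks the pages it created."""

    def __init__(self):
        self.pages = []

    def create_page(self):
        page = Page()
        self.pages.append(page)
        return page

    def remove_page(self, page):
        # Drops only the registry's own reference to the page.
        self.pages.remove(page)


registry = Registry()
page = registry.create_page()
watcher = weakref.ref(page)  # lets us observe when the object is freed

registry.remove_page(page)
# The name `page` is still bound, so the object is still alive here.
still_bound = watcher() is not None

del page      # unbind the last strong reference
gc.collect()  # usually unnecessary in CPython, but harmless
freed = watcher() is None
```

If the same holds for ghost, then seeing `page` still defined after remove_page would be expected, and I would need `del page` (and `del gh`) on my side as well, though I don't know whether that is enough to release the Qt resources.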