extracting text using multiple urls

Question

I want to extract the content from a list of URLs and store it in text files. The problem is that my Python code reads only the last URL from the text file and stores only that page's contents. I am using the Goose extraction tool to pull the text from the URLs.

Can anyone help me out with this? Is there a problem with the for loop?

class FetchUrl(Thread):
    def __init__(self, url, name):
      Thread.__init__(self)
      self.name = name
      self.url = url

    def run(self):
      config = Configuration()
      config.browser_user_agent = 'Mozilla 5.0'
      config.http_timeout = 20 
      g = Goose(config)
      fname = os.path.basename(self.name)
      with open(fname +".txt","w+") as f_handler:
           for tmp in url:
              article = g.extract(url=tmp)
              contents = article.cleaned_text
              f_handler.write(contents)
       msg = "%s was finished downloaded with this link %s!" % (self.name, 
          self.url)
       print(msg)


def main(url):
   for item , url in enumerate(url):
     name = "Thread %s" % (item+1)
     fetch = FetchUrl(url, name)
     fetch.start()

if __name__ == "__main__":
   u_path = 'url_list/url.txt'
   url = []
   for line in open(u_path):
        line = line.strip()
        url.append(line)
        print(line)
main(url)

This code is heavily misindented, so if you're working with this exact code, it won't even run. Please post the actual code and format it properly. — ForceBru, Aug 03 '19 at 09:11
Why are you using `url` instead of `self.url` in the `for` loop of the `run` method? — Tony Montana, Aug 03 '19 at 09:45

score 0 · Accepted Answer · answered Aug 03 '19 at 09:10

Your variable `contents` is being overwritten on every iteration, so when the `for tmp in url:` loop exits, only the contents of the last URL are left in `contents`. Try something like this:

# open file in write mode
    # loop over urls
        # extract url contents
        # clean it
        # write to file
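
The outline above could be filled in roughly as follows. This is only a minimal sketch under some assumptions: `save_articles` and `extract_text` are hypothetical names that do not appear in the question, and the extractor is passed in as a callable so the loop can be tried without network access. With Goose, the callable would presumably wrap the call shown in the question, e.g. `lambda link: g.extract(url=link).cleaned_text`.

```python
def save_articles(urls, out_path, extract_text):
    """Write the extracted text of every URL in `urls` to a single file.

    `extract_text` is a hypothetical callable: it takes a URL string and
    returns the article text as a string.
    """
    # Open the file once, before the loop, so earlier results are not truncated.
    with open(out_path, "w", encoding="utf-8") as f_handler:
        for link in urls:
            # Extract and clean the contents of this URL.
            contents = extract_text(link)
            # Write inside the loop, before `contents` is rebound
            # by the next iteration.
            f_handler.write(contents)
            f_handler.write("\n")
```

The key point is that the write happens inside the loop. If it is moved after the loop, only the last value of `contents` reaches the file, which matches the symptom described in the question.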

Nauman Naeem


Can you look at the edited code in my question, which I am now following? – Zeeshan Khan saif Aug 03 '19 at 09:23
Isn't write access to a file locked while one thread is using it? – Nauman Naeem Aug 03 '19 at 09:35
The code now follows your suggestion, but it stores only the content of the first URL. – Zeeshan Khan saif Aug 03 '19 at 09:40
Take a look at the accepted [answer](https://stackoverflow.com/questions/11983938/python-appending-to-same-file-from-multiple-threads). – Nauman Naeem Aug 03 '19 at 09:45
My suggestion was for the normal case; you hadn't mentioned threads yet. – Nauman Naeem Aug 03 '19 at 09:46
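
For the threaded case raised in these comments, one possible approach is sketched below. It assumes each thread handles exactly one URL (`self.url`) and that all threads append to one shared file guarded by a `threading.Lock`. The names `fetch_all` and `extract_text` are hypothetical, and the extractor is again a plain callable standing in for the Goose call from the question.

```python
import threading


class FetchUrl(threading.Thread):
    """Fetch one URL and append its text to a shared output file."""

    def __init__(self, url, out_path, lock, extract_text):
        super().__init__()
        self.url = url
        self.out_path = out_path
        self.lock = lock
        self.extract_text = extract_text  # hypothetical callable: url -> str

    def run(self):
        # Use self.url (this thread's URL), not the global list of all URLs.
        contents = self.extract_text(self.url)
        # Only one thread at a time may append, so writes do not interleave.
        with self.lock:
            with open(self.out_path, "a", encoding="utf-8") as f_handler:
                f_handler.write(contents + "\n")


def fetch_all(urls, out_path, extract_text):
    """Start one thread per URL and wait until all of them have finished."""
    lock = threading.Lock()
    threads = [FetchUrl(u, out_path, lock, extract_text) for u in urls]
    for t in threads:
        t.start()
    # Join every thread so the file is complete when this function returns.
    for t in threads:
        t.join()
```

Because the file is opened in append mode, the order of the articles in it depends on which thread finishes first, and running the script again adds to the existing file rather than replacing it.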
