I have a list of nearly 500 RSS/Atom feed URLs, and I want to parse each feed and extract its links.
I am using the Python feedparser library to parse each URL. To process the list of URLs in parallel, I thought of using Python's threading library.
My code looks something like this:
import threading
import feedparser


class PullFeeds:
    def __init__(self):
        # One feed URL per line
        self.data = open('urls.txt', 'r')

    def pullfeed(self):
        # Create one thread per URL
        threads = []
        for url in self.data:
            t = RssParser(url)
            threads.append(t)
        # Start every thread, then wait for all of them to finish
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()


class RssParser(threading.Thread):
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url

    def run(self):
        print "Starting: ", self.name
        rss_data = feedparser.parse(self.url)
        # Print the link of every entry in the parsed feed
        for entry in rss_data.get('entries'):
            print entry.get('link')
        print "Exiting: ", self.name


pf = PullFeeds()
pf.pullfeed()
The problem is that feedparser returns an empty list of entries when I run this script. Without threading, feedparser prints the list of links parsed from the supplied URL.
How do I fix this?
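
For reference, here is a minimal sketch of a variant that may be worth trying. It rests on two assumptions that are not confirmed above: that urls.txt holds one URL per line, so each line read from the file still carries its trailing newline when it reaches feedparser, and that starting roughly 500 threads at once is more than is needed. The helper names read_urls, fetch_links and pull_feeds are hypothetical, and concurrent.futures is part of the standard library only in Python 3. The parse function is passed in as an argument so the helpers can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def read_urls(lines):
    """Return cleaned URLs: strip surrounding whitespace and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_links(url, parse):
    """Parse one feed with the given parse function and return its entry links."""
    feed = parse(url)
    return [entry.get('link') for entry in feed.get('entries', [])]


def pull_feeds(urls, parse, max_workers=20):
    """Fetch every feed using a bounded pool of worker threads.

    urls must be a list. Returns a dict mapping each URL to its list of links.
    max_workers=20 is an arbitrary illustrative value, not a recommendation.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with urls
        results = pool.map(lambda u: fetch_links(u, parse), urls)
        return dict(zip(urls, results))
```

With feedparser installed, this could be called as pull_feeds(read_urls(open('urls.txt')), feedparser.parse), then the returned dict can be looped over to print the links. If the entries are still empty after stripping the URLs, the bozo and bozo_exception fields of the feedparser result may show why a particular feed failed.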