I am trying to recursively crawl a Wikipedia URL for all English article links. I want to perform a depth-first traversal to depth n, but for some reason my code is not recursing on every pass. Any idea why?
def crawler(url, depth):
    if depth == 0:
        return None
    links = bs.find("div", {"id": "bodyContent"}).findAll(
        "a", href=re.compile("(/wiki/)+([A-Za-z0-9_:()])+"))
    print("Level ", depth, " ", url)
    for link in links:
        if ':' not in link['href']:
            crawler("https://en.wikipedia.org" + link['href'], depth - 1)
This is how I call the crawler:
url = "https://en.wikipedia.org/wiki/Harry_Potter"
html = urlopen(url)
bs = BeautifulSoup(html, "html.parser")
crawler(url, 3)
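For comparison, here is a minimal sketch of a depth-first traversal in which each page's links are fetched inside the recursive call rather than once up front. The fetching is abstracted behind a `fetch_links` parameter (an illustrative name, not from the code above) so the traversal logic can be demonstrated without live HTTP; a real crawler would download and parse each page inside that function:

```python
def crawl(url, depth, fetch_links, visited=None):
    # Depth-first traversal: stop when the depth budget is exhausted.
    if depth == 0:
        return []
    if visited is None:
        visited = set()
    if url in visited:
        return []
    visited.add(url)
    order = [url]
    # fetch_links(url) must return the outgoing links of *this* page;
    # fetching inside the recursion is what makes each level see new links.
    for link in fetch_links(url):
        order.extend(crawl(link, depth - 1, fetch_links, visited))
    return order

# Usage with a fake link graph instead of live HTTP:
graph = {
    "/wiki/A": ["/wiki/B", "/wiki/C"],
    "/wiki/B": ["/wiki/C"],
    "/wiki/C": [],
}
print(crawl("/wiki/A", 3, lambda u: graph.get(u, [])))
# depth-first order: ['/wiki/A', '/wiki/B', '/wiki/C']
```

The `visited` set also prevents revisiting pages, which matters on Wikipedia where articles link back to each other heavily.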