I am writing the following code and have run into a frustrating problem that I have not been able to solve after two days of trying.
This is the simplified code:
```python
def crawl_web(url, depth):
    toCrawl = [url]
    crawled = ['https://index.html']
    i = 0
    while i <= depth:
        interim = []
        for x in toCrawl:
            if x not in toCrawl and x not in crawled and x not in interim:
                print("NOT IN")
            crawled.append(x)
        toCrawl = interim
        i += 1
    return crawled

print(crawl_web("https://index.html", 1))
```
I expect the output to be just:
```
['https://index.html']
```
But the `if ... not in` check does not seem to work, and I keep getting this output instead:
```
['https://index.html','https://index.html']
```
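A likely explanation, assuming `crawled.append(x)` is indented under the `for` loop rather than under the `if`: because `x` is taken from `toCrawl`, the test `x not in toCrawl` is always `False`, so the `if` body (the `print("NOT IN")`) never runs. The duplicate then comes from `crawled.append(x)` executing unconditionally on the first pass, not from `not in` misbehaving. In the usual crawler pattern, the three-way membership test applies to the links found on a page, not to the page itself. The sketch below restructures the function that way; `get_links` is a hypothetical parameter standing in for whatever fetches and parses a page (by default it returns no links), and the pre-seeded entry in `crawled` is dropped so the seed URL is recorded when it is actually visited.

```python
def crawl_web(url, depth, get_links=lambda page: []):
    """Breadth-first crawl starting at `url`, following links up to `depth` levels.

    `get_links` is a hypothetical hook that returns the links on a page;
    the default returns none, so only the seed URL is visited.
    """
    to_crawl = [url]
    crawled = []
    for _ in range(depth + 1):  # same number of levels as `while i <= depth`
        interim = []
        for x in to_crawl:
            if x in crawled:
                continue  # already visited: never append it a second time
            crawled.append(x)
            for link in get_links(x):
                # The membership test belongs here, on newly discovered links.
                if link not in to_crawl and link not in crawled and link not in interim:
                    interim.append(link)
        to_crawl = interim
    return crawled


print(crawl_web("https://index.html", 1))  # ['https://index.html']
```

With the default `get_links`, the call visits only the seed URL, which matches the expected output; passing a real link extractor lets the crawl go deeper without recording any page twice.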