When I run the following Python code:
import newspaper
print(len(newspaper.build('http://cnn.com', memoize_articles=False).articles))
exit()
in Python 3, I get the output 897 (i.e., newspaper3k found 897 pages that it considers articles on the domain http://cnn.com), but when I run
import newspaper
print(len(newspaper.build('http://www.cnn.com', memoize_articles=False).articles))
exit()
(i.e., with an additional "www."; nothing else has changed), I only get 895. These numbers are consistent when I switch back and forth between the two URLs. Is the "www." actually significant in a URL? If so, why are the article counts so similar for these two URLs when using the newspaper3k library? If not, why aren't the counts exactly the same?
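A minimal sketch of how the two results could be compared, assuming the goal is to find which article URLs appear under one start URL but not the other. It only uses the standard library's urllib.parse; the helper names host, strip_www and diff_article_urls are hypothetical and are not part of newspaper3k. It also shows that, at the URL level, "cnn.com" and "www.cnn.com" are simply different hostnames.

```python
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Return the (lowercased) hostname component of a URL."""
    return urlsplit(url).hostname or ""


def strip_www(url: str) -> str:
    """Hypothetical normalization: drop a leading 'www.' from the netloc.

    This is only for comparing the two article lists; newspaper3k is not
    assumed to do anything like this itself.
    """
    parts = urlsplit(url)
    netloc = parts.netloc
    if netloc.lower().startswith("www."):
        netloc = netloc[4:]
    return parts._replace(netloc=netloc).geturl()


def diff_article_urls(a, b):
    """Compare two iterables of article URLs after stripping 'www.'.

    Returns (only_in_a, only_in_b) as sorted lists.
    """
    set_a = {strip_www(u) for u in a}
    set_b = {strip_www(u) for u in b}
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Possible usage with newspaper3k (needs network access, so left commented):
# import newspaper
# a = [art.url for art in newspaper.build('http://cnn.com', memoize_articles=False).articles]
# b = [art.url for art in newspaper.build('http://www.cnn.com', memoize_articles=False).articles]
# only_a, only_b = diff_article_urls(a, b)
```

If the two lists still differ after this normalization, the difference lies in which pages were discovered from each start page, not merely in the "www." prefix of the resulting URLs.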