I want to check whether a given website has a robots.txt file, read the whole content of that file, and print it. Storing the content in a dictionary as well would be a bonus.
I've tried playing with the robotparser module but can't figure out how to do it.
I would like to use only modules that come with the standard Python 2.7 package.
I did as @Stefano Sanfilippo suggested:
from urllib.request import urlopen
returned
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
from urllib.request import urlopen
ImportError: No module named request
So I tried:
import urllib2
from urllib2 import Request
from urllib2 import urlopen
with urlopen("https://www.google.com/robots.txt") as stream:
    print(stream.read().decode("utf-8"))
but got:
Traceback (most recent call last):
File "", line 1, in <module>
with urlopen("https://www.google.com/robots.txt") as stream:
AttributeError: addinfourl instance has no attribute '__exit__'
According to bugs.python.org, using the object returned by urlopen in a with statement is not supported in Python 2.7, and indeed the same code works fine in Python 3. Any idea how to work around this?
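
For reference, here is a minimal sketch of one possible workaround, using only the standard library. It assumes that wrapping the response in contextlib.closing is acceptable, since closing() provides the __enter__/__exit__ methods that the addinfourl object lacks in 2.7. The names fetch_robots and parse_robots are hypothetical helpers made up for this illustration, and the dictionary layout (user agent mapped to a list of (field, value) pairs) is just one way the content could be stored.

```python
from contextlib import closing

try:
    from urllib2 import urlopen          # Python 2.7
except ImportError:
    from urllib.request import urlopen   # Python 3


def fetch_robots(base_url):
    """Return the text of <base_url>/robots.txt (hypothetical helper).

    urlopen raises an exception (e.g. HTTPError for a 404) if the file
    cannot be retrieved, so the caller can treat that as "no robots.txt".
    """
    url = base_url.rstrip("/") + "/robots.txt"
    # closing() calls stream.close() on exit, so "with" works in 2.7 too
    with closing(urlopen(url)) as stream:
        return stream.read().decode("utf-8")


def parse_robots(text):
    """Group directives by user agent: {agent: [(field, value), ...]}."""
    rules = {}
    agents = []        # user agents the current group applies to
    in_rules = False   # True once the current group has seen a rule line
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = [part.strip() for part in line.split(":", 1)]
        field = field.lower()
        if field == "user-agent":
            if in_rules:                       # a new group starts here
                agents, in_rules = [], False
            agents.append(value)
            rules.setdefault(value, [])
        else:
            in_rules = True
            # lines outside any user-agent group (e.g. Sitemap) are skipped
            for agent in agents:
                rules[agent].append((field, value))
    return rules
```

With this, something like print(parse_robots(fetch_robots("https://www.google.com"))) should print the grouped rules. The parsing is deliberately simple and not a full implementation of the robots.txt conventions.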