I want to write a web scraper that automatically fetches the translation of a word from Google Translate, from a Python script running in my console.
I noticed that the HTML I get from the Python Requests module is very different from what the website shows in the browser.
After some research on this subject, I learned from this answer that Google has some security features that prevent my scripts from accessing its HTML content.
But I have a Chrome extension, ImTranslator, which gives me the translation of any word I select on a web page, directly from Google Translate.
So how can this extension do this? Why can't I have a script that does the same for me?
I also tried using urllib to make the request and send headers along with it.
This is my code.
First, using the urllib module:
import urllib.request

# Browser-like headers sent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
}
url = 'https://translate.google.com/#view=home&op=translate&sl=en&tl=fa&text=space'
req = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(req)
Second, using Requests:
import requests

url = 'https://translate.google.com/#view=home&op=translate&sl=en&tl=fa&text=space'
res = requests.get(url)
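One detail that may be relevant to the difference in HTML: in the URL used in both snippets, everything after the `#` is a URL fragment, and HTTP clients do not send the fragment to the server. The sketch below is only an offline check with the standard library (the names `request_target` and `fragment_params` are mine, for illustration); it makes no request and does not show how the browser or the extension actually gets the translation.

```python
from urllib.parse import urlsplit, parse_qs

url = 'https://translate.google.com/#view=home&op=translate&sl=en&tl=fa&text=space'

parts = urlsplit(url)

# Path plus query is what a client would put on the request line;
# the fragment is not included.
request_target = parts.path + ('?' + parts.query if parts.query else '')

# The translation parameters (sl, tl, text) sit in the fragment,
# which stays on the client side.
fragment_params = parse_qs(parts.fragment)

print('request target:', request_target)
print('fragment parameters:', fragment_params)
```

If this reading is right, both snippets ask the server for the same base page regardless of the word, and the page's JavaScript would be what reads the fragment in a browser. I have not confirmed that this is the only reason the HTML differs.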