I am currently going through the Web Scraping section of AutomateTheBoringStuff and trying to write a script that extracts translated words from Google Translate using BeautifulSoup4.
I inspected the HTML of a page where 'Explanation' is the translated word:
<span id="result_box" class="short_text" lang="en">
<span class>Explanation</span>
</span>
Using BeautifulSoup4, I tried several selectors, but none of them returns the translated word. For example, both of these return no results at all:
soup.select('span[id="result_box"] > span')
soup.select('span span')
I even copied the selector directly from the Developer Tools, which gave me #result_box > span. This again returns no results.
Can someone explain how to use BeautifulSoup4 for this purpose? This is my first time using it, but I think I am using it more or less correctly, because the selector
soup.select('span[id="result_box"]')
gets me the outer span element**
[<span class="short_text" id="result_box"></span>]
**Not sure why the lang="en" attribute is missing, but I am fairly certain I have located the correct element regardless.
Here is the complete code:
import bs4, requests
url = 'https://translate.google.ca/#zh-CN/en/%E6%B2%BB%E5%85%B7'
res = requests.get(url)
res.raise_for_status()  # must be called; without the parentheses nothing is checked
soup = bs4.BeautifulSoup(res.text, "html.parser")
translation = soup.select('#result_box span')
print(translation)
EDIT: If I save the Google Translate page as an offline HTML file and build the soup object from that file, the same selector locates the element without any problem.
import bs4
with open("Google Translate.html") as file:
    soup = bs4.BeautifulSoup(file, "html.parser")
translation = soup.select('#result_box span')
print(translation)
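To narrow this down, here is a minimal sketch that runs the same selector on the two pieces of markup quoted above, without any network access. The variable names and the select_translation helper are my own, and fetched_html is reconstructed from the printed output above, so it is only an assumption about what requests actually received. If that assumption holds, the inner span is simply not present in the downloaded HTML, which would suggest it is filled in after the page loads (possibly by JavaScript), but I have not confirmed this.

```python
import bs4

# Markup as shown in the browser's Developer Tools (copied from the question).
rendered_html = (
    '<span id="result_box" class="short_text" lang="en">'
    '<span class>Explanation</span>'
    '</span>'
)

# Assumption: markup as received by requests, reconstructed from the
# printed result of soup.select('span[id="result_box"]') in the question.
fetched_html = '<span class="short_text" id="result_box"></span>'


def select_translation(html):
    """Hypothetical helper: return the text of each span directly inside #result_box."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return [tag.get_text() for tag in soup.select('#result_box > span')]


rendered_result = select_translation(rendered_html)
fetched_result = select_translation(fetched_html)

print(rendered_result)  # ['Explanation']
print(fetched_result)   # []
```

So the selector itself seems fine: it matches on the markup the browser shows and only comes back empty on the markup where the outer span has no children.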