With Beautiful Soup 4, I'm trying to get some text that doesn't seem to be inside a tag. (I may be wrong; I'm not very experienced with HTML.)
I need to extract several values from the page's HTML: the budget and the latest worldwide gross for a particular film. The length of the code varies between films, so a Beautiful Soup 4 method that finds these values regardless of their line number would be hugely helpful. This is the code:
<div id="tn15content">
<h5>Budget</h5>
$165,000,000 (estimated)<br/>
<br/>
(This is from the source code of the IMDb Box Office page for Interstellar.)
I need to extract that '$165,000,000' so I can store it, etc.
The Gross section is even more confusing:
<h5>Gross</h5>
$188,020,017 (USA) (<a href="/date/03-19/">19 March</a> <a href="/year/2015/">2015</a>)<br/>$187,991,439 (USA) (<a href="/date/03-15/">15 March</a> <a href="/year/2015/">2015</a>)<br/>$187,930,551 (USA) (<a href="/date/03-14/">14 March</a> <a href="/year/2015/">2015</a>)<br/>$187,918,949 (USA) (<a href="/date/03-11/">11 March</a> <a href="/year/2015/">2015</a>)<br/>$187,888,097 (USA) (<a href="/date/03-08/">8 March</a> <a href="/year/2015/">2015</a>)<br/>
All I need from this is the most recent figure. (The Worldwide figures come further down, in a huge chunk of code that I left out to save space.)
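A similar sketch for the gross figures, under the same caveats (it assumes the snippet's structure, and `get_gross_entries` and `latest_gross` are hypothetical helpers): walk the siblings after `<h5>Gross</h5>` until the next `<h5>`, pull a `$amount (region)` pair out of each bare text node, and skip the `<a>` date links. It assumes the entries are listed newest first, as they are in the snippet, and that the omitted Worldwide rows follow the same `$amount (Worldwide)` pattern; neither is confirmed here.

```python
import re
from bs4 import BeautifulSoup, NavigableString, Tag

# First three entries of the snippet from the question, for illustration.
html = """<div id="tn15content">
<h5>Gross</h5>
$188,020,017 (USA) (<a href="/date/03-19/">19 March</a> <a href="/year/2015/">2015</a>)<br/>
$187,991,439 (USA) (<a href="/date/03-15/">15 March</a> <a href="/year/2015/">2015</a>)<br/>
$187,930,551 (USA) (<a href="/date/03-14/">14 March</a> <a href="/year/2015/">2015</a>)<br/>
</div>"""

# Matches e.g. "$188,020,017 (USA)" and captures the amount and the region.
ENTRY_RE = re.compile(r"(\$[\d,]+)\s*\(([^)]+)\)")


def get_gross_entries(html):
    """Return (amount, region) pairs listed under <h5>Gross</h5>, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h5", string="Gross")
    entries = []
    if heading is None:
        return entries
    for node in heading.next_siblings:
        # Stop when the next section heading begins.
        if isinstance(node, Tag) and node.name == "h5":
            break
        # Amounts live in bare text nodes; the <a> date links are skipped.
        if isinstance(node, NavigableString):
            match = ENTRY_RE.search(node)
            if match:
                entries.append((match.group(1), match.group(2)))
    return entries


def latest_gross(html, region="USA"):
    """Return the first amount for `region` (assumed most recent), or None."""
    for amount, where in get_gross_entries(html):
        if where == region:
            return amount
    return None


print(latest_gross(html))  # $188,020,017
```

On the full page, `latest_gross(html, "Worldwide")` would return the first Worldwide figure under the newest-first assumption; if that ordering isn't guaranteed, the dates in the `<a>` tags would need to be parsed and compared instead.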
I know a similar problem has been solved here, but I couldn't get that solution to work, and because I'm new to the site I couldn't comment to ask the answerer for help with my particular case. I was going to try IMDbPY instead, but I wasn't sure how to install it with WinPython.