I am trying to use Python to get the text that follows an inline script inside an HTML table cell:

<td class="headerlast item" rowspan="2" colspan="1" id="Party_ATP021_1">
<div id="Party_ATP021mod"></div>

<script type="text/javascript">
var myTooltip = new YAHOO.widget.Tooltip("Party_ATP021tip", { 
context:"Party_ATP021_1", text:"Sozialdemokratische Partei Österreichs 
(Social Democratic Party of Austria)", showDelay:1000, 
autodismissdelay:5000,iframe:true, preventoverlap:true } );
</script>
SPÖ
</td>

I want to extract "SPÖ", but so far without success.

So far I have only been able to get the id that is used in the script:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()

link = 'http://eed.nsd.uib.no/webview/velocity?v=2&mode=cube&cube=http%3A%2F%2F129.177.90.166%3A80%2Fobj%2FfCube%2FSIEP2004%21Display_C1&study=http%3A%2F%2F129.177.90.166%3A80%2Fobj%2FfStudy%2FSIEP2004%21Display'

# Send a randomized User-Agent header with the request.
headers = {'user-agent': str(ua.random)}
response = requests.get(link, headers=headers, timeout=10)
result_page = BeautifulSoup(response.text, 'html.parser')

for td in result_page.find_all('td', {'class': 'headerlast item'})[1:]:
    print(td.get('id'))

Any help? Thanks a lot!

MCS

1 Answer

For your example data, you could use select with the CSS selector td.headerlast.item script and then take the next_sibling of each script element, which is the text node that holds the label.

html_doc = """
<td class="headerlast item" rowspan="2" colspan="1" id="Party_ATP021_1">
<div id="Party_ATP021mod"></div>

<script type="text/javascript">
var myTooltip = new YAHOO.widget.Tooltip("Party_ATP021tip", { 
context:"Party_ATP021_1", text:"Sozialdemokratische Partei Österreichs 
(Social Democratic Party of Austria)", showDelay:1000, 
autodismissdelay:5000,iframe:true, preventoverlap:true } );
</script>
SPÖ
</td>

"""

from bs4 import BeautifulSoup
result_page = BeautifulSoup(html_doc, 'html.parser')

for scrpt in result_page.select("td.headerlast.item script"):
    print(scrpt.next_sibling.strip())

That will result in:

SPÖ
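
Note that next_sibling can be None (when the script is the last child of the cell) or another tag, in which case calling strip() on it fails. A minimal sketch of a guarded variant that also pairs each label with the id of its cell is shown here; the function name labels_after_script, the shortened script bodies, and the second cell with id Party_X_1 are made up for illustration and are not taken from the real page:

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened sample markup. The second cell (id "Party_X_1") is hypothetical
# and only demonstrates a script with no text after it.
html_doc = """
<td class="headerlast item" rowspan="2" colspan="1" id="Party_ATP021_1">
<div id="Party_ATP021mod"></div>
<script type="text/javascript">var myTooltip = 1;</script>
SPÖ
</td>
<td class="headerlast item" id="Party_X_1"><script type="text/javascript">var t = 2;</script></td>
"""


def labels_after_script(html):
    """Map each cell id to the non-empty text node that follows its script."""
    soup = BeautifulSoup(html, "html.parser")
    labels = {}
    for script in soup.select("td.headerlast.item script"):
        sibling = script.next_sibling
        # Skip scripts that are followed by nothing, by a tag, or by whitespace only.
        if isinstance(sibling, NavigableString) and sibling.strip():
            cell = script.find_parent("td")
            labels[cell.get("id")] = sibling.strip()
    return labels


print(labels_after_script(html_doc))
```

Cells whose script has no trailing text are simply left out of the result instead of raising an AttributeError.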
The fourth bird
  • Hi there: I am running into the same issue when trying to scrape the part of the page that was originally hidden under 'Other'. The same code works only for the originally displayed columns, despite using the same CSS selection. html_doc='http://eed.nsd.uib.no/webview/velocity?headers=Party&v=2+&virtualsubset=Percent_value&measure=common&virtualslice=Percent_value&layers=virtual&measuretype=4&study=http%3A%2F%2F129.177.90.166%3A80%2Fobj%2FfStudy%2FBGPA1994%21Display&cube=http%3A%2F%2F129.177.90.166%3A80%2Fobj%2FfCube%2FBGPA1994%21Display_C1&mode=cube&Partysubset=BG094+-+BGPP04%2CBG104+-+BG998 ' – MCS Jul 14 '18 at 11:29
  • There is no script tag under `Other`, so the selector will not match anything. – The fourth bird Jul 14 '18 at 12:25
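
Since cells without a script tag are never matched by a selector that ends in script, one possible workaround is to select the cells themselves and collect only their direct text nodes. This is a sketch under assumptions: the function name cell_labels is made up, the second cell is a hypothetical stand-in for a column under 'Other', and the real page may nest its text differently.

```python
from bs4 import BeautifulSoup, NavigableString

# The first cell mirrors the question's markup (script body shortened);
# the second cell is an assumed example of a cell without a script tag.
html_doc = """
<td class="headerlast item" rowspan="2" colspan="1" id="Party_ATP021_1">
<div id="Party_ATP021mod"></div>
<script type="text/javascript">var myTooltip = 1;</script>
SPÖ
</td>
<td class="headerlast item" id="Party_Other_1">
Other
</td>
"""


def cell_labels(html):
    """Return the direct text of each header cell, with or without a script."""
    soup = BeautifulSoup(html, "html.parser")
    labels = []
    for td in soup.select("td.headerlast.item"):
        # Only direct children are inspected, so the JavaScript source inside
        # <script> and any text nested inside <div> are not included.
        parts = [
            child.strip()
            for child in td.children
            if isinstance(child, NavigableString)
        ]
        labels.append(" ".join(part for part in parts if part))
    return labels


print(cell_labels(html_doc))
```

Because the script's source code is a child of the script tag rather than of the cell, it never shows up among the cell's direct text nodes.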