2

I've been trying to get the HTML content of this page, 'https://webcache.gmc-uk.org/gmclrmp_enu/start.swe', with RoboBrowser. Here is the code:

from robobrowser import RoboBrowser

landing_page_url = 'https://webcache.gmc-uk.org/gmclrmp_enu/start.swe'
browser = RoboBrowser(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36')

browser.open(landing_page_url)

print(str(browser.parsed))

What I get from this is:

<html><body><form action="/gmclrmp_enu/start.swe" method="GET" name="RedirectForHost"><input name="SWECmd" type="hidden" value="Login"/>
<input name="SWEBHWND" type="hidden" value=""/>
<input name="_sn" type="hidden" value="THyz1L6X6J4NsE50rdbFd1FK6G5QhyWSLpaqOIT2U2IqKVWLjsZa7o3plt.FZpEoxV0LT2-0ttd3yjCGxI6DduqQ25v-mQZ12T61jKnDwJzm6kNqqH5UM3KVzeg1wr1U6.s1nxPge7cyz67EZiLw3IMIFn2ZnEaaK8AznGyFU1bjE7k7xj-KEJKs.QI9vpXZ9f4euEgOmmA_"/>
<input name="SRN" type="hidden" value=""/>
<input name="SWEHo" type="hidden" value=""/>
<input name="SWETS" type="hidden" value="1543090428"/>
</form><script language="javascript">var formObj = document.forms["RedirectForHost"];formObj.SWEHo.value=top.location.hostname;formObj.submit();</script></body></html>

What it is supposed to give me is something like this:

<HTML lang="en"><head><title>List of Registered Medical Practitioners | Doctor Search</title><script>(function(h,o,t,j,a,r){
     h.hj=h.hj||function(){(h.hj.q=h.hj.q||[]).push(arguments)};
     h._hjSettings={hjid:782750,hjsv:6};
     a=o.getElementsByTagName('head')[0];
     r=o.createElement('script');r.async=1;
     r.src=t+h._hjSettings.hjid+j+h._hjSettings.hjsv;
     a.appendChild(r);
    })(window,document,'https://static.hotjar.com/c/hotjar-','.js?sv=');</script><script>history.forward();
    var sUserAgent = navigator.userAgent;
    if (sUserAgent.indexOf("MSIE") > 0)
    {
     document.write('<link href="files/gmclrmpie.css" rel="stylesheet" type="text/css">');
     document.execCommand("BackgroundImageCache",false,true);

    }
    else
    {
     document.write('<link href="files/gmclrmp.css" rel="stylesheet" type="text/css">');
    }
   </script>

<script language="javascript">
  if (window.addEventListener)
    {   
      window.addEventListener('load', attachFormSubmit, false); 
    } 
  else if (window.attachEvent)
      { 
        window.attachEvent('onload', attachFormSubmit);
  }
  
  var oldSWESubmitForm;
  
  function attachFormSubmit()
  {
            oldSWESubmitForm = SWESubmitForm;
            SWESubmitForm = AlwaysFireBeforeSWESubmitForm;   
  }
  
  function AlwaysFireBeforeSWESubmitForm(varA,varB,varC,varD) 
  {         
    var LoadingBox = document.getElementById("Loading");
    LoadingBox.style.display = "block";
    SWESubmitForm = oldSWESubmitForm;
    SWESubmitForm(varA,varB,varC,varD); 
  }</script>
<script language="javascript" src="23048/scripts/swecommon.js"></script> 
<script language="javascript" src="23048/scripts/swemessages_enu.js"></script> 
<script language="javascript" src="23048/scripts/swecmn_li.js"></script> 
<script language="javascript" src="srf1538672264_444/bscripts/all/controls_applet_gmc_web_health_provider_search_applet.js"></script> 
</head><body bgcolor="#ffffff" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0"><div id="Loading" style="width:300px;height:30px;position:absolute;left:330px;top:275px;display:none;text-align:left;vertical-align:text-center;padding:5px;padding-top:15px;margin:0px;background-color:white;font-color:black;filter:alpha(opacity=90);-moz-opacity: .9;opacity: .9;border-style:solid;border-color:lightgrey;"><img id=animatedLoading src="images/loading.gif" border=0 valign=center align=right padding=0>&nbsp;&nbsp;&nbsp;Loading...</div><table border="0" cellpadding="0" cellspacing="0" width="960"><tr><td class="skiptocontentlink"><a href="#PageContent" id="skiptoContent" tabindex="1" title="Skip to the page content" name="skiptoContent">Skip to content</a></td></tr><tr><td><table border="0" cellpadding="6" cellspacing="0" width="960"><tr><td align="left" valign="top"><img src="images/structure/strapline_dark_blue.png"  align="top" title="Working with doctors Working for patients" alt="Working with doctors Working for patients" border="0"></td><td rowspan="2" align="right"><img src="images/GMClogo.gif" alt="General Medical  Council" title="General Medical Council" border="0"></td></tr><tr><td class="LRMPHeaderHome"><h1>List of Registered Medical Practitioners</h1></td></tr></table></td></tr></table><script language="javascript">top.document.title = "List of Registered Medical Practitioners | Doctor Search";</script><table width="960" cellpadding="0" cellspacing="0" border="0"><tr><td><a name="SWEApplet2" id="SWEApplet2" href="#SWEApplet2" title="" tabindex="0"><span style= "height:1px; width:1px; position:absolute; overflow:hidden; top=10px; visibility:hidden;">Ignore Link</span></a>
<table border="0" cellpadding="0" cellspacing="0" width="960" class="GMCNavTable"><tr><td><div class="GMCNavMenu"><div class="GMCNavMenuItem"><a href='/GMCLRMP_enu/help/help.html#search_page' title="Opens in a new window" target="_blank" tabindex="3" tabindex=1997  id='s_2_1_0_0'>Help</a></div></div></td></tr></table></td></tr><tr><div id="PageContent"><td><a name="SWEApplet3" id="SWEApplet3" href="#SWEApplet3" title="" tabindex="0"><span style= "height:1px; width:1px; position:absolute; overflow:hidden; top=10px; visibility:hidden;">Ignore Link</span></a>
<form name="SWEForm3_0" method="post" action="/gmclrmp_enu/start.swe" >
<table border="0" cellpadding="8" width="960"><tr><td class="home"><h2>Doctor Search</h2></td></tr><tr><td class="tablebody"><p>The List of Registered Medical Practitioners lets you check the registration status of any doctor who is registered with us or who has been registered at any time since 20 October 2005.<br><br></p><p>You can also find information about doctors' fitness to practise history from the same date. This includes sanctions that were applied before 20 October 2005 but which were still active at that time. For example, it does show if a doctor was suspended from the register for six months from 18 October 2005.<br><br></p>
  <p>It does not include sanctions from before 20 October 2005 that were not active at that time. For example, it would not show if a doctor was suspended from the register for six months between 1 January 2005 and 1 July 2005.<br><br></p>
  <p>Please&nbsp<a href="http://www.gmc-uk.org/about/contacts.asp" tabindex=3 target="_blank" title="opens in a new window">contact us</a>&nbspif you need information from before this time period.<br><br></p>
  <p>You can find more information on what is or is not available online in our&nbsp<a href="/GMCLRMP_enu/help/help.html#explanation" tabindex=3 target="_blank" title="opens in a new window">helptext</a>.<br><br></p>
  <p>If you need help searching for a particular doctor, read our&nbsp<a href="/GMCLRMP_enu/help/help.html#searching" tabindex=3 target="_blank" title="opens in a new window">helptext about doctors' names, reference numbers and other search advice.</a><br><br>
 </td>
</tr>
 <tr><td class="error"></td></tr>
</table>

<table class="tablebody" border="0" cellpadding="0"  cellspacing="4" align="centre">
<ul>
<tr>
 <td><li class="formlist"><label for="gmcrefnumber"><a href='/GMCLRMP_enu/help/help.html#doctor_id' title="Opens in a new window" target="_blank" tabindex="3">GMC Reference Number</a></label></li></td><td><nobr><input type="text" name='s_3_1_5_0' value='' id="gmcrefnumber" tabindex="3" style="width:140" id='s_3_1_5_0' tabindex=2997  maxlength="7"></nobr></td></tr><tr><td><li class="formlist"><label for="givenname"><a href='/GMCLRMP_enu/help/help.html#given_name' title="Opens in a new window" target="_blank" tabindex="3">Given Name</a></label></li></td><td><nobr><input type="text" name='s_3_1_3_0' value='' id="givenname" tabindex="3" style="width:140" id='s_3_1_3_0' tabindex=2997  maxlength="255"></nobr></td></tr><tr><td><li class="formlist"><label for="surname"><a href='/GMCLRMP_enu/help/help.html#surname' title="Opens in a new window" target="_blank" tabindex="3">Surname</a></label></li></td><td><nobr><input type="text" name='s_3_1_9_0' value='' id="surname" tabindex="3" style="width:140" id='s_3_1_9_0' tabindex=2997  maxlength="255"></nobr></td></tr><tr><td><li class="formlist"><label for="soundslikeflag"><a href='/GMCLRMP_enu/help/help.html#sounds_like' title="Opens in a new window" target="_blank" tabindex="-1">Sounds Like</a></label></li></td><td><nobr><input type="checkbox" name='60' value='Y' title = 'Sounds Like' id="soundslikeflag" tabindex="3" id='s_60_cb' tabindex=2997 ><input type="hidden" name='s_3_1_6_0' value='60' ></nobr></td></tr><tr><td><li class="formlist"><label for="gpflag"><a href='/GMCLRMP_enu/help/help.html#GPsOnly' title="Opens in a new window" target="_blank" tabindex="-1">Only doctors on the GP Register</a></label></li></td><td><nobr><input type="checkbox" name='40' value='Y' title = 'GP Flag' id="gpflag" tabindex="3" id='s_40_cb' tabindex=2997 ><input type="hidden" name='s_3_1_4_0' value='40' ></nobr></td></tr><tr><td><li class="formlist"><label for="gender"><a href='/GMCLRMP_enu/help/help.html#gender' title="Opens in a new window" 
target="_blank" tabindex="3">Gender</a></label></li></td><td><nobr><select name="s_3_1_7_0" id="gender" tabindex="3" style="width:80" id='s_3_1_7_0' tabindex=2997 >
<option value="" >-Select-
<option value="Man" >Man
<option value="Woman" >Woman
</select>
</nobr></td></tr></ul></table><table><tr><td><img src="images/spacer.gif" height="5" width="475" alt=""></td><td><span class="minibuttonOn"><a href='JavaScript:SWESubmitForm(document.SWEForm3_0,s_0,"s_3_1_10_0","VRId-1")' onclick='Edit__0__Control__ExecuteQuery__onclick(null, "s_3_1_10_0")'  tabindex="3" tabindex=2997  id='s_3_1_10_0'>Search</a></span></td></tr></table><table cellpadding="8" width="100%"><tr><td class="tablebody"><a href='JavaScript:SWESubmitForm(document.SWEForm3_0,s_1,"s_3_1_8_0","VRId-1")'  tabindex="3" tabindex=2997  id='s_3_1_8_0'>Search for more than one doctor using their GMC reference numbers</a></td></tr></table><input type="hidden" name='SWEFo' value='SWEForm3_0' >
<input type="hidden" name='SWEField' value='' >
<input type="hidden" name='SWENeedContext' value='true' >
<input type="hidden" name='SWENoHttpRedir' value='false' >
<input type="hidden" name='W' value='t' >
<input type="hidden" name='SWECmd' value='InvokeMethod' >
<input type="hidden" name='SWEMethod' value='Refresh' >
<input type="hidden" name='SWERowIds' value='' >
<input type="hidden" name='SWESP' value='false' >
<input type="hidden" name='SWEVI' value='' >
<input type="hidden" name='SWESPNR' value='' >
<input type="hidden" name='SWEPOC' value='' >
<input type="hidden" name='SWESPNH' value='' >
<input type="hidden" name='SWEH' value='' >
<input type="hidden" name='SWETargetView' value='' >
<input type="hidden" name='SWEDIC' value='false' >
<input type="hidden" name='_sn' value='Ap2CHEtE-yIqJQDq-nR9LuJy6SLpJkaa.5TMwNhmLfqKUAvZLcG5C1zupRv8yaRpWVQJRc3gTHvHNpaP9ui446RbQmhiuAXh3A8ZUoWYfRgoe2BOi5gquh2R69JD7eF5SdWGZ36RAP4n4itQANKWReoSAczi-uL3782qzwpiU4V5DxfZ1SpfNIZaf2FqHsifn3u.9wjUzkA_' >
<input type="hidden" name='SWEReqRowId' value='0' >
<input type="hidden" name='SWEView' value='GMC WEB Doctor Search' >
<input type="hidden" name='SWEC' value='2' >
<input type="hidden" name='SWERowId' value='VRId-1' >
<input type="hidden" name='SWETVI' value='' >
<input type="hidden" name='SWEW' value='' >
<input type="hidden" name='SWEBID' value='-1' >
<input type="hidden" name='SWEM' value='' >
<input type="hidden" name='SRN' value='' >
<input type="hidden" name='SWESPa' value='' >
<input type="hidden" name='SWETS' value='' >
<input type="hidden" name='SWEContainer' value='' >
<input type="hidden" name='SWEWN' value='' >
<input type="hidden" name='SWEKeepContext' value='0' >
<input type="hidden" name='SWEApplet' value='GMC WEB Health Provider Search Applet' >
<input type="hidden" name='SWETA' value='' >
</form>
</td></div></tr><tr><td><a name="SWEApplet1" id="SWEApplet1" href="#SWEApplet1" title="" tabindex="0"><span style= "height:1px; width:1px; position:absolute; overflow:hidden; top=10px; visibility:hidden;">Ignore Link</span></a>
<form name="SWEForm1_0" method="post" action="/gmclrmp_enu/start.swe" >
<table border="0" cellpadding="0" cellspacing="0" width="960"><tr><td><div class="siteTools"><ul><li class="browsealoud"><a href='http://www.gmc-uk.org/accessibility/browsealoud.asp' title="Opens in a new window" target="_blank" tabindex="3" tabindex=3997  id='s_1_1_0_0'>Browsealoud</a></li></ul></div></td></tr></table><input type="hidden" name='SWEField' value='' >
<input type="hidden" name='SWEFo' value='SWEForm1_0' >
<input type="hidden" name='SWENeedContext' value='true' >
<input type="hidden" name='SWESP' value='false' >
<input type="hidden" name='SWERowIds' value='' >
<input type="hidden" name='SWEMethod' value='Refresh' >
<input type="hidden" name='SWECmd' value='InvokeMethod' >
<input type="hidden" name='W' value='t' >
<input type="hidden" name='SWENoHttpRedir' value='false' >
<input type="hidden" name='SWEVI' value='' >
<input type="hidden" name='SWEPOC' value='' >
<input type="hidden" name='SWESPNR' value='' >
<input type="hidden" name='SWETargetView' value='' >
<input type="hidden" name='SWESPNH' value='' >
<input type="hidden" name='SWEH' value='' >
<input type="hidden" name='SWEDIC' value='false' >
<input type="hidden" name='SWEReqRowId' value='0' >
<input type="hidden" name='_sn' value='Ap2CHEtE-yIqJQDq-nR9LuJy6SLpJkaa.5TMwNhmLfqKUAvZLcG5C1zupRv8yaRpWVQJRc3gTHvHNpaP9ui446RbQmhiuAXh3A8ZUoWYfRgoe2BOi5gquh2R69JD7eF5SdWGZ36RAP4n4itQANKWReoSAczi-uL3782qzwpiU4V5DxfZ1SpfNIZaf2FqHsifn3u.9wjUzkA_' >
<input type="hidden" name='SWEView' value='GMC WEB Doctor Search' >
<input type="hidden" name='SWETVI' value='' >
<input type="hidden" name='SWERowId' value='1-CPQ4W3' >
<input type="hidden" name='SWEC' value='2' >
<input type="hidden" name='SWEM' value='' >
<input type="hidden" name='SWEBID' value='-1' >
<input type="hidden" name='SWEW' value='' >
<input type="hidden" name='SWESPa' value='' >
<input type="hidden" name='SRN' value='' >
<input type="hidden" name='SWEContainer' value='' >
<input type="hidden" name='SWETS' value='' >
<input type="hidden" name='SWETA' value='' >
<input type="hidden" name='SWEApplet' value='GMC LRMP Browse Aloud Form Applet' >
<input type="hidden" name='SWEWN' value='' >
</form>
<script>

top._swescript.SWESyncCheck(2, "You cannot continue to work on this page because the state of this page cannot be restored on the server.");
var s_0 = {action:"/gmclrmp_enu/start.swe#SWEApplet3",SWECmd:"InvokeMethod",SWEMethod:"NewQuerySearch",SWEView:"GMC WEB Doctor Search",SWEApplet:"GMC WEB Health Provider Search Applet",SWEReqRowId:"1",SWESP:"false",SWENeedContext:"true",SWEDIC:"false"};
var s_1 = {action:"/gmclrmp_enu/start.swe",target:"_sweview",SWECmd:"GotoView",SWEMethod:"GotoView",SWEView:"GMC WEB Doctor Multiple Search",SWEApplet:"GMC WEB Health Provider Search Applet",SWEReqRowId:"0",SWESP:"false",SWENeedContext:"true",SWEKeepContext:"0",SWEDIC:"false"};
function SWEDoRefresh(anchor) { location.replace('/gmclrmp_enu/start.swe?SWECmd=Refresh&SWEVI=&_sn=Ap2CHEtE-yIqJQDq-nR9LuJy6SLpJkaa.5TMwNhmLfqKUAvZLcG5C1zupRv8yaRpWVQJRc3gTHvHNpaP9ui446RbQmhiuAXh3A8ZUoWYfRgoe2BOi5gquh2R69JD7eF5SdWGZ36RAP4n4itQANKWReoSAczi-uL3782qzwpiU4V5DxfZ1SpfNIZaf2FqHsifn3u.9wjUzkA_&SWEView=GMC+WEB+Doctor+Search&SWEC=2&SRN=' + anchor); }
if (opener == null && top == window) { var SWEPopupWin = null; var SWEJannaPopupWin = null; }
if (typeof(Top().SWEHtmlPopupName) == 'undefined' || Top().SWEHtmlPopupName == null) Top().SWEHtmlPopupName = '_swe1543090536';
if (typeof(top._samePage) != 'undefined' && top._samePage!="") top._samePage = "";
Top().SWECount = 2;

if(typeof(Top().SWEServerCount)=='undefined' || Top().SWEServerCount<Top().SWECount) {Top().SWEServerCount = Top().SWECount;}
g_bInitialized = true;

if (top&&top._sweclient&&top._sweclient._sweviewbar) { top._sweclient._sweviewbar.location.replace('/gmclrmp_enu/start.swe?SWECmd=GetCachedFrame&_sn=Ap2CHEtE-yIqJQDq-nR9LuJy6SLpJkaa.5TMwNhmLfqKUAvZLcG5C1zupRv8yaRpWVQJRc3gTHvHNpaP9ui446RbQmhiuAXh3A8ZUoWYfRgoe2BOi5gquh2R69JD7eF5SdWGZ36RAP4n4itQANKWReoSAczi-uL3782qzwpiU4V5DxfZ1SpfNIZaf2FqHsifn3u.9wjUzkA_&SWEC=2&SWEFrame=top._sweclient._sweviewbar&SRN='); }
var ctrlLookupMap = new Array();
function FindControl (appletName, controlName) { return (ctrlLookupMap [appletName + "." + controlName]); }
var ctrlCS = new Array();if (Top()._swescript!=null && typeof(top._swescript)!='undefined') Top()._swescript.ctrlCS = ctrlCS;
else Top().ctrlCS = ctrlCS;
ctrlLookupMap["GMC WEB Health Provider Search Applet.GMC Person UId"] = "s_3_1_5_0";
ctrlLookupMap["GMC WEB Health Provider Search Applet.First Name"] = "s_3_1_3_0";
ctrlLookupMap["GMC WEB Health Provider Search Applet.Last Name"] = "s_3_1_9_0";
ctrlLookupMap["GMC WEB Health Provider Search Applet.GMC Sounds Like Flag"] = "s_3_1_6_0";
ctrlLookupMap["GMC WEB Health Provider Search Applet.GMC GP Flag"] = "s_3_1_4_0";
ctrlLookupMap["GMC WEB Health Provider Search Applet.Gender"] = "s_3_1_7_0";
</script>
<script for=window event=onunload>
ClearTimer();
if (opener == null) { SWEClosePopup(); SWECloseJannaPopup(); }
</script>
<script for=window event=onload>
StartTimer(540000,1,0,10,0);
</script>
</td></tr></table><table border="0" cellpadding="0" cellspacing="0" width="960"><tr><td><hr></td></tr><tr><td><div class="footerLeft"><p>The GMC is a registered charity in England and Wales (1089278) and Scotland (SC037750).<br/>© Copyright General Medical Council 2010. All rights reserved</p></div><h2 class="hide">Disclaimer and Privacy Statement</h2><div class="footerRight"><ul><li class="first"><a href="http://www.gmc-uk.org/disclaimer.asp" title="Opens in a new window" tabindex="98" target="_blank">Disclaimer</a></li><li><a href="http://www.gmc-uk.org/privacy_policy.asp" title="Opens in a new window" tabindex="99" target="_blank">Privacy Statement</a></li></ul></div></td></tr></table>

So what I mean is that I can't actually get to the landing page. Can someone help me get there? Thank you!

Eiri
  • I see a pop-up notice saying "session expired, log in again" when using Selenium; after dismissing it, a search form is available. – QHarr Nov 24 '18 at 20:58
  • But if I use: https://webcache.gmc-uk.org/gmclrmp_enu/start.swe?SWECmd=GotoView&SWEBHWND=&_sn=LpzxgBIg6nS7ZHtQdMRBW-aQ7PooANEb6N5xwbLvLmh4xN6wvuwRp9IzXq-FqT5jyO8ZWNf1WN1UGtkw9fGacbmQjMURXmTcPgPYKXfhnHlPXI1YD6EKpP3PLvCJU8lKsUPfcdAQKF-qesaCpXSvU8RF1Ymskg7uOdbKExsoNFBo-1jTMn2bnUyql6jLrBU0WHNoIS0hzzI_&SWEView=GMC+WEB+Doctor+Search&SRN=&SWEHo=webcache.gmc-uk.org&SWETS=1543093061&SWEApplet=GMC+WEB+Health+Provider+Search+Applet seems to be ok. N.B. This is just for launching the page via Selenium and Chrome. Sorry, if that info is not helpful. – QHarr Nov 24 '18 at 21:03
  • @QHarr the thing is that the URL you mentioned that is okay, uses '_sn' and I don't want to be explicit. But with RoboBrowser, even using that URL (the one that seems ok for you) it doesn't go to the correct landing page. – Eiri Nov 24 '18 at 21:09

2 Answers

0

You could use another library called BeautifulSoup:

from bs4 import BeautifulSoup

For me it works. You should also check whether the URL you want to parse redirects the request to another location. If it does, look at how your preferred library follows redirects for GET requests.
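Note that the redirect here does not look like an HTTP redirect. Judging by the response quoted in the question, the server returns a form named RedirectForHost, and a script sets its SWEHo field to the page's hostname and submits it with GET. A library that does not run JavaScript stops at that form. Below is a minimal sketch of doing that step by hand, using only the standard library. It assumes the form keeps the shape shown in the question; the names RedirectFormParser and build_redirect_url are hypothetical, and I have not verified that the server accepts the resulting request.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit


class RedirectFormParser(HTMLParser):
    """Collect the action and input fields of the form named RedirectForHost."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == "RedirectForHost":
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Hidden inputs with value="" are kept as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_redirect_url(page_url, html):
    """Return the URL the page's script would navigate to, or None if
    the redirect form is not present in the HTML."""
    parser = RedirectFormParser()
    parser.feed(html)
    if parser.action is None:
        return None
    fields = dict(parser.fields)
    # Mimic formObj.SWEHo.value = top.location.hostname from the page script.
    fields["SWEHo"] = urlsplit(page_url).hostname
    # The form uses method="GET", so the fields go into the query string.
    return urljoin(page_url, parser.action) + "?" + urlencode(fields)
```

With RoboBrowser, one could then try something like browser.open(build_redirect_url(landing_page_url, str(browser.parsed))) on the same browser object, so that any cookies from the first response are sent again. The _sn value is read from the response each time, so it does not need to be hard-coded.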

  • BeautifulSoup is not going to give me different HTML content. I can use it to prettify or find things in the HTML, but this doesn't actually solve the problem. – Eiri Nov 24 '18 at 22:31
0

As I said, you should check whether the URL you want to capture uses redirection. According to this question, you may need to use another library:

import urllib2
  • Nope. It still returns what I wrote above instead of what I see in the actual HTML when I inspect it. – Eiri Nov 25 '18 at 11:26
  • You can send the URL and I will try, because "not working" has no meaning in programming culture :-) – Ashkan Kamyab Nov 25 '18 at 11:42
  • yes, you are right. sorry for not being more specific. so this is the URL: 'https://webcache.gmc-uk.org/gmclrmp_enu/start.swe' – Eiri Nov 25 '18 at 12:01