The HTML returned keeps telling me to restart the browser, and I'm a little lost:

require 'rubygems'
require 'mechanize'

def getHtml(the_url)
  agent = Mechanize.new
  agent.keep_alive = false
  agent.user_agent = "gibsonSim"
  agent.user_agent_alias = "Mechanize"
  agent.redirect_ok = true
  agent.add_auth('www.http://corpus2.byu.edu/','omitted', 'omitted')
  resp = agent.get(the_url)
  puts resp.body
  return resp   
end

url = "http://corpus2.byu.edu/glowbe/x2.asp?     chooser=seq&p=%5Bsolid%5D&w2=&wl=4&wr=4&r1=&r2=&ipos1=-select-&B7=SEARCH&showsec=y&sec1=0&sec2=0&sortBy=freq&sortByDo2=freq&minfreq1=freq&freq1=20&freq2=20&numhits=100&kh=100&groupBy=words&whatshow=raw&saveList=no&changed=&corpus=glowbe&word=&sbs=&sbs1=&sbsreg1=&sbsr=&sbsgroup=&redidID=&ownsearch=y&compared=&holder=&whatdo=seq&rand1=y&whatdo1=1&didRandom=n&minFreq=freq&s1=0&s2=0&s3=0&perc=mi"
puts getHtml(url)

I'm really not sure why this is occurring every time in Mechanize but only sometimes in Chrome.

The returned HTML is:

<style>

<!--



option { font-family: Verdana; font-size: 9px }
input { font-family: Verdana; font-size: 9px }
body { font-family: Verdana; font-size: 11px }
div { font-family: Verdana; font-size: 11px }
p { font-family: Verdana; font-size: 11px }
td { font-family: Verdana; font-size: 11px }



-->
</style>

<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>New Page 1</title>
<script language=Javascript>

function x(x1)
{
top.lefto.document.zabba.reset();
top.lefto.document.zabba.p.value = x1;
top.lefto.document.zabba.wl.options[0].selected = true;
top.lefto.document.zabba.whatsee[0].checked='true';
top.lefto.document.zabba.submit();
}

function x()
{
top.lefto.document.zabba.submit();
}

</script>

</head>


<body>


<div align="center">
<table align="center" border="0" cellpadding="10" cellspacing="0" style="border-    collapse: collapse" bordercolor="#111111" width="70%" id="AutoNumber1">
<tr><td style="background-color: #FFFFFF">&nbsp;</td></tr>
  <tr>
    <td align="center" width="100%">


Please close your browser <b>completely</b>, and then open your browser and start a new session.


</td>
  </tr>
</table>
<p>&nbsp;</p>
</body>
</html>

Thanks for any help you can give!

the Tin Man
c0d3junk13
  • Shouldn't agent.add_auth('www.http://corpus2.byu.edu/','omitted', 'omitted') be agent.add_auth('http://corpus2.byu.edu/','omitted', 'omitted')? – rainkinz Feb 11 '14 at 00:36
  • Apologies, that was a mistake. The same issue still arises when this is corrected, though! – c0d3junk13 Feb 11 '14 at 01:07
  • What happens if you use the browser signature of one of the common browsers? Their site might be sniffing what you're sending, then figuring your browser is corrupted and needs to be restarted. – the Tin Man Feb 11 '14 at 04:18
  • Are there really a bunch of spaces in that url? – pguardiario Feb 11 '14 at 08:31
  • Good idea, I will give it a shot! And no, I think I added the spaces when I was indenting the code for the post. – c0d3junk13 Feb 11 '14 at 10:07
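
On the question of stray spaces in the URL: one way to rule them out is to strip whitespace before making the request and confirm the result parses. This is only a minimal sketch using Ruby's standard uri library. clean_url is a hypothetical helper, and the URL is a shortened stand-in for the one in the question, not the full query string.

```ruby
require 'uri'

# Hypothetical helper: strip whitespace that crept into a pasted URL.
def clean_url(raw)
  raw.gsub(/\s+/, '')
end

# Shortened stand-in for the query string in the question (assumption:
# the remaining parameters behave the same way).
raw = "http://corpus2.byu.edu/glowbe/x2.asp?     chooser=seq&p=%5Bsolid%5D&wl=4&wr=4"

uri = URI.parse(clean_url(raw))

# Decode the query string into a Hash so the parameters can be inspected.
params = URI.decode_www_form(uri.query).to_h

puts uri.host
puts params['chooser']
puts params['p']
```

If the cleaned string parses and the parameters look as expected, the spaces can probably be crossed off as a cause.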

1 Answer

I'm not sure if this is the entire problem, but the HTML the server returns is malformed, and Nokogiri, which Mechanize uses to parse pages, complains about it:

require 'nokogiri'

html = <<EOT
<style>

<!--
option { font-family: Verdana; font-size: 9px }
input { font-family: Verdana; font-size: 9px }
body { font-family: Verdana; font-size: 11px }
div { font-family: Verdana; font-size: 11px }
p { font-family: Verdana; font-size: 11px }
td { font-family: Verdana; font-size: 11px }
-->
</style>

<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>New Page 1</title>
<script language=Javascript>

function x(x1)
{
top.lefto.document.zabba.reset();
top.lefto.document.zabba.p.value = x1;
top.lefto.document.zabba.wl.options[0].selected = true;
top.lefto.document.zabba.whatsee[0].checked='true';
top.lefto.document.zabba.submit();
}

function x()
{
top.lefto.document.zabba.submit();
}

</script>
</head>
<body>
<div align="center">
<table align="center" border="0" cellpadding="10" cellspacing="0" style="border-    collapse: collapse" bordercolor="#111111" width="70%" id="AutoNumber1">
<tr><td style="background-color: #FFFFFF">&nbsp;</td></tr>
  <tr>
    <td align="center" width="100%">


Please close your browser <b>completely</b>, and then open your browser and start a new session.
</td>
  </tr>
</table>
<p>&nbsp;</p>
</body>
</html>
EOT

doc = Nokogiri::HTML(html)
puts doc.errors

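Running that shows Nokogiri found errors in the HTML. The <style> block comes before <html>, so by the time the parser reaches <html> and <head> it has already opened the document implicitly and reports both tags as misplaced: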

>> htmlParseStartTag: misplaced <html> tag
>> htmlParseStartTag: misplaced <head> tag
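
To see the same ordering problem without a parser, here is a rough sketch that lists the tags appearing ahead of <html> in a string. tags_before_html is a hypothetical helper and the sample is a trimmed stand-in for the page above. It is a regex heuristic for a quick check, not a substitute for Nokogiri's own error reporting.

```ruby
# Hypothetical helper: return the names of opening tags that appear
# before <html>, ignoring comments and a doctype declaration.
def tags_before_html(html)
  html_at = html.index(/<html\b/i)
  return [] if html_at.nil?

  prefix = html[0...html_at]
  # Drop comment blocks so tags mentioned inside them are not counted.
  prefix = prefix.gsub(/<!--.*?-->/m, '')
  # Opening tags only: "</style" and "<!DOCTYPE" do not match this pattern.
  prefix.scan(/<([a-z][a-z0-9]*)\b/i).flatten.map(&:downcase)
end

# Trimmed stand-in for the returned page (assumption: same structure).
sample = <<~HTML
  <style>
  <!--
  body { font-family: Verdana; font-size: 11px }
  -->
  </style>

  <html>
  <head><title>New Page 1</title></head>
  <body></body>
  </html>
HTML

p tags_before_html(sample)
```

A non-empty result means something was opened before <html>, which matches the "misplaced" errors reported above.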

And here's what Nokogiri thinks the document is after it's done its fix-ups. Note that it has moved the <style> block inside <head>:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<style>

<!--
option { font-family: Verdana; font-size: 9px }
input { font-family: Verdana; font-size: 9px }
body { font-family: Verdana; font-size: 11px }
div { font-family: Verdana; font-size: 11px }
p { font-family: Verdana; font-size: 11px }
td { font-family: Verdana; font-size: 11px }
-->
</style>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>New Page 1</title>
<script language="Javascript">

function x(x1)
{
top.lefto.document.zabba.reset();
top.lefto.document.zabba.p.value = x1;
top.lefto.document.zabba.wl.options[0].selected = true;
top.lefto.document.zabba.whatsee[0].checked='true';
top.lefto.document.zabba.submit();
}

function x()
{
top.lefto.document.zabba.submit();
}

</script>
</head>
<body>
<div align="center">
<table align="center" border="0" cellpadding="10" cellspacing="0" style="border-    collapse: collapse" bordercolor="#111111" width="70%" id="AutoNumber1">
<tr><td style="background-color: #FFFFFF"> </td></tr>
<tr>
<td align="center" width="100%">


Please close your browser <b>completely</b>, and then open your browser and start a new session.
</td>
  </tr>
</table>
<p> </p>

</div>
</body>
</html>
the Tin Man