
Hello everyone, I'm trying to find a specific sentence or word in an HTML page after making a web request.

The sentence is: Couldn't resolve host 'http:'

I tried to write a script using pycurl and b.getvalue(), but it doesn't seem to work.

Website to test with: http://www.moorelandpartners.com/plugins/system/plugin_googlemap2_proxy.php

Code:

http://pastebin.com/qAUjv1ux

I would like to search for the whole sentence, or maybe just the word "http" or "Couldn't".
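Checking for the whole sentence versus individual words can be sketched with plain substring tests. This is a minimal Python 3 sketch: `find_terms` is a hypothetical helper, and the hard-coded `page` string stands in for the fetched HTML. Note that a bare word such as "http" will match almost any HTML page, because links contain it, so the full sentence is the more reliable test.

```python
def find_terms(html, terms):
    """Return the terms (in the given order) that occur in `html`.

    Hypothetical helper; the check is a plain, case-sensitive substring test.
    """
    return [term for term in terms if term in html]


page = "Couldn't resolve host 'http:'"  # stand-in for the fetched page text

# The whole sentence is the most specific test...
whole = find_terms(page, ["Couldn't resolve host 'http:'"])

# ...while single words are looser; any() is True if at least one matches.
words = find_terms(page, ["http", "Couldn't"])
matched_any = any(term in page for term in ["http", "Couldn't"])
print(whole, words, matched_any)
```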

Thanks for your help

toto
  • What do you mean by "it doesn't work"? Do you get an error? To test whether a string contains a word, you can simply use `in`, i.e. `if "Couldn't" in string:` – Tony Suffolk 66 Jan 28 '15 at 20:14
  • Hello, no, I get the else branch, but the web page only shows Couldn't resolve host 'http:'. Visit the website and you will see. – toto Jan 28 '15 at 20:16
  • How do I check for the word in the HTML page? – toto Jan 28 '15 at 20:27
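The `in` test suggested in the comments is the right tool, but its behaviour depends on the Python version. The answer's code is Python 2, where `getvalue()` returns a `str`. If you run it under Python 3 (an assumption about your setup), pycurl passes bytes to the write callback, so the buffer must be an `io.BytesIO` and `getvalue()` returns bytes. A minimal sketch of the two ways to search such a body, with a hard-coded `body` standing in for `b.getvalue()`:

```python
# Stand-in for b.getvalue(); under Python 3 pycurl hands the write callback bytes.
body = b"Couldn't resolve host 'http:'"

# Option 1: search the raw bytes with a bytes literal.
found_bytes = b"Couldn't resolve host" in body

# Option 2: decode to str first (UTF-8 is an assumption), then use a str literal.
html = body.decode("utf-8", errors="replace")
found_str = "Couldn't resolve host" in html

# Mixing the two types does not return False; it raises TypeError.
try:
    "Couldn't" in body
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
print(found_bytes, found_str, mixed_error)
```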

1 Answer


This seems to work (it uses the 'in' operator, as I suggested in my comment):

import pycurl
import StringIO

url = "http://www.moorelandpartners.com/plugins/system/plugin_googlemap2_proxy.php"
c = pycurl.Curl()
b = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, b.write)  # collect the response body in b
c.setopt(pycurl.TIMEOUT, 10)  # Note 1
c.setopt(pycurl.CONNECTTIMEOUT, 10)  # Note 1
c.setopt(pycurl.URL, url)

try:
    c.perform()
except pycurl.error:
    # The request itself failed (DNS, connection, timeout, ...)
    print "No ", url
else:
    html = b.getvalue()
    if "Couldn't resolve host" in html:  # Note 2
        print "{0} FOUND ".format(url)  # Note 3
    else:
        print "text not found"
finally:
    c.close()

What I changed:

  • Note 1: Increased the timeouts to 10 seconds; for some reason a value of 1 didn't work for me.
  • Note 2: Used the 'in' operator to test whether the returned page contains the words we are looking for.
  • Note 3: Removed the references to bcolors.OKGREEN and bcolors.ENDC, because bcolors was not defined in your code.

When I tested this on my PC it "worked", i.e. it reported that it reached the web page and found the relevant text.
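For readers on Python 3 without pycurl installed, the same check can be sketched with the standard library's `urllib.request` instead. This swaps in a different HTTP client rather than reproducing the pycurl approach, and `page_contains` is a hypothetical name. One design difference matters here: pycurl does not treat 4xx/5xx responses as errors, whereas `urlopen` raises `HTTPError`, so the sketch reads the error body too in case the text appears on an error page.

```python
import urllib.error
import urllib.request


def page_contains(url, phrase, timeout=10):
    """Fetch `url` and report whether its body contains `phrase`.

    Returns True or False, or None if the request itself failed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # Unlike pycurl, urllib raises on 4xx/5xx status codes; the error
        # page's body may still contain the text, so read it anyway.
        body = err.read()
    except OSError:
        # URLError (DNS and connection failures) and socket timeouts
        # are all OSError subclasses.
        return None
    # Assumes a UTF-8 page; undecodable bytes are replaced instead of raising.
    return phrase in body.decode("utf-8", errors="replace")
```

Called as `page_contains("http://www.moorelandpartners.com/plugins/system/plugin_googlemap2_proxy.php", "Couldn't resolve host")`, a return value of None corresponds to the "No" branch of the pycurl version, and True/False to the FOUND/not-found branches.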

Tony Suffolk 66