
I'm trying to use Ghost.py to do some web scraping. I want to follow a link, but Ghost doesn't seem to actually evaluate the JavaScript and follow it. My problem is that I'm in an HTTPS session and cannot use redirection. I've also looked at other options (like Selenium), but I cannot install a browser on the machine that will run the script. I also need to evaluate some JavaScript later on, so I cannot use mechanize.

Here's what I do...

from ghost import Ghost

ghost = Ghost()

## Open the website
page, resources = ghost.open('https://my.url.com/')

## Fill textboxes of the form (the form didn't have a name)
result, resources = ghost.set_field_value("input[name=UserName]", "myUser")
result, resources = ghost.set_field_value("input[name=Password]", "myPass")

## Submitting the form
result, resources = ghost.evaluate("document.getElementsByClassName('loginform')[0].submit();", expect_loading=True)

## Print the link to make sure that's the one I want to follow
#result, resources = ghost.evaluate( "document.links[4].href")

## Click the link
result, resources = ghost.evaluate("document.links[4].click()")

#print ghost.content

When I look at ghost.content, I'm still on the same page and result is empty. I noticed that when I add expect_loading=True to the evaluate call for the click, I get a timeout error.

When I run the same JavaScript in the Chrome Developer Tools console, I get

event.returnValue is deprecated. Please use the standard event.preventDefault() instead.

but the page does load up the linked url correctly.

Any ideas are welcome.

Charles

user294186

1 Answer


I think you are using the wrong methods for that.
If you want to submit the form, there's a dedicated method for it. It takes a CSS selector, so a class name needs a leading dot:

page, resources = ghost.fire_on(".loginform", "submit", expect_loading=True)

There's also a Ghost.py method for performing a click:

ghost.click('#some-selector')

Another possibility, if you just want to open that link, could be:

link_url = ghost.evaluate("document.links[4]")[0]
ghost.open(link_url)

You only have to find the right selectors for that.
I don't know which page you want to perform the task on, so I can't fix your code. But I hope this helps.

Thorben
  • Thanks for the reply, I should be able to try this tomorrow. – user294186 Apr 30 '14 at 15:04
  • I get a TimeoutError from the alternative you proposed for the form submit. Anyhow, I tried `ghost.click('a[href="/Internet/Home"]')`, which leads to the same result: the Ghost browser doesn't follow the link and stays on the same page. – user294186 May 06 '14 at 05:27
  • I get the same from `ghost.open(link_url['href'])`. Unfortunately, I'm wary of posting the page code since it leads to my ISP account, where I retrieve some bandwidth statistics... – user294186 May 06 '14 at 05:29
  • Hm, that's strange. But you're right, posting your credentials wouldn't be the best idea ;). I just thought the URL you are opening, "my.url.com", was some public website, in which case it would help to look at its source code. Maybe the form fields aren't being filled out correctly. Did you try looking at what Ghost is doing by enabling the graphical output? You can switch it on with `Ghost(display=True)`. What does it display? – Thorben May 06 '14 at 09:38
  • I wasn't aware you could display the browser, good thing to know. – user294186 May 09 '14 at 00:52
  • I slightly modified your second suggestion for following a link and it worked: `ghost.open(link_url['href'])` – user294186 May 09 '14 at 00:54
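
The approach that ended up working in this thread (reading the link's URL and opening it directly rather than clicking it) can be sketched as a small helper. This is only a sketch under assumptions: `follow_link` is a hypothetical name, not part of Ghost.py, and `ghost` is assumed to be a session object whose `evaluate()` and `open()` each return a `(value, resources)` pair, as in the snippets above. Asking the page for `.href` is a variation on the reported fix that should return the URL string itself, so no lookup on the returned element is needed.

```python
def follow_link(ghost, index):
    """Open the target of document.links[index] directly instead of clicking it.

    `ghost` is assumed to expose evaluate() and open(), each returning a
    (value, resources) pair. This helper is hypothetical, not a Ghost.py API.
    """
    # Ask the page for the URL string only, not the whole link element.
    href, _resources = ghost.evaluate("document.links[%d].href" % index)
    if not href:
        raise ValueError("document.links[%d] has no href" % index)
    # Load that URL through the same session object.
    return ghost.open(href)
```

With the session from the question, this would be called as `page, resources = follow_link(ghost, 4)`. Passing the session in as a parameter keeps the helper independent of how `Ghost` was constructed and makes it easy to exercise with a stand-in object.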