I am scraping a few pages with Selenium, and I do not use other frameworks (like Scrapy, etc.) because the pages rely heavily on AJAX. My problem is that the content refreshes automatically nearly every second (like financial data, for example), but I want to scrape all the elements in a static state. I searched a lot on the internet and especially here on Stack Overflow. What is the easiest way to freeze the website with Selenium? I even tried switching off the wireless adapter, but this was a problem... This is the only command in the Selenium docs that I found:

driver.set_network_conditions(offline=True, latency=5, throughput=500 * 1024)

I tested this code, and when I run the script it doesn't have any effect. The website is still "auto refreshing"...

robrados
  • Can you share the url you're trying to parse? – Pedro Lobito Feb 06 '19 at 20:04
  • For example this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (there is no API for this site) – robrados Feb 06 '19 at 20:10
  • What do you plan to extract from that page? – Pedro Lobito Feb 06 '19 at 21:20
  • Is this what you need? https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=31&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z You can increase the `limit` parameter if needed (tested max 400). – Pedro Lobito Feb 06 '19 at 21:23

2 Answers

"for example this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (there is no API for this site)"


In fact, an API exists, but it isn't fully public.

To get the values of the chart as a JSON object, you'll need to construct a customized URL, something like:

https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=400&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z

Output:

{"result":"success","count":400,"marker":"USD|rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq|XRP||20190206014150|000044926668|00006|00003","exchanges":[{"base_amount":"0.12180204","counter_amount":"0.42056","node_index":6,"rate":"3.4528157","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":39832,"provider":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"},{"base_amount":"322.8872040048709","counter_amount":"1109.37944","node_index":2,"rate":"3.4358111","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":26918939,"provider":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"}

...

Notes:

  • You can change the limit parameter to return a different number of records if needed (tested max 400)
  • The start and end dates should also be updated programmatically to get the latest values.
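
As a rough illustration of the notes above, the URL could be assembled in Python so that the dates and the limit are filled in at run time. This is only a sketch: the endpoint and query parameters are copied from the URL shown in this answer, while the function name `build_exchanges_url` and the default `start` value are assumptions made for the example, and nothing here confirms that the endpoint is still available.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint taken from the URL in this answer; it is not a documented public API.
API_ROOT = "https://api.gatehub.net/rippledata/v2/exchanges"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"

# Start date used in the answer's example URL (an assumption as a default).
DEFAULT_START = datetime(2009, 2, 6, 21, 20, tzinfo=timezone.utc)


def build_exchanges_url(base, counter, end, limit=400, start=DEFAULT_START):
    """Return the exchanges URL for a currency pair and a time window.

    build_exchanges_url is a hypothetical helper, not part of any library.
    """
    params = {
        "descending": "true",
        "end": end.strftime(TIME_FORMAT),
        "limit": limit,  # the answer reports 400 as the tested maximum
        "reduce": "false",
        "result": "tesSUCCESS",
        "start": start.strftime(TIME_FORMAT),
    }
    # Keep ':' unescaped so the timestamps look like those in the example URL.
    return f"{API_ROOT}/{base}/{counter}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    # Use the current UTC time as the end of the window to get the latest values.
    print(build_exchanges_url(
        "USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq",
        "XRP",
        end=datetime.now(timezone.utc),
    ))
```

The resulting URL can then be fetched with any HTTP client and the response parsed as JSON, which avoids scraping the auto-refreshing page at all.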
Pedro Lobito
  • Thank you for your answer, this was helpful, although my question was how to stop the JavaScript auto-refreshing of the whole website or its elements during a Selenium driver session. – robrados Feb 06 '19 at 22:01
  • Try injecting a JS error via Selenium's execute_script: `throw new Error();` – Pedro Lobito Feb 07 '19 at 11:50

One solution might be to set config preferences for whichever browser you are using for your driver. For example, if using Firefox you could set accessibility.blockautorefresh to True, and then just use driver.refresh() when you are ready.

https://lifehacker.com/disable-automatic-web-page-refreshing-5321420

PHPUnit + Selenium: How to set Firefox about:config options?
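
A minimal sketch of how such a preference might be passed to Firefox from Python's Selenium bindings is shown here. It assumes Selenium with `webdriver.FirefoxOptions` and geckodriver are installed; the names `FREEZE_PREFS` and `make_firefox_driver` are made up for this example. Firefox treats `accessibility.blockautorefresh` as "block when true", and it is aimed at page-level refreshes, so whether it stops script-driven updates on a given site is not confirmed here.

```python
# Preference named in this answer; True asks Firefox to block automatic refreshes.
# FREEZE_PREFS is a hypothetical name used only for this sketch.
FREEZE_PREFS = {"accessibility.blockautorefresh": True}


def make_firefox_driver(prefs=FREEZE_PREFS):
    """Start Firefox with the given about:config preferences applied."""
    # Imported lazily so the preference table can be inspected
    # without Selenium being installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With a driver created this way, you would load the page with `driver.get(...)`, scrape the elements, and call `driver.refresh()` only when you want new data.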

Susie Queue