
I have a problem obtaining data from a specific website: when I try to download the raw page source with R 3.6.3 using the following example code:

website_raw <- readLines("https://tge.pl/gaz-rdn?dateShow=09-02-2022")

I get the following error:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  InternetOpenUrl failed: 'the connection with the server was reset'

readLines() used to work fine on this website, but it started failing about a week ago. I've also tried download.file(): at first the result was the same (error, connection reset), but after setting options(download.file.method = "libcurl") the file starts to download and then suddenly stops with:

trying URL 'https://tge.pl/gaz-rdn?dateShow=09-02-2022'
Error in download.file("https://tge.pl/gaz-rdn?dateShow=09-02-2022", "test.html") : 
  cannot open URL 'https://tge.pl/gaz-rdn?dateShow=09-02-2022'
In addition: Warning message:
In download.file("https://tge.pl/gaz-rdn?dateShow=09-02-2022", "test.html") :
  URL 'https://tge.pl/gaz-rdn?dateShow=09-02-2022': status was 'Failure when receiving data from the peer'

I've also tried disabling "Use Internet Explorer library/proxy for HTTP" in the RStudio Global Options, but it didn't help. Another approach I tested was read_html() from the rvest package, which fails with:

Error in open.connection(x, "rb") : Send failure: Connection was reset

Downloading data from other websites works fine, though, with all of the methods above.

Is there any way I can download data from this website with R?

Any kind of help or suggestion will be highly appreciated

kakaba
    You can try `RSelenium` package. – Nad Pat Feb 22 '22 at 03:28
  • Looks like an SSL handshake error. Switch to httr: `httr::GET('https://tge.pl/gaz-rdn?dateShow=09-02-2022')` – QHarr Feb 22 '22 at 05:22
  • Thanks for the suggestions: 1) `RSelenium` would be hard to use because it requires a server installation - given the "external limitations" that exist, I probably can't do that on my machine. 2) I've tried `httr::GET()` on two machines: on the first it works perfectly fine, but on the other (the more important one) it doesn't - error as below: `Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: Connection timed out after 10016 milliseconds`. Setting the timeout manually in `GET()` didn't help either. Any ideas? – kakaba Feb 22 '22 at 18:34
  • Do both machines have the same httr version? – QHarr Feb 22 '22 at 19:18
  • Thank you for asking - yes, both have `httr_1.4.2` and `curl_4.3.1`. There is, however, a difference between the machines: the working one has `compiler_3.6.3 R6_2.5.0` while the malfunctioning one has `compiler_3.6.3 R6_2.4.1` - I don't think that's the root cause of this issue. I'd be thankful for other comments or suggestions. – kakaba Feb 23 '22 at 07:54
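A minimal sketch of the `httr::GET()` approach suggested in the comments. This is not a guaranteed fix: it assumes the server is resetting connections from requests that don't look browser-like, so it sends an explicit User-Agent header and a generous timeout. The User-Agent string here is an arbitrary browser-like example, not anything specific to this site.

```r
library(httr)

url <- "https://tge.pl/gaz-rdn?dateShow=09-02-2022"

resp <- GET(
  url,
  user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),  # browser-like UA (assumed to matter)
  timeout(60)                                               # generous timeout for slow handshakes
)

stop_for_status(resp)                                       # error out on HTTP failure codes
website_raw <- content(resp, as = "text", encoding = "UTF-8")
```

If this still fails on the second machine while the plain `GET()` works on the first, the difference is more likely in the network path (proxy, firewall, TLS inspection) than in the R code itself.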

0 Answers