Using R, I am trying to get the links that are present on the following webpage: https://icerbox.com/folder/eVDOgpD1/Goldmine.320

The page contains 135 links to files. When you hover the mouse over a filename, a blue download symbol appears to its right. This download symbol leads to the actual URL of the file. However, that URL seems to be generated by JavaScript and is not present in the HTML file itself.

I want to extract those 135 URLs, but I have no idea how to capture these dynamically generated URLs.

Can anyone help me get these? I am open to any approach in R (rvest, RSelenium, etc.).

Peter Verbeet
  • are you sure you have the rights to download these files? – MichaelChirico Jan 09 '18 at 19:38
  • You would be incorrect. The ToS : https://icerbox.com/ToS : _clearly_ states what you are trying to do is in violation of site policy and may subject those who help you to civil and criminal penalties. Since you deliberately lied to @MichaelChirico, I don't really care what happens to you. Stealing is one thing. Lying puts you in a completely lower class that is usually only inhabited by lawyers and politicians. – hrbrmstr Jan 09 '18 at 20:05
  • Thank you for sharing your point of view. – Peter Verbeet Jan 09 '18 at 20:29
  • I'm voting to close this question as off-topic because its answer appears to entail aiding and abetting the commission of a crime – MichaelChirico Jan 09 '18 at 20:43
  • I don't think there is anything wrong with figuring out how to extract a set of links. That is what the question is about. To me, the educational exercise of how to do this kind of extraction is what the question is about. The question does not entail downloading of any file. But close it if that's what you want. – Peter Verbeet Jan 09 '18 at 21:00
  • agreed Peter - [this has been discussed in meta](https://meta.stackoverflow.com/questions/274906/should-questions-that-violate-api-terms-of-service-be-flagged) and [comment from moderator](https://meta.stackoverflow.com/questions/341167/how-to-flag-people-asking-for-help-to-violate-another-sites-terms-of-service#comment430337_341175). But take the previous info in comments as something to be aware of. – user20650 Jan 10 '18 at 12:46

1 Answer

It looks like you have a very similar need for PhantomJS as was used here with TidyText; they were also looking to grab links that were generated by JavaScript.
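Once a headless browser has rendered the page, pulling the links out is ordinary rvest work. The sketch below is only an illustration under stated assumptions: `rendered_html` is a made-up stand-in for whatever HTML the browser saved, and the `a.download` selector and `example.com` base URL are hypothetical, not taken from any real site. It also assumes rvest 1.0.0 or later for `html_elements()`.

```r
library(rvest)

# Hypothetical stand-in for HTML captured *after* a headless browser
# has executed the page's JavaScript (not taken from any real site).
rendered_html <- '
<html><body>
  <a class="download" href="/files/a.txt">a.txt</a>
  <a class="download" href="https://example.com/files/b.txt">b.txt</a>
  <a href="/about">About</a>
</body></html>'

# Parse the rendered markup.
page <- read_html(rendered_html)

# Select only the anchors of interest; "a.download" is an assumed selector.
anchors <- html_elements(page, "a.download")

# Read the href attribute of each selected anchor.
links <- html_attr(anchors, "href")

# Resolve relative hrefs against an assumed base URL.
absolute_links <- xml2::url_absolute(links, base = "https://example.com/")

print(absolute_links)
```

In practice the selector has to be found by inspecting the rendered page, and whether a site permits this kind of automated access is governed by its terms of service.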

Chuck P
  • there's a lovely js link on that page that forgoes the need for system utility dependencies but helping this user at all could land you in legal trouble. – hrbrmstr Jan 09 '18 at 20:11
  • @chuck P, thank you for your advice, this is a really educational link. – Peter Verbeet Jan 09 '18 at 20:51