
I'm setting up a simple data scraper that pulls the source code from an external URL (I have accounted for CORS) using .load().

However, the page I'm loading has a ton of scripts that also try to execute on my page when .load() runs. Is there any way to invoke .load() without all of the external scripts also loading and running? Or can I somehow stop the scripts from running once .load() has been invoked?

-edit-

Here is my current code.

<input id="url" type="text" size="100">
<button id="load">Load</button>

$('#load').on('click', function() {
    if ($('#url').val())
        $('#html').load($('#url').val());
    else
        alert('No URL entered.');
});
giwook
  • Please share the code you're using right now. – m.antkowicz Sep 10 '15 at 20:18
  • Yes, simply remove all of the scripts from the html string before you parse it to html. Good luck! – Kevin B Sep 10 '15 at 20:20
  • Edited original question, see code at bottom. Not sure what you mean Kevin B. – giwook Sep 10 '15 at 20:20
  • Do you need the entire HTML, or is there an outermost selector you could use? e.g. `load($('#url').val() + " #content")` or something? Script blocks won't execute when you're loading only a fragment of the document. – Paul Roub Sep 10 '15 at 20:24
  • Use `$.get` instead of `load()` and call `html()` yourself, which should strip out scripts. – charlietfl Sep 10 '15 at 20:25
  • @giwook Given this html string: ``, convert it to `` using string manipulation. – Kevin B Sep 10 '15 at 20:27
  • @PaulRoub `<body>` has an id that I could use, but there are also a ton of scripts inside the body tag. I believe that's the outermost selector I'd be able to use, because I'm gathering data from multiple sources throughout the page and I pretty much need everything in `<body>`. – giwook Sep 10 '15 at 20:28
  • That should be fine. Per [the docs](http://api.jquery.com/load/#script-execution), "If .load() is called with a selector expression appended to the URL... the scripts are stripped out prior to the DOM being updated, and thus are not executed." Be sure you use the body's `#id` as the selector; `body` won't work. – Paul Roub Sep 10 '15 at 20:30
  • @giwook - Did you read *charlietfl*'s suggestion? That is the way to go. Script tags inserted via `innerHTML` will not be executed. – DavidDomain Sep 10 '15 at 20:31
  • @PaulRoub I'm not sure if the id of body would even work; I would expect that to be stripped out just like `<html>` and `<head>`. If the id of the body would work, so would using the body selector. – Kevin B Sep 10 '15 at 20:31
  • @KevinB In theory, yes. In practice, [no](http://stackoverflow.com/questions/5271316/jquery-load-body-from-external-html). – Paul Roub Sep 10 '15 at 20:34
  • You can always manipulate the response first: wrap it in `$()` and remove the `script` elements yourself. Before they are inserted in the DOM they are just HTML strings. `html()` itself used to strip them out... surprised it still doesn't. – charlietfl Sep 10 '15 at 20:51
  • Actually, running into a small problem here when using the .load() statement. After getting user input for the URL, I set the URL to a variable and then pass it to .load() as a parameter but it doesn't seem to be loading any content. When I call the variable in the console, the correct URL shows but it seems .load() doesn't want to take variables as arguments, because when I invoke the .load() using the actual URL as a parameter instead of a variable that contains the URL, .load() seems to work fine. – giwook Sep 11 '15 at 13:51
  • @giwook Ask that as a separate question. Be sure to use the same code, and include the *exact* contents of the variable. – Paul Roub Sep 11 '15 at 14:01
  • Asked in separate question: http://stackoverflow.com/questions/32525440/calling-load-with-selectors-not-working – giwook Sep 11 '15 at 14:08
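
The suggestions from Kevin B and charlietfl in the comments above (fetch the markup with `$.get`, remove the scripts from the string, then insert it yourself) could be sketched roughly as follows. This is only an illustration under stated assumptions: `stripScripts` is a hypothetical helper, not a jQuery function, and its regex is a simplification that only removes `<script>` blocks and leaves things like inline `on*` attributes untouched. The element ids `#url`, `#load` and `#html` are the ones from the question.

```javascript
// Hypothetical helper: remove <script>...</script> blocks from an HTML string
// before it is parsed. A regex is an assumption made for brevity; it is not a
// full HTML sanitizer and does not touch inline event handler attributes.
function stripScripts(html) {
    return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

// Browser-only wiring; skipped when jQuery is not available.
if (typeof $ !== 'undefined') {
    $('#load').on('click', function () {
        var url = $('#url').val();
        if (!url) {
            alert('No URL entered.');
            return;
        }
        // Request the page as plain HTML text, clean it, then insert it.
        $.get(url, function (html) {
            $('#html').html(stripScripts(html));
        }, 'html');
    });
}
```

Because the scripts are removed while the response is still a string, nothing from the remote page gets a chance to run when the cleaned markup is inserted.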

1 Answer


As Paul Roub said in the comments, calling .load() with a selector appended to the URL strips the scripts out of the fetched HTML before the DOM is updated, so they are never executed (as per the docs).
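
A minimal sketch of that approach, adapted from the code in the question. The helper name `withSelector` is hypothetical, and `#content` is only a placeholder taken from Paul Roub's comment; substitute the id of whichever element you actually need (for the whole page, the id on the body element rather than the bare `body` selector).

```javascript
// Hypothetical helper: build the argument for .load() as "<url> <selector>".
// jQuery treats everything after the first space as the selector, so the URL
// itself is trimmed and must not contain spaces.
function withSelector(url, selector) {
    return url.trim() + ' ' + selector;
}

// Browser-only wiring; skipped when jQuery is not available.
if (typeof $ !== 'undefined') {
    $('#load').on('click', function () {
        var url = $('#url').val();
        if (url) {
            // '#content' is an assumed id on the remote page.
            $('#html').load(withSelector(url, '#content'));
        } else {
            alert('No URL entered.');
        }
    });
}
```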

giwook