
I'm trying to scrape a bunch of local HTML files. Each one has a piece of JavaScript embedded in it, with a different window.open path, like so:

<script>

function goTo() {

if (document.getElementById('somedomain').checked) {
window.open("http://www.somedomain.com");
}

if (document.getElementById('visit').checked) {
window.open("http://extract-this-url.com/?somevar=12345&anothervar=59305&etc=etc");
}

}
</script>

I'm trying to extract that second URL. It will be different for each file (as will the first 'somedomain' URL).

I've been looking at SimpleHTMLDOM, but it doesn't look like it can handle JavaScript that's embedded within an HTML file.

Is there any decent way of doing this?

Sk446

1 Answer


Just use a regexp:

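// 's' lets '.' match across newlines; 'i' makes the match case-insensitive.
preg_match('#visit.*?window\.open\("(.*?)"#is', $text, $matches);
print_r($matches); // $matches[1] holds the captured URL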
Dracony
  • Can't seem to get that to work - just getting an empty array. I assume in the example, $text would just be the HTML source of the file to extract from, correct? – Sk446 Jan 04 '13 at 11:23
  • My mistake, it should be #is, not #s. Edited it now =) – Dracony Jan 04 '13 at 11:31
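
For reference, here is a minimal sketch of the regex approach from the answer, run against the snippet from the question. The function name extractVisitUrl and the slightly stricter pattern (anchored on getElementById('visit') rather than the bare word visit) are assumptions added for illustration, not part of the original answer. It assumes the URL itself contains no quote characters.

```php
<?php
// Hypothetical helper: returns the URL opened in the 'visit' branch, or null.
function extractVisitUrl(string $html): ?string
{
    // 'i' = case-insensitive, 's' = let '.' match newlines, which is needed
    // because the id check and the window.open call sit on different lines.
    $pattern = '#getElementById\(\s*[\'"]visit[\'"]\s*\).*?window\.open\(\s*[\'"](.*?)[\'"]#is';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // first capture group: the URL
    }
    return null; // no match, or preg_match reported an error
}

// Sample input mirroring the snippet in the question.
$html = <<<'HTML'
<script>
function goTo() {
if (document.getElementById('somedomain').checked) {
window.open("http://www.somedomain.com");
}
if (document.getElementById('visit').checked) {
window.open("http://extract-this-url.com/?somevar=12345&anothervar=59305&etc=etc");
}
}
</script>
HTML;

$url = extractVisitUrl($html);
echo $url, PHP_EOL;
```

For the local files, $html would be the full source of each file, for example as returned by file_get_contents(). Checking the return value of preg_match before reading $matches makes a failed match easy to spot instead of silently yielding an empty array.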