I am scraping an HTML page using preg_match_all in PHP. This is what I'm trying to scrape:

  <script>
    function fsb38(x) {
      var b=new 
      Array(98,100,97,98,98,98,99,50,51,55,53,50,48,100,57,98,50,100,53,100,97,48,100,52,100,57,97,56,97,51,54,99,56,38,104,52,61,53,98,99,54,102,57,55,49,99,55,101,55,61,101,48,98,55,99,57,102,110,56,57,102,98,111,78,54,102,102,109,114,53,111,54,101,102,48,48,38,54,98,61,116,50,97,99,38,56,101,51,57,49,102,61,100,101,105,106,101,63,101,101,57,48,52,112,104,112,46,115,110,111,105,115,115,105,109);
      var p=new Array(0,0,0,0,1,1,1,0,0,1,0,0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1,1,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,0,1,0,0,0,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,1,1,0,1,1,0,1,1,1,1,0,1,0,0,0,1,1,0,1,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1);       
      window.location = c(b,p) + x;
      return false;
    }
  </script>

Normally preg_match_all('/var b=new(.*)var p=new/is', $output, $ar); would work fine. However, because this block occurs several times on the page, I get only one match: it runs from the first var b=new to the very last occurrence of var p=new.

I also tried preg_match_all('/var b=new(.*)(\n)(\s)var p=new/is', $output, $ar); but that returns nothing. What am I doing wrong?

Toto
Grant

2 Answers

2

Use this if you want to capture every Array(...) declaration:

preg_match_all('/var.*?=new(.*?)\)\;/is', $output, $ar);

Use this if you want to capture only the b=new Array(...) declarations:

preg_match_all('/var b=new(.*?)\)\;/is', $output, $ar);
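
To show how the lazy (.*?) behaves when the page contains several blocks, here is a minimal sketch. The two shortened script blocks and the function name fsb39 are made-up sample input, and the pattern is a slightly tightened variant of the one above (it also skips the "Array(" prefix), not the exact regex from this answer.

```php
<?php
// Hypothetical sample input: two shortened blocks standing in for the real page.
$output = <<<'HTML'
<script>
  function fsb38(x) {
    var b=new 
    Array(98,100,97);
    var p=new Array(0,0,1);
  }
</script>
<script>
  function fsb39(x) {
    var b=new 
    Array(52,53);
    var p=new Array(1,0);
  }
</script>
HTML;

// Lazy (.*?) stops at the first ");" after each "var b=new",
// so every block yields its own match.
preg_match_all('/var b=new\s*Array\((.*?)\);/is', $output, $ar);

// $ar[1] holds one comma-separated list per match; convert each to integers.
$arrays = array_map(
    function ($list) {
        return array_map('intval', explode(',', $list));
    },
    $ar[1]
);

print_r($arrays);
```

With this sample input, $arrays contains two integer arrays, one per script block. If you keep the original pattern instead, the captured group still includes the leading whitespace and "Array(", so you would have to strip that before splitting.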
Ramz
1

Quantifiers in regular expressions are greedy by default: .* matches the longest possible string. You need ungreedy behavior here, so add the U modifier (or make the individual quantifier lazy by writing .*? instead).

http://php.net/manual/en/reference.pcre.pattern.modifiers.php
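
As a rough illustration of the difference, the sketch below runs the pattern from the question with and without the U modifier. The input string is a made-up, shortened stand-in for the page, and it assumes var p=new is indented with several spaces, as in the snippet in the question.

```php
<?php
// Hypothetical shortened input: two b/p pairs, each line indented by six spaces.
$output = "var b=new Array(1,2);\n      var p=new Array(0,1);\n"
        . "      var b=new Array(3,4);\n      var p=new Array(1,0);\n";

// Greedy: a single match running from the first "var b=new"
// to the last "var p=new".
$greedy = preg_match_all('/var b=new(.*)var p=new/is', $output, $g);

// Same pattern with the U modifier: .* becomes ungreedy, one match per pair.
$ungreedy = preg_match_all('/var b=new(.*)var p=new/isU', $output, $u);

// The second attempt from the question: (\s) allows exactly one whitespace
// character after the newline, so the six-space indentation never matches.
$second = preg_match_all('/var b=new(.*)(\n)(\s)var p=new/is', $output, $s);

echo $greedy, ' ', $ungreedy, ' ', $second, "\n"; // prints "1 2 0"
```

The last case suggests why the second attempt in the question returned nothing: if the real page indents that line with more than one whitespace character, \s+ would be needed in place of (\s).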

Kacer