0

I've created a very basic PHP script that fetches the data from a URL (for Google Trends):

$url = 'https://www.google.com/trends/fetchComponent?hl=en-US&q=battlefield%201&cid=TIMESERIES_GRAPH_0&export=5&w=500&h=300&gprop=youtube&date=today%201-m';
$url2 = file_get_contents($url);

In the source of the response there's a specific piece of data I want to extract:

{"columns":[{"id":"d","label":"Date","type":"datetime"},{"role":"annotation","type":"string"},{"p":{"html":true},"role":"annotationText","type":"string"},{"id":"q0","label":"battlefield 1","type":"number"},{"role":"annotation","type":"string"},{"p":{"html":true},"role":"annotationText","type":"string"},{"role":"certainty","type":"boolean"}],"headlineDataPoints":[],"width":485,"axisAnnotations":[],"rows":[[{"v":new Date(2016, 8, 24, 12, 0),"f":"Saturday, September 24, 2016"},null,null,21,null,null,true],[{"v":new Date(2016, 8, 25, 12, 0),"f":"Sunday, September 25, 2016"},null,null,23,null,null,true],[{"v":new Date(2016, 8, 26, 12, 0),"f":"Monday, September 26, 2016"},null,null,19,null,null,true],[{"v":new Date(2016, 8, 27, 12, 0),"f":"Tuesday, September 27, 2016"},null,null,44,null,null,true],[{"v":new Date(2016, 8, 28, 12, 0),"f":"Wednesday, September 28, 2016"},null,null,54,null,null,true],[{"v":new Date(2016, 8, 29, 12, 0),"f":"Thursday, September 29, 2016"},null,null,39,null,null,true],[{"v":new Date(2016, 8, 30, 12, 0),"f":"Friday, September 30, 2016"},null,null,35,null,null,true],[{"v":new Date(2016, 9, 1, 12, 0),"f":"Saturday, October 1, 2016"},null,null,38,null,null,true],[{"v":new Date(2016, 9, 2, 12, 0),"f":"Sunday, October 2, 2016"},null,null,64,null,null,true],[{"v":new Date(2016, 9, 3, 12, 0),"f":"Monday, October 3, 2016"},null,null,46,null,null,true],[{"v":new Date(2016, 9, 4, 12, 0),"f":"Tuesday, October 4, 2016"},null,null,35,null,null,true],[{"v":new Date(2016, 9, 5, 12, 0),"f":"Wednesday, October 5, 2016"},null,null,34,null,null,true],[{"v":new Date(2016, 9, 6, 12, 0),"f":"Thursday, October 6, 2016"},null,null,34,null,null,true],[{"v":new Date(2016, 9, 7, 12, 0),"f":"Friday, October 7, 2016"},null,null,31,null,null,true],[{"v":new Date(2016, 9, 8, 12, 0),"f":"Saturday, October 8, 2016"},null,null,29,null,null,true],[{"v":new Date(2016, 9, 9, 12, 0),"f":"Sunday, October 9, 2016"},null,null,30,null,null,true],[{"v":new Date(2016, 9, 10, 12, 
0),"f":"Monday, October 10, 2016"},null,null,31,null,null,true],[{"v":new Date(2016, 9, 11, 12, 0),"f":"Tuesday, October 11, 2016"},null,null,22,null,null,true],[{"v":new Date(2016, 9, 12, 12, 0),"f":"Wednesday, October 12, 2016"},null,null,40,null,null,true],[{"v":new Date(2016, 9, 13, 12, 0),"f":"Thursday, October 13, 2016"},null,null,63,null,null,true],[{"v":new Date(2016, 9, 14, 12, 0),"f":"Friday, October 14, 2016"},null,null,55,null,null,true],[{"v":new Date(2016, 9, 15, 12, 0),"f":"Saturday, October 15, 2016"},null,null,71,null,null,true],[{"v":new Date(2016, 9, 16, 12, 0),"f":"Sunday, October 16, 2016"},null,null,64,null,null,true],[{"v":new Date(2016, 9, 17, 12, 0),"f":"Monday, October 17, 2016"},null,null,84,null,null,true],[{"v":new Date(2016, 9, 18, 12, 0),"f":"Tuesday, October 18, 2016"},null,null,100,null,null,true],[{"v":new Date(2016, 9, 19, 12, 0),"f":"Wednesday, October 19, 2016"},null,null,null,null,null,true],[{"v":new Date(2016, 9, 20, 12, 0),"f":"Thursday, October 20, 2016"},null,null,null,null,null,true],[{"v":new Date(2016, 9, 21, 12, 0),"f":"Friday, October 21, 2016"},null,null,null,null,null,true],[{"v":new Date(2016, 9, 22, 12, 0),"f":"Saturday, October 22, 2016"},null,null,null,null,null,true],[{"v":new Date(2016, 9, 23, 12, 0),"f":"Sunday, October 23, 2016"},null,null,null,null,null,true]],"showHeadlines":false,"percentData":false,"colors":["#3f85f2"],"height":230}

How can I use preg_match to find this? I've tried:

$data = preg_match('/^(var chartData = |; var htmlChart)\\/$/', $url2, $output_array);
print_r($output_array);

I used the first and last keywords to try to get what's in between them (that being the data I want).

I've tried phpliveregex, which does sort of pick up what I want (it finds the var chartData string, see http://www.phpliveregex.com/p/hCB), yet when I try to reproduce that in my own script it doesn't work.

My question is: how can I extract this JSON object (inside the chartData variable) from Google Trends using preg_match()? What I've tried hasn't been working.

ConorReidd
  • Why aren't you simply using json_decode and accessing the property directly? – useyourillusiontoo Oct 23 '16 at 18:56
  • @useyourillusiontoo Impossible to tell from the extremely vague question title. But he's trying to extract the JSOX from somewhere within some JS/HTML, but his regex is somewhat cobbled together. (It's also not really JSON.) – mario Oct 23 '16 at 19:00

2 Answers

2

The problem seems to come from the values in the "JSON string" that all look like new Date(2016, 9, 14, 12, 0): they are JavaScript expressions that contain spaces and are not enclosed in quotes, so the string is not valid JSON and json_decode() cannot parse it. You can solve the problem by adding quotes around them:

$str = preg_replace('~:(new Date\([^)]*\))~', ':"$1"', $str);
$jsonArray = json_decode($str, true);
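
To see the quoting step end to end, here is a minimal sketch. It assumes $str already holds only the object literal; the sample below is a trimmed, two-row version of the data in the question, and the $points array and the positional indexes (0 for the date cell, 3 for the value) are illustrative choices read off the "columns" list, not part of any API.

```php
<?php
// Trimmed sample of the chartData payload from the question (two rows only).
$str = '{"rows":['
     . '[{"v":new Date(2016, 8, 24, 12, 0),"f":"Saturday, September 24, 2016"},null,null,21,null,null,true],'
     . '[{"v":new Date(2016, 8, 25, 12, 0),"f":"Sunday, September 25, 2016"},null,null,23,null,null,true]'
     . ']}';

// Wrap each new Date(...) expression in quotes so the string becomes valid JSON.
$str = preg_replace('~:(new Date\([^)]*\))~', ':"$1"', $str);

$jsonArray = json_decode($str, true);
if ($jsonArray === null) {
    // json_decode() returns null when the input cannot be parsed.
    throw new RuntimeException('Decoding failed: ' . json_last_error_msg());
}

// Assumption: index 0 is the date cell and index 3 is the value for the search term.
$points = [];
foreach ($jsonArray['rows'] as $row) {
    $points[$row[0]['f']] = $row[3];
}
print_r($points);
```

With this sample, $points maps 'Saturday, September 24, 2016' to 21 and 'Sunday, September 25, 2016' to 23. The dates themselves stay as plain strings such as "new Date(2016, 8, 24, 12, 0)", so the formatted "f" label is the easier field to work with.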
Casimir et Hippolyte
  • My bad, I'm not very good at explaining things, My actual answer referred to getting the JSON data from the string of html, which @Marcus answered. Thankfully however you did answer this because that would have been my next problem! – ConorReidd Oct 23 '16 at 19:22
2

This should be helpful:

// Keep the braces inside the capture group so that group 1 holds the whole object literal.
preg_match('/chartData\s+=\s+(\{.+\})/', $url2, $output_array);
$chartData = $output_array[1];
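
A slightly more defensive variant of the same idea, as a sketch only: it anchors on both markers mentioned in the question ("var chartData =" and "; var htmlChart"), matches lazily, and checks the return value of preg_match() before reading the result. The $html string is a made-up stand-in for the real response body, and the exact spacing around the markers is an assumption.

```php
<?php
// Hypothetical stand-in for the page source returned by file_get_contents().
$html = '<script>var chartData = {"rows":[[{"v":new Date(2016, 8, 24, 12, 0)},21]],"height":230};'
      . ' var htmlChart = "...";</script>';

// Lazily capture the object literal between the two markers.
// The "s" modifier lets "." match line breaks in case the payload spans several lines.
$pattern = '/chartData\s*=\s*(\{.*?\})\s*;\s*var htmlChart/s';

if (preg_match($pattern, $html, $m) === 1) {
    $chartData = $m[1];   // group 1: the text from "{" to the closing "}"
} else {
    $chartData = null;    // markers not found, or the page layout changed
}

var_dump($chartData);
```

The captured string still contains the new Date(...) expressions, so it needs the quoting step from the other answer before json_decode() will accept it.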
marcus