
I am trying to parse a large amount of JSON data generated by a remote web service. The output is paginated across 500 URIs, and each URI contains 100 JSON objects. I need to match a property of each JSON object, its DOI (a digital object identifier), against a corresponding field fetched from a local database, and then update the record.

The issue I am having is controlling my looping constructs to seek out the matching JSON DOI while making sure that all the data has been parsed.

As you can see below, I have tried to use a combination of break and continue statements, but I am not able to 'move' beyond the first URI.

I later introduced a flag variable to help control the loops, but it had no effect.

while($obj = $result->fetch_object()){
    for($i = 1; $i <= $outputs_json['meta']['response']['total-pages']; $i++){
        $url = 'xxxxxxxxxxxxxxx&page%5Bnumber%5D='."$i".'&page%5Bsize%5D=100';
        if(($outputs = json_decode(file_get_contents($url), true)) === false){
            // decoding failed - nothing to do for this page
        }else{
            try{
                $outputs = json_decode(file_get_contents($url), true);
                $j = 0;
                do{
                    $flag = false;
                    $doi = trim($outputs['data'][$j]['attributes']['identifiers']['dois'][0], '"');
                    if(!(utf8_encode($obj->doi) === $doi)){
                        continue; // no match, keep scanning this page
                    }else{
                        $flag = true; // match found for the current DB record
                        $j++;
                    }
                }while($j !== 101);
                if($flag === true) break;
            }catch(Exception $e){
            }
        }
    }
}

What is the optimal approach that guarantees every JSON object at every URI is parsed, and that CRUD operations are performed on my database only when a fetched record's DOI field matches the DOI property of the incoming JSON data?

devnull

1 Answer


I'm not 100% sure I understand every aspect of your question, but to me it would make sense to change the order of execution:

  • fetch one page from the external service
    • decode the JSON and iterate through all 100 objects
      • get one DOI
      • fetch the corresponding record from the database
      • change the DB record
    • when all JSON objects on the page are processed, fetch the next URL
  • repeat until all 500 URLs are fetched

I think it's not a good idea to fetch one record from the local DB and then try to find it across 500 different remote calls; instead, it's better to base your workflow/loops on the fetched remote data and look up the corresponding records in your local DB.
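A minimal sketch of that flow, keeping the URL placeholder and field names from your question. The mysqli connection `$mysqli`, the `records` table and its `id`/`doi`/`status` columns are assumptions only, so swap in your real connection, table and update logic:

    // Sketch only: $mysqli, the "records" table and its columns are assumptions.
    $totalPages = $outputs_json['meta']['response']['total-pages'];

    // Prepare the two statements once, outside the loops.
    $select = $mysqli->prepare('SELECT id FROM records WHERE doi = ?');
    $update = $mysqli->prepare("UPDATE records SET status = 'matched' WHERE doi = ?");

    for ($page = 1; $page <= $totalPages; $page++) {
        // 1. fetch one page from the external service
        $url = 'xxxxxxxxxxxxxxx&page%5Bnumber%5D=' . $page . '&page%5Bsize%5D=100';
        $body = file_get_contents($url);
        if ($body === false) {
            continue; // fetch failed - skip (or log/retry) and move to the next page
        }

        // 2. decode the JSON and iterate through all objects on the page
        $outputs = json_decode($body, true);
        if ($outputs === null || !isset($outputs['data'])) {
            continue; // invalid JSON - skip this page
        }

        foreach ($outputs['data'] as $item) {
            // 3. get one DOI from the remote object
            if (!isset($item['attributes']['identifiers']['dois'][0])) {
                continue; // object has no DOI
            }
            $doi = trim($item['attributes']['identifiers']['dois'][0], '"');

            // 4. fetch the corresponding record from the local database
            // (get_result() needs the mysqlnd driver; use bind_result() otherwise)
            $select->bind_param('s', $doi);
            $select->execute();
            $found = $select->get_result()->fetch_object();

            // 5. change the DB record only when the DOIs match
            if ($found) {
                $update->bind_param('s', $doi);
                $update->execute();
            }
        }
        // 6. all objects on this page are processed - the outer loop fetches the next URL
    }

Preparing the two statements once outside the loops means each of the up to 50,000 remote objects (500 pages × 100 objects) costs only one bound lookup and, on a match, one update.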

If you think that approach will fit your task, I can of course help you with the code :)

Evil_skunk
    Interesting. I had adapted the code from a previous version that didn't have to bother matching on the DOI, but a changing business requirement made me just edit the existing code without rethinking my approach. Thanks for the help with the code, but I should be fine :) I won't be able to run and test this until I am at work on Monday, so I'll let you know how it goes then. Until then I am curious to see if and what others might recommend. Cheers! – devnull Jul 07 '18 at 19:01