I have a script that counts word frequencies and saves the results to a JSON file. Each time it runs, it reads the existing JSON file, merges the new counts into it, and rewrites the file. This can happen repeatedly within a single request, and there can be many simultaneous requests, so I used flock() to try to prevent errors. I let it run for a while yesterday and got good data, but when I checked this morning the file was corrupt (still a readable text file, but the JSON was broken).

Here are the relevant parts of my code:

if(is_file('/home/myuser/public_html/word_counts.json'))
  {
    $prevoius_counts=json_decode(file_get_contents('/home/myuser/public_html/word_counts.json'),true);
  }
if(empty($prevoius_counts)) // empty() also covers the case where the file did not exist and the variable was never set
  {
    $prevoius_counts=array();
  } 
$new_counts=count_words($item->Description,$item->IDENTIFIER); //Creates an array like: array('the'=>20,'it'=>15,'spectacular'=>1);
$combined_counts=sum_associatve(array($new_counts,$prevoius_counts)); //like array_merge, but sums duplicate keys instead of overwriting.

$fh=fopen('/home/myuser/public_html/word_counts.json','c'); //this will always be over-written with new data, but the "c" prevents it from being truncated to 0 bytes
if (flock($fh, LOCK_EX)) 
  {
    fwrite($fh, json_encode($combined_counts));
    flock($fh, LOCK_UN);    // release the lock
  }
fclose($fh);

function count_words($description,$unique=null){
// /([\s_;?!\/\(\)\[\]{}<>\r\n"]|\.$|(?<=\D)[:,.\-]|[:,.\-](?=\D))/
// /([\s\-_,:;?!\/\(\)\[\]{}<>\r\n"]|(?<!\d)\.(?!\d))/
// http://rick.measham.id.au/paste/explain.pl?regex=
// http://stackoverflow.com/questions/20006448
    $to_be_counted=strtolower($description);
    $to_be_counted.=' BLOCKS '.$unique;
    $words=preg_split('/([\s_;?!\/\(\)\[\]{}<>\r\n"]|\.$|(?<=\D)[:,.]|[:,.](?=\D))/', $to_be_counted, -1, PREG_SPLIT_NO_EMPTY); // -1 means no limit
    return array_count_values ($words);
  }

  function sum_associatve($arrays){
    $sum = array();
    foreach ($arrays as $array) {
        foreach ($array as $key => $value) {
            if (isset($sum[$key])) {
                $sum[$key] += $value;
            } else {
                $sum[$key] = $value;
            }
        }
    } 
    return $sum;
}

Since it works for a while but eventually writes bad JSON, I don't know whether it's a file-locking problem or whether json_encode is returning bad data.

TecBrat
  • I am testing it now with `ftruncate()` added before my `fwrite()`. I might have my own answer. – TecBrat Nov 20 '13 at 15:11

1 Answer

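Adding ftruncate() before fwrite() seems to be working. Because mode "c" does not truncate the file, a write that was shorter than the existing contents would leave the tail of the old JSON behind. I'm still not 100% sure it's capturing every count: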

if (flock($fh, LOCK_EX)) 
  {
    ftruncate($fh, 0);
    fwrite($fh, json_encode($combined_counts));
    flock($fh, LOCK_UN);    // release the lock
  }
fclose($fh);
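
One remaining gap may be that the file is read before the lock is taken, so two requests can read the same old counts and one of the updates gets lost. Below is a minimal sketch of holding the lock across the whole read, merge, and write cycle. The helper name update_counts_locked() is hypothetical (it is not part of the original script), and the sketch assumes the file is opened in mode "c+" so it can be read and written through the same handle.

```php
<?php
// Hypothetical helper, not from the original script: merges $new_counts into
// the JSON file at $path while holding one exclusive lock for the whole
// read-modify-write cycle.
function update_counts_locked($path, array $new_counts)
{
    // 'c+' opens for reading and writing, creates the file if it is missing,
    // and does not truncate it.
    $fh = fopen($path, 'c+');
    if ($fh === false) {
        return false;
    }
    if (!flock($fh, LOCK_EX)) {
        fclose($fh);
        return false;
    }

    // Read the current contents only after the lock is held.
    $raw = stream_get_contents($fh);
    $previous = json_decode($raw, true);
    if (!is_array($previous)) {
        $previous = array(); // assumption: an empty or unreadable file starts over
    }

    // Sum the counts (same idea as sum_associatve() in the question).
    foreach ($new_counts as $word => $count) {
        $previous[$word] = (isset($previous[$word]) ? $previous[$word] : 0) + $count;
    }

    $json = json_encode($previous);
    $ok = false;
    if ($json !== false) {
        ftruncate($fh, 0); // drop the old bytes
        rewind($fh);       // ftruncate() does not move the file pointer
        $ok = (fwrite($fh, $json) !== false);
        fflush($fh);       // flush before the lock is released
    }

    flock($fh, LOCK_UN);
    fclose($fh);
    return $ok;
}
```

Called as update_counts_locked('/home/myuser/public_html/word_counts.json', $new_counts), this would stand in for both the file_get_contents() read and the separate write. The false return when json_encode() fails also gives a place to log json_last_error() if the encoder is the suspect.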
TecBrat