It's been years since I've used PHP and I am more than a little rusty. I am trying to write a quick script that opens a large file, splits it into an array, and then looks for similar occurrences across the values. For example, the file consists of something like this:

Chapter 1. The Beginning 
 Art. 1.1 The story of the apple
 Art. 1.2 The story of the banana
 Art. 1.3 The story of the pear
Chapter 2. The middle
 Art. 1.1 The apple gets eaten
 Art. 1.2 The banana gets split
 Art. 1.3 Looks like the end for the pear!
Chapter 3. The End
…

I would like the script to automatically tell me that two of the values contain the string "apple" and return "Art. 1.1 The story of the apple" and "Art. 1.1 The apple gets eaten", and then do the same for the banana and the pear.

I am not looking to search the array for a specific string; I just need it to count occurrences and return what and where.

I already have the script opening the file and splitting it into an array. I just can't figure out how to find similar occurrences.

<?php
$file = fopen("./index.txt", "r");
$blah = array();
// Loop on fgets() itself: testing feof() first appends a stray false at end of file.
while (($line = fgets($file)) !== false) {
   $blah[] = $line;
}
fclose($file);

var_dump($blah);
?>

Any help would be appreciated.

  • _"Just cant figure out how to find similar occurrences"_ -- Well, that's the problem at hand... What have you tried so far? – elclanrs Aug 30 '13 at 03:27
  • How large is your file? holding it all in memory might be too much for PHP. Also, many other words are repeated, even in your short sample (story, end, gets, the, of). How will your proposed code know which ones to count? –  Aug 30 '13 at 03:32
  • It's not THAT large. The array has 1650 values. – Justin Klaus Aug 30 '13 at 03:34
  • Everything that I've tried has needed an actual value to search for. I know there is probably a simple solution to this, it is just escaping me at the moment. I understand there are multiple occurrences in the short example. The actual file does not have a lot of common occurrences such as "the", "of", etc... – Justin Klaus Aug 30 '13 at 03:36
  • Also, just an FYI: everything from `$file = fopen...` to `fclose...` can be replaced with a single line in this case: `$blah = file("./index.txt")` – ChicagoRedSox Aug 30 '13 at 03:39

1 Answer

This solution is not perfect: it counts every single word in the text, so you may have to modify it to better serve your needs. It does, however, give accurate statistics on how many times each word appears in the file and on exactly which rows.

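// Read the file into an array, one line per element, without trailing newlines.
$blah = file('./index.txt', FILE_IGNORE_NEW_LINES);

$stats = array();
foreach ($blah as $key => $row) {
    // Split on whitespace; PREG_SPLIT_NO_EMPTY drops the empty strings
    // that leading or repeated spaces would otherwise produce.
    $words = preg_split('/\s+/', $row, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (!isset($stats[$word])) {
            // First time this word is seen; $key is the zero-based row number.
            $stats[$word]['rows'] = $key . ", ";
            $stats[$word]['count'] = 1;
        } else {
            $stats[$word]['rows'] .= $key . ", ";
            $stats[$word]['count']++;
        }
    }
}
print_r($stats);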
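
One possible way to narrow the full word count down to what the question asks for is to keep only the words that appear on more than one row, along with the rows themselves. The sketch below is an illustration, not part of the original approach: the helper name findSharedWords, the stop-word list, and the minimum word length of 3 are all assumptions to tune for the real file. The sample lines are copied from the question so the snippet runs on its own.

```php
<?php
// Hypothetical helper (assumed name): groups lines by the words they share,
// skipping stop words and tokens shorter than $minLength.
function findSharedWords(array $lines, array $stopWords = array(), $minLength = 3)
{
    $index = array();
    foreach ($lines as $number => $line) {
        // Lowercase, then split on anything that is not a letter or apostrophe.
        $words = preg_split("/[^a-z']+/", strtolower($line), -1, PREG_SPLIT_NO_EMPTY);
        // array_unique: a word repeated within one line counts once for that line.
        foreach (array_unique($words) as $word) {
            if (strlen($word) < $minLength || in_array($word, $stopWords, true)) {
                continue;
            }
            // Key by zero-based line number so the "where" is preserved.
            $index[$word][$number] = trim($line);
        }
    }
    // Keep only words that occur on at least two different lines.
    return array_filter($index, function ($rows) {
        return count($rows) > 1;
    });
}

// Sample data from the question; a real run would use file('./index.txt').
$lines = array(
    'Chapter 1. The Beginning',
    ' Art. 1.1 The story of the apple',
    ' Art. 1.2 The story of the banana',
    ' Art. 1.3 The story of the pear',
    'Chapter 2. The middle',
    ' Art. 1.1 The apple gets eaten',
    ' Art. 1.2 The banana gets split',
    ' Art. 1.3 Looks like the end for the pear!',
    'Chapter 3. The End',
);

// Assumed stop-word list; adjust it to the vocabulary of the real file.
$stopWords = array('the', 'art', 'chapter', 'story', 'gets', 'end');

$shared = findSharedWords($lines, $stopWords);
print_r($shared);
```

With this sample and stop-word list, the result contains the keys apple, banana and pear, each mapped to the two lines that mention it. Without a stop-word list, common words such as "the" and "story" would be reported as well, which is the concern raised in the comments.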

I hope this idea helps you get going, and that you can polish it further to suit your needs!

Antoan Milkov