I have two tables. In Table 1, I have around 400K rows where each row includes a paragraph of text that can be up to 50 sentences. In Table 2 I have a lexicon of 80k words with a score that I need for the coding of each word of each paragraph.
The whole point of my php script is to explode each paragraph of text into as many words as needed, then lookup into the lexicon for each word what the score is, and end up calculating the total score for all of the words for each row.
My strategy so far was to have a script that did the following:
- Connect to the database, Table 1
- While Loop, one row after another
- For the current row, explode the paragraph.
- For each word, look up into Table 2 if the word exists and return the score.
- Ending up with a total score for the current row.
- Updating Table 1 with the total score for the current paragraph.
- Going back to point number 2.
My code works but is not efficient. The problem is that the script is so slow that letting it run for an hour just calculates the first 500 rows. That's a problem because I have 400K rows. I will need this script for other projects.
What would you advise me to do to make this process less intense?
<?php
//Include functions
include "functions.php";
ini_set('max_execution_time', 9000000);
echo 'Time Limit = ' . ini_get('max_execution_time');
$db='senate';
//Function to search into the array lexicon
function searchForId($id, $array) {
foreach ($array as $key2 => $val) {
if ($val['word'] === $id) {
return $key2;
}
}
return null;
}
// tags to remove
$remove = array('{J}','{/J}','{N}','{/N}','{V}','{/V}','{RB}','{/RB}');
$x=1;
//Conecting the database
if (!$conn) {
die('Not connected : ' . mysql_error());}
// Choose the current db
mysql_select_db($db);
//Slurps the lexicon into an array
$sql = "SELECT word, score FROM concreteness";
$resultconcreteness = mysql_query($sql) or die(mysql_error());
$array = array();
while($row = mysql_fetch_assoc($resultconcreteness)) {
$array[] = $row;
}
//loop
while($x<=500000) {
$data = mysql_query("SELECT `key`, `tagged` FROM speechesLCMcoded WHERE `key`='$x'") or die(mysql_error());
// puts the "data" info into the $info array
$info = mysql_fetch_array( $data);
$tagged=$info['tagged'];
unset($weight);
unset($count);
$weight=0;
$count=0;
// Print out the contents of the entry
Print "<b>Key:</b> ".$info['key'] . " <br>";
// Explodes the sentence
$speech = explode(" ", $tagged);
// Loop every word
foreach($speech as $word) {
//Check if string contains our tag
if(!preg_match('/({V}|{J}|{N}|{RB})/', $word, $matches)) {} else{
//Removes our tags
$word = str_replace($remove, "", $word);
$id = searchForId($word, $array);
// print "ID: " . $id . "<br>";
// print "Word: " . $array[$id]['word'] . "<br>";
// print "Score: " . $array[$id]['score'] . "<br>";
$weight=$weight+$array[$id]['score'];
$count=$count +1;
// print "Weight: " . $weight . "<br>";
// print "Count: " . $count . "<br>";
}
}
$sql = "UPDATE speechesLCMcoded SET weight='$weight', count='$count' WHERE `key`='$x';" ;
$retval = mysql_query( $sql, $conn );
if(! $retval )
{die('Could not update data: ' . mysql_error());}
echo "Updated data successfully\n";
ob_flush();
flush();
//Increase the loop by one
$x=$x+1;
}?>
Here is the index:
CREATE TABLE `speechesLCMcoded` (
`key` int(11) NOT NULL AUTO_INCREMENT,
`speaker_state` varchar(100) NOT NULL,
`speaker_first` varchar(100) NOT NULL,
`congress` varchar(100) NOT NULL,
`title` varchar(100) NOT NULL,
`origin_url` varchar(100) NOT NULL,
`number` varchar(100) NOT NULL,
`order` varchar(100) NOT NULL,
`volume` varchar(100) NOT NULL,
`chamber` varchar(100) NOT NULL,
`session` varchar(100) NOT NULL,
`id` varchar(100) NOT NULL,
`raw` mediumtext NOT NULL,
`capitolwords_url` varchar(100) NOT NULL,
`speaker_party` varchar(100) NOT NULL,
`date` varchar(100) NOT NULL,
`bills` varchar(100) NOT NULL,
`bioguide_id` varchar(100) NOT NULL,
`pages` varchar(100) NOT NULL,
`speaker_last` varchar(100) NOT NULL,
`speaker_raw` varchar(100) NOT NULL,
`tagged` mediumtext NOT NULL,
`adjectives` varchar(10) NOT NULL,
`verbs` varchar(10) NOT NULL,
`nouns` varchar(10) NOT NULL,
`weight` varchar(50) NOT NULL,
`count` varchar(50) NOT NULL,
PRIMARY KEY (`key`)
) ENGINE=InnoDB AUTO_INCREMENT=408344 DEFAULT CHARSET=latin1