I am building a small plagiarism detection system in PHP for practice. After some research on Google, I figured that I could use the Google Custom Search API to build it.
I found this question very helpful: [How would you code an anti plagiarism site?]
I have managed to get search results from the Google API using the following code:
<?php
ini_set('max_execution_time',300);
require_once '../../src/Google_Client.php';
require_once '../../src/contrib/Google_CustomsearchService.php';
session_start();
$client = new Google_Client();
$client->setApplicationName('Google CustomSearch PHP Starter Application');
$client->setDeveloperKey('MY_DEVELOPER_KEY');
$search = new Google_CustomsearchService($client);
$to_search="This is the text that should be searched in google so that the result that I obtain can be used by my codes to perform plagiarism analysis";
$result = $search->cse->listCse($to_search, array('cx' => 'MY_SEARCH_ENGINE_ID'));
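// Dump up to the first six result items (not the whole $result six times).
$count = isset($result['items']) ? min(6, count($result['items'])) : 0;
for($i=0; $i<$count; $i++)
{
print "<pre>" . print_r($result['items'][$i], true) . "</pre>";
}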
?>
From the $result variable I can read the [link], [snippet] and [htmlSnippet] of each Google search result, using the code below:
$result['items'][$i]['snippet'];
$result['items'][$i]['link'];
Here $i is the integer index from the loop.
The problem is that I can only send a short keyword or a few lines to Google as a search query, not a huge text. Should I split the big text into small pieces (for example with substr) and run multiple queries, or should I do something else? The snippet and link values I get back can then be analysed for plagiarism. When I tried this, it produced a huge number of queries, which exceeded the limit of one hundred queries per day.
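To make the splitting idea concrete, here is a minimal sketch of one way it could be done while staying under a daily query budget. The function name `pick_query_phrases` and its parameters are hypothetical, not part of the Google client library, and the thresholds are assumptions rather than recommended values. It splits the text into sentences, drops the ones too short to be distinctive, truncates the rest to a few words, and samples evenly across the document instead of querying every sentence.

```php
<?php
// Hypothetical helper (not part of any Google library): choose a limited
// number of short phrases from a long text to use as search queries.
function pick_query_phrases($text, $maxQueries = 10, $minWords = 6, $maxWords = 12)
{
    // Split after ., ! or ? when followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $phrases = array();
    foreach ($sentences as $sentence) {
        $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) < $minWords) {
            continue; // too short to be a distinctive query
        }
        // Keep only the first $maxWords words so the query stays short.
        $phrases[] = implode(' ', array_slice($words, 0, $maxWords));
    }

    if (count($phrases) <= $maxQueries) {
        return $phrases;
    }

    // Sample evenly across the whole text rather than taking the first N.
    $step = count($phrases) / $maxQueries;
    $picked = array();
    for ($i = 0; $i < $maxQueries; $i++) {
        $picked[] = $phrases[(int) floor($i * $step)];
    }
    return $picked;
}
```

Each returned phrase could then be passed to `listCse` in place of `$to_search`, so the number of API calls per document is capped by `$maxQueries`.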
Please suggest the proper way to do this. Is querying Google and then comparing the returned text against the user's input the correct way to detect plagiarism?
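For the comparison step, a rough sketch of how a returned snippet could be scored against the submitted text is shown below. The function `snippet_overlap` is a hypothetical name, and simple word overlap is only an assumption about what "analysing" could mean here; it is a crude measure, not an established plagiarism metric.

```php
<?php
// Hypothetical scoring helper: the fraction of distinct words in a search
// snippet that also occur in the submitted text (0.0 to 1.0).
function snippet_overlap($submitted, $snippet)
{
    // Lowercase, strip any HTML tags, and split on non-alphanumeric runs.
    $normalize = function ($s) {
        $s = strtolower(strip_tags($s));
        return preg_split('/[^a-z0-9]+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    };

    $submittedWords = array_unique($normalize($submitted));
    $snippetWords   = array_unique($normalize($snippet));

    if (count($snippetWords) === 0) {
        return 0.0; // nothing to compare
    }

    $shared = array_intersect($snippetWords, $submittedWords);
    return count($shared) / count($snippetWords);
}
```

A high score for `$result['items'][$i]['snippet']` would suggest that the page at `$result['items'][$i]['link']` is worth a closer look; the cut-off value would have to be chosen by experiment.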