
I have a log file (log.txt) in the form:

=========================================
March 01 2050 13:05:00 log v.2.6 
General Option: [default] log_options.xml
========================================= 
Loaded options from xml file: '/the/path/of/log_options.xml'
printPDF started
PDF export
PDF file created:'/path/of/file.1.pdf'
postProcessingDocument started
INDD file removed:'/path/of/file.1.indd'
Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions 
=========================================
March 01 2050 14:15:00 log v.2.6 
General Option: [default] log_options.xml
========================================= 
Loaded options from xml file: '/the/path/of/log_options.xml'
extendedprintPDF started
extendedprintPDF: Error: Unsaved documents have no full name: line xyz

Note: Each file name is of the format: 3lettersdatesomename_LO.pdf/indd. Example: MNM011112ThisFile_LO.pdf. Also, on a given day and time, the entry could either have just errors, just the message about the file created or both, like I have shown here.

The file continues this way. And, I have a db in the form:

id  itemName status
1   file     NULL

And so on...

Now, I am expected to go through the log file and, for each file that is created or for which there is an error, update the last column of the DB with the appropriate message: "File created" or "Error". I thought of searching for the string "PDF file created" or "Error" and then grabbing the file name.

I have tried various things like pathinfo() and strpos. But, I can't seem to understand how I am going to get it done.

Can someone please provide me some inputs on how I can solve this? The txt file and db are pretty huge.

NOTE: I provided the 2nd entry of the log file to be clear that the format in which errors appear IS NOT consistent. I would like to know if I can still achieve what I am supposed to with an inconsistent format for errors. Can somebody please help after reading the whole question again? There have been plenty of changes from the first time I posted this.

  • Can you post the format of the line in the log instead of "more info"? If you need text processing you must find a pattern first. – Udan Dec 04 '12 at 15:58
  • Can you post at least one full line for an error? Then we can look for a regular expression that matches. It would be nice if you could post about 10 different error lines from the file. – Hugo Delsing Dec 04 '12 at 15:58
  • Made the file format more clear. The file just continues this way. For some dates/times, there are multiple PDF's created and Errors. With each PDF being created, the .indd file is removed. – Watchful Protector Dec 04 '12 at 16:44

3 Answers


You can use PHP's explode function to break each line of your file into words. If the fields in your text file are tab-separated, split on the tab with explode("\t", $line) (the delimiter comes first, and "\t" must be in double quotes to be read as a tab character); if they are space-separated, split on a space instead.

Then a simple substr($word, $start_index, $length) on each word can give you the name of the file (here $start_index should be 0).
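A minimal sketch of the explode/substr idea, assuming the path is wrapped in single quotes as in the question's sample and that names follow the `MNM011112ThisFile_LO.pdf` pattern. The sample line and the variable names are made up for illustration:

```php
<?php
// Hypothetical line in the format shown in the question.
$line = "PDF file created:'/path/of/MNM011112ThisFile_LO.pdf'";

// Splitting on the single quote leaves the path in the second piece.
$parts = explode("'", $line);
$path  = isset($parts[1]) ? $parts[1] : '';

// basename() drops the directories; substr() takes the 3-letter prefix.
$name   = basename($path);
$prefix = substr($name, 0, 3);
```

Splitting on the quote rather than on spaces avoids having to strip the `PDF file created:` label and the quotes afterwards.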

mysql_connect will connect you to a MySQL database, but the mysql_* functions are deprecated as of PHP 5.5 and were removed in PHP 7. A better way is to use PDO (PHP Data Objects), which makes your code more reliable and flexible.
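As a rough sketch of the PDO route, the update can be written as a prepared statement. The table name `items` and the in-memory SQLite database are assumptions made only so the example runs on its own (it needs the pdo_sqlite extension); with MySQL you would swap in your own DSN, credentials and table:

```php
<?php
// In-memory SQLite stands in for the real database; swap the DSN for yours.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table mirroring the columns shown in the question.
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, itemName TEXT, status TEXT)');
$pdo->exec("INSERT INTO items (itemName, status) VALUES ('file.1.pdf', NULL)");

// Prepare once, execute per file; values are bound, never concatenated.
$update = $pdo->prepare('UPDATE items SET status = :status WHERE itemName = :name');
$update->execute(array(':status' => 'File created', ':name' => 'file.1.pdf'));

$status = $pdo->query("SELECT status FROM items WHERE itemName = 'file.1.pdf'")->fetchColumn();
```

Preparing the statement once and calling execute() inside the loop keeps the per-row cost low, which matters when both the log and the table are large.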

Another option is to use preg_match with a regular expression that matches your error message and captures the file name.
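One possible shape for that regular expression, assuming file paths always appear inside single quotes and end in .pdf or .indd, as they do in the sample log. The helper name `extractQuotedPath` is hypothetical:

```php
<?php
// Returns the quoted path found on a log line, or null if there is none.
function extractQuotedPath($line) {
    // Match the first '...' that ends in .pdf or .indd.
    if (preg_match("~'([^']+\.(?:pdf|indd))'~i", $line, $m)) {
        return $m[1];
    }
    return null;
}

$created = extractQuotedPath("PDF file created:'/path/of/file.1.pdf'");
$failed  = extractQuotedPath("Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions");
$none    = extractQuotedPath("extendedprintPDF: Error: Unsaved documents have no full name: line xyz");
```

Error lines that do not name a file, like the last one, return null, so they cannot be tied to a row in the table by this approach.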

You can refer to the php.net manual for help at any time.

dandan78

Are all of the files PDFs? If so you can do a regex search on files with the .pdf extension. However, if the filename is also contained in the error string, you will need to exclude that somehow.

// Assume filenames contain only upper/lowercase letters, 0-9, underscores, periods, dashes, and forward slashes.
// '~' is used as the pattern delimiter because the character class contains '/'.
preg_match_all('~([a-zA-Z0-9_./-]+\.pdf)~', $log_file_contents, $matches);
// $matches[1] should be an array containing each filename.
// You can do array_unique() to exclude duplicates.

Edit: Keep in mind that $matches will be a multi-dimensional array, as described at http://php.net/manual/en/function.preg-match-all.php and http://php.net/manual/en/function.preg-match.php

To test a regular expression, you can use http://regexpal.com/
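To make the shape of $matches concrete, here is a small sketch run against three made-up log lines (the sample data is an assumption; the pattern uses `~` as the delimiter so the `/` inside the character class needs no escaping):

```php
<?php
// Made-up sample in the question's format, with one duplicate entry.
$log = "PDF file created:'/path/of/file.1.pdf'\n"
     . "PDF file created:'/path/of/file.2.pdf'\n"
     . "PDF file created:'/path/of/file.1.pdf'\n";

preg_match_all('~([a-zA-Z0-9_./-]+\.pdf)~', $log, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$all    = $matches[1];
// array_unique() keeps the original keys, so reindex with array_values().
$unique = array_values(array_unique($all));
```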

imkingdavid

Okay, so the main issue here is that either you don't have a consistent delimiter for "entries" or you are not providing enough info. Based on what you have provided, here is my suggestion. The main caveat is that without a solid delimiter for "entries," there's no way to know for sure whether an error matches up with a file name. The only way to fix this is to format your file better. You also have to fill in some blanks, like your db info and how you actually perform the query.

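$files  = array();
$errors = array();

$handle = fopen("log.txt", "rb");
if ($handle === false) {
  die("Cannot open log.txt");
}

// fgets() returns one line per call; fread() would return arbitrary 8192-byte chunks
while (($row = fgets($handle)) !== false) {
  // get file names (the path is wrapped in single quotes in the log)
  if (preg_match('~^PDF file created:\s*\'?(.*?)\'?\s*$~', $row, $match)) {
    $files[] = $match[1];
  }

  // get errors ("Error:" is not always at the start of the line)
  if (preg_match('~Error:\s*(.*?)\s*$~', $row, $match)) {
    $errors[] = $match[1];
  }
}
fclose($handle);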

// connect to db

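// placeholders keep quotes in file names or error messages from breaking the SQL
$sql = "update tablename set status = ? where itemName = ?";

foreach ($files as $k => $file) {
  // assumes your table just has basename of file
  $file = basename($file);

  $error = ( isset($errors[$k]) ) ? $errors[$k] : null;

  // execute $sql as a prepared statement with the values array($error, $file)
}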

EDIT: Actually going back to your post, it looks like you want to update a table not insert, so you will want to change the query to be an update. And you may need to further work with $file in that foreach for your where clause, depending on how you store your filenames in your db (for example, if you just store the basename, you will likely want to do $file = basename($file); in the foreach). Code updated to reflect this.
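Given the caveat above about matching errors to files, one alternative is to classify each line on its own and only record a status when the line itself names a file. This is a sketch under assumptions, not a drop-in solution: the function names `classifyLine` and `collectStatuses` are hypothetical, the status strings follow the asker's "File created" / "Error" wording, and error lines that name no file (such as the "Unsaved documents" one) are simply skipped.

```php
<?php
// Maps a log line to array(file name, status), or null if it names no file.
function classifyLine($line) {
    if (preg_match("~^PDF file created:\s*'([^']+)'~", $line, $m)) {
        return array(basename($m[1]), 'File created');
    }
    if (stripos($line, 'error') !== false
        && preg_match("~'([^']+\.(?:pdf|indd))'~i", $line, $m)) {
        return array(basename($m[1]), 'Error');
    }
    return null;
}

// Streams the log line by line, so a large file never has to fit in memory.
function collectStatuses($handle) {
    $statuses = array();
    while (($line = fgets($handle)) !== false) {
        $result = classifyLine(rtrim($line, "\r\n"));
        if ($result !== null) {
            $statuses[$result[0]] = $result[1];
        }
    }
    return $statuses;
}

// Demo on an in-memory stream holding lines from the question's sample.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "printPDF started\n"
    . "PDF file created:'/path/of/file.1.pdf'\n"
    . "INDD file removed:'/path/of/file.1.indd'\n"
    . "Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions\n"
    . "extendedprintPDF: Error: Unsaved documents have no full name: line xyz\n");
rewind($handle);
$statuses = collectStatuses($handle);
fclose($handle);
```

Each key of $statuses can then be used in the where clause of the update, and the value as the new status.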

So hopefully this will point you in the right direction.

CrayonViolent
  • Kindly see the edits I have made. It is more clear now, what the log.txt contains. – Watchful Protector Dec 04 '12 at 16:37
  • Okay, well, what I have provided will still "work" with what you have, but with the same caveat I mentioned. You provided one "entry" of your text file, showing info about the file, error, etc., but is there something separating each "entry" in your log file, such as a line of *'s, or does it go right into the next entry? – CrayonViolent Dec 04 '12 at 16:50
  • Also, you didn't really make clear what you actually wanted to update your db with..were you wanting to update the "status" column with the error produced, or some generic message or what? When people say "be clear about what you want," that means show an example of what you actually want to see pulled from entries in your log file, what you want to actually see show up in your db, etc.. – CrayonViolent Dec 04 '12 at 16:51
  • Notice the === in the file format. That's how each entry starts. Also, read the updated Note. The last column of the db could be updated to either "file created" or "error encountered", as the case may be. – Watchful Protector Dec 04 '12 at 17:09
  • I have updated the correct format of the pdf and indd files. Would your suggestions still work with the changes that I now have, keeping in mind my last 2 comments? – Watchful Protector Dec 05 '12 at 14:13