File Parsing - How to identify specific lines only if next line contains a certain word

Question

I have a largish file of around 50k lines. I need to insert data from a line into a database only if the next line contains a certain word. For instance:

00:00:01   Request from 1.1.1.1 for A-record for www.website1.com
00:00:01   Sending reply to 1.1.1.1 about A-record for www.website1.com:
00:00:01   -> Answer: A-record for www.website1.com = 999.999.999.999
00:00:02   Request from 1.1.1.1 for A-record for www.website2.com
00:00:02   Plug-in "Domain Blacklist1" matched A-record for www.website2.com
00:00:03   Request from 1.1.1.1 for A-record for www.website3.com
00:00:04   Sending reply to 1.1.1.1 about A-record for www.website3.com:
00:00:01   -> Answer: A-record for www.website3.com = 888.888.888.888

I only want to enter the data from the 4th line because the 5th line has "Domain Blacklist" in it.

Any thoughts on how I might go about that?

Paul

Is this file directly accessible to your ColdFusion server or will you be pulling it with `cfhttp` or something along those lines? — Shawn, Mar 12 '19 at 18:11

SOS · Answer 1 · 2019-03-12T16:54:52.347

Use File functions to loop through the file, one line at a time. Use a separate variable to keep track of the "previous" line (updating it at the end of each iteration). Within the loop, check the current line for the magic phrase. If it's found, append the previous line variable to a result array. When finished, use the array of lines however you wish.

<cfscript>
   matched = [];
   previousLine  = "";
   theFile  = fileOpen("c:/path/to/your/file.txt");

   try {
       while(!fileIsEOF(theFile))  {
           currentLine = fileReadLine(theFile);

           // if phrase is found in *this* line, store *previous* line 
           if (reFindNoCase("(Domain Blacklist)", currentLine)) {
               arrayAppend(matched, previousLine);
           }

           // update previous line
           previousLine  = currentLine;
       }
   }
   finally {
       // always release the file handle, even if reading fails
       fileClose(theFile);
   }

   // do something with the results
   writeDump(var=matched, label="Matched Lines");
</cfscript>

edited Mar 12 '19 at 16:54

answered Mar 12 '19 at 16:49

SOS

You could probably use a plain find() too, but a regex is more flexible, in case things change. – SOS Mar 12 '19 at 17:04
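To illustrate the trade-off mentioned in the comment above, here is a minimal sketch comparing a plain substring search with a regex. The sample line comes from the question; the alternative pattern (matching a "Domain Whitelist" plug-in and variable spacing) is purely hypothetical and only shows the kind of change a regex could absorb.

```cfml
<cfscript>
    // Sample line copied from the question
    currentLine = '00:00:02   Plug-in "Domain Blacklist1" matched A-record for www.website2.com';

    // Plain substring search: findNoCase() returns the 1-based position
    // of the first match, or 0 when the substring is absent
    plainHit = findNoCase("Domain Blacklist", currentLine) > 0;

    // Regex search: this pattern is an assumption for illustration only.
    // It would also accept "Domain Whitelist" and extra whitespace
    // between the two words, should the log format ever change.
    regexHit = reFindNoCase("Domain\s+(Black|White)list", currentLine) > 0;

    writeOutput("plain match: #plainHit#, regex match: #regexHit#");
</cfscript>
```

If the phrase is fixed and never changes, the plain search is likely enough; the regex only pays off once the pattern needs to vary.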

A huge +1 for a good use of `finally`. :-) Though do keep in mind that some file operations can load the whole file into server memory, which probably wouldn't be good for a large file. I think `fileOpen()` only creates a pointer to the file (thus `fileClose()`), and `fileReadLine()` will only load one line at a time. But I think `fileRead()` will load the entire file and could blow out JVM memory pretty easily. – Shawn Mar 12 '19 at 19:11
@Shawn - Yup, `fileRead()` definitely drops the whole thing into memory. The latter uses pointers/current line only so I figured it would be more scalable if the file size increased (which never happens ;-). – SOS Mar 12 '19 at 19:25
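The difference discussed in the two comments above can be sketched roughly as follows. The path is the same placeholder used in the answer, and the variable names are illustrative only.

```cfml
<cfscript>
    path = "c:/path/to/your/file.txt";  // placeholder path, as in the answer

    // Approach 1: fileRead() returns the entire file as a single string,
    // so memory use grows with the size of the file.
    wholeFile = fileRead(path);
    // Each of CR and LF acts as a delimiter here; empty elements
    // (blank lines) are dropped by default.
    allLines = listToArray(wholeFile, chr(13) & chr(10));

    // Approach 2: fileOpen() returns a file handle and fileReadLine()
    // pulls one line per call, so only the current line is held in a variable.
    lineCount = 0;
    theFile = fileOpen(path, "read");
    try {
        while (!fileIsEOF(theFile)) {
            currentLine = fileReadLine(theFile);
            lineCount++;
        }
    }
    finally {
        fileClose(theFile);
    }
</cfscript>
```

For a 50k-line log either approach would probably work, but the second one should cope better if the file keeps growing.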
File parsing is almost as much fun as date math. – Shawn Mar 12 '19 at 19:34
Throw in some web services and you've got yourself a party! (.. that no one in their right mind will attend). – SOS Mar 12 '19 at 20:07
Oohh....new weekend project: a web service to parse specific lines of a text file based on 345 days from the Wednesday of two weeks ago. You in? – Shawn Mar 12 '19 at 20:10
Um.. I'm busy for the next 345 months :-P Not to mention the migraine forming.. (Edit) Now that's just evil... – SOS Mar 12 '19 at 20:14
Managed to sort it - probably not as elegantly as your answer, Ageax, but along the same lines. Once the magic word is hit, the previous line's vars get inserted into the DB, those vars are cleared, and off we go again until the next hit - seems to work! Thanks for your help! – Paul Hopkinson Mar 13 '19 at 09:33
@PaulHopkinson - Yeah, sounds like the same end result. The reason I suggested using an array is it allows for more efficient query options. For example, you could dump the results into a text file and use bulk insert (sql server) which is a lot faster than individual inserts. – SOS Mar 13 '19 at 16:00
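A rough sketch of the bulk insert idea from the comment above, assuming SQL Server. The staging path, the datasource name `myDsn`, and the table `dbo.BlockedRequests` (taken here to have a single text column) are all hypothetical and would need to match your own setup.

```cfml
<cfscript>
    // Stand-in for the array built by the loop in the answer
    matched = [
        "00:00:02   Request from 1.1.1.1 for A-record for www.website2.com"
    ];

    // Hypothetical staging file. For BULK INSERT the path is resolved by
    // the SQL Server service, so it must be readable from that machine.
    stagingFile = "c:/path/to/staging/matched.txt";

    // Write one matched line per row, separated by CRLF
    // (the default row terminator for BULK INSERT).
    fileWrite(stagingFile, arrayToList(matched, chr(13) & chr(10)));

    // Load the whole file in one statement instead of one INSERT per line.
    // Table and datasource names are assumptions for illustration.
    queryExecute(
        "BULK INSERT dbo.BlockedRequests FROM 'c:\path\to\staging\matched.txt'",
        {},
        { datasource = "myDsn" }
    );
</cfscript>
```

Because the log lines contain no tab characters (the default field terminator), each line lands in the single column as-is; splitting out the time, IP, and domain would be a separate step.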
