How can I join multiple lines of a log file into one data frame row?
Example log file (4 entries):
[WARN ][2016-12-16 13:43:10,138][ConfigManagerLoader] - [Low max memory=477102080. Java max memory=1000 MB is recommended for production use, as a minimum.]
[DEBUG][2016-05-26 10:10:22,185][DataSourceImpl] - [SELECT mr.lb_id,mr.lf_id,mr.mr_id FROM mr WHERE (( mr.cap_em >
0 AND mr.cap_em > 5
)) ORDER BY mr.lb_id, mr.lf_id, mr.mr_id]
[ERROR][2016-12-21 13:51:04,710][DWRWorkflowService] - [Update Wizard - : [DWR WFR request error:
workflow rule = BenCommonResources-getDataRecords
version = 2.0
filterValues = [{"fieldName": "wotable_hwohtable.status", "filterValue": "CLOSED"}, {"fieldName": "wotable_hwohtable.status_clearance", "filterValue": "Goods Delivered"}]
sortValues = [{"fieldName": "wotable_hwohtable.cost_actual", "sortOrder": -1}]
Result code = ruleFailed
Result message = Database error while processing request.
Result details = null
]]
[INFO ][2019-03-15 12:34:55,886][DefaultListableBeanFactory] - [Overriding bean definition for bean 'cpnreq': replacing [Generic bean: class [com.ar.moves.domain.bom.Cpnreq]; scope=prototype; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in URL [jar:file:/D:/Dev/404.jar!/com/ar/moves/moves-context.xml]] with [Generic bean: class [com.ar.bl.bom.domain.Cpnreq]; scope=prototype; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in URL [jar:file:/D:/Dev/Tools/Tomcatv8.5-appGit-master/404.jar!/com/ar/bl/bom/bl-bom-context.xml]]]
(See representative 8-line extract at https://pastebin.com/bsmWWCgw.)
The structure is clean:
[PRIOR][datetime][ClassName] - [Msg]
However, the message is often multi-line: it may itself contain brackets (even trailing ones), and it may or may not contain ^M (carriage-return) line breaks. That makes it difficult to parse, and I don't know where to begin.
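That fixed header prefix is what makes the file parseable: a line starts a new entry only if it begins with the whole [PRIOR][datetime][ClassName] - [ prefix, so brackets inside a message, or a lone ]] closing one, never look like a new entry. A minimal sketch of that test follows. `is_header()` and `header_re` are hypothetical names, and the pattern assumes (from the sample above) an upper-case level padded with spaces and a yyyy-mm-dd hh:mm:ss,mmm timestamp.

```r
# Hypothetical pattern for the start of an entry, inferred from the sample:
# "[LEVEL][yyyy-mm-dd hh:mm:ss,mmm][ClassName] - [" at the very start of a line.
# LEVEL is upper-case letters, possibly right-padded with spaces ("[WARN ]").
header_re <- "^\\[[A-Z]+ *\\]\\[\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\]\\[[^\\]]*\\] - \\["

# TRUE for lines that begin a new log entry, FALSE for continuation lines.
is_header <- function(lines) {
  grepl(header_re, lines, perl = TRUE)
}
```

A continuation line that happens to start with a complete header would still be misread, but with this layout that seems unlikely.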
So, to process such a file and read it with something like:
#!/usr/bin/env Rscript
df <- read.table('D:/logfile.log')
the lines belonging to each entry first need to be merged into one. How can that be done?
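One way to do the merge, sketched under the same header assumption: number each line by the count of headers seen so far (cumsum() over the header test), split() the lines by that number, and paste each group back together with "\n". `merge_entries()` is a hypothetical name. Note that readLines() accepts LF, CRLF and CR as line endings, so a stray ^M inside a message becomes an ordinary line break here.

```r
# Hypothetical helper: collapse a log's raw lines into one string per entry.
# Assumes every entry starts with a header line shaped like
# "[LEVEL][yyyy-mm-dd hh:mm:ss,mmm][ClassName] - [" (as in the sample above).
merge_entries <- function(lines) {
  header_re <- "^\\[[A-Z]+ *\\]\\[\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\]\\[[^\\]]*\\] - \\["
  starts <- grepl(header_re, lines, perl = TRUE)
  # Running count of headers: each line gets the number of the entry it
  # belongs to; lines before the first header get 0 and are dropped.
  group <- cumsum(starts)
  keep <- group > 0
  entries <- split(lines[keep], group[keep])
  # Re-join each entry's lines with "\n" so the original line breaks survive.
  unname(vapply(entries, paste, character(1), collapse = "\n"))
}
```

Typical use would be entries <- merge_entries(readLines("D:/logfile.log", warn = FALSE)), which yields one string per log entry.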
The goal is to load the whole log file to make graphics, run analyses (grepping out entries), and eventually write it back to a file, so, if possible, the newlines inside messages should be kept to respect the original formatting.
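Because every merged entry keeps its internal "\n", grepping whole entries and writing them out with writeLines() gives back the selected entries line for line (with line endings normalised the way readLines() reads them). A sketch under the same header assumption; `filter_log()` and its `level` argument are hypothetical:

```r
# Hypothetical helper: keep only the entries of one level and write them out.
# Assumes the same "[LEVEL][timestamp][ClassName] - [" header as above.
filter_log <- function(in_path, out_path, level) {
  header_re <- "^\\[[A-Z]+ *\\]\\[\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\]\\[[^\\]]*\\] - \\["
  lines <- readLines(in_path, warn = FALSE)
  # Merge continuation lines into their entry (same idea as cumsum + split).
  group <- cumsum(grepl(header_re, lines, perl = TRUE))
  keep <- group > 0
  entries <- unname(vapply(split(lines[keep], group[keep]), paste,
                           character(1), collapse = "\n"))
  # Match the level at the start of the entry, allowing padding ("[WARN ]").
  wanted <- grepl(paste0("^\\[", level, " *\\]"), entries, perl = TRUE)
  # Embedded "\n" characters make each entry come back on its original lines.
  writeLines(entries[wanted], out_path)
  invisible(sum(wanted))
}
```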
The expected dataframe would look like:
PRIOR  Datetime             ClassName            Msg
-----  -------------------  -------------------  ----------
WARN   2016-12-16 13:43:10  ConfigManagerLoader  Low max...
DEBUG  2016-05-26 10:10:22  DataSourceImpl       SELECT ...
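To get those four columns, each merged entry can be split with one regular expression instead of read.table(). In the sketch below (`parse_log()` is a hypothetical name), (?s) lets "." match newlines so Msg can span several lines, and the greedy (.*) runs up to the last "]" of the entry, so brackets inside the message, including trailing ones, are kept. It assumes the timestamps are local time and turns the comma before the milliseconds into "." so that %OS can read them.

```r
# Hypothetical parser: raw log lines -> data frame with PRIOR, Datetime, ClassName, Msg.
parse_log <- function(lines) {
  header_re <- "^\\[[A-Z]+ *\\]\\[\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\]\\[[^\\]]*\\] - \\["
  # Merge continuation lines into one string per entry.
  group <- cumsum(grepl(header_re, lines, perl = TRUE))
  keep <- group > 0
  entries <- unname(vapply(split(lines[keep], group[keep]), paste,
                           character(1), collapse = "\n"))
  # (?s): "." also matches "\n"; greedy (.*) stops at the entry's LAST "]".
  pat <- "(?s)^\\[([A-Z]+) *\\]\\[([^\\]]+)\\]\\[([^\\]]*)\\] - \\[(.*)\\]\\s*$"
  m <- regmatches(entries, regexec(pat, entries, perl = TRUE))
  # m[[k]] holds the full match plus 4 groups; entries that do not match give NA.
  field <- function(i) {
    vapply(m, function(x) if (length(x) == 5L) x[i] else NA_character_, character(1))
  }
  data.frame(
    PRIOR = field(2),
    # "13:43:10,138" -> "13:43:10.138" so %OS keeps the milliseconds.
    Datetime = as.POSIXct(sub(",", ".", field(3), fixed = TRUE),
                          format = "%Y-%m-%d %H:%M:%OS"),
    ClassName = field(4),
    Msg = field(5),
    stringsAsFactors = FALSE
  )
}
```

Something like df <- parse_log(readLines("D:/logfile.log", warn = FALSE)) would then give one row per entry. The milliseconds are stored but only printed after options(digits.secs = 3).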
And, ideally, this should be doable directly in R (is it?), so that we can "process" a live log file (kept open for writing by the server app), "à la tail -f".