
I have to analyze a huge log file for a management report.

The log file has the following format:

[2014-08-28 08:49:40 GMT][Level:DEBUG] Connection from UGUBUKBBBHJGJ.mt.site (123.131.21.20) , user : 12345678 for compositeId : com.my.solution.name.abc

[2014-08-28 08:49:41 GMT][Level:DEBUG] Connection from TYIYIYPOYUUGG.mt.site (123.131.21.20) , user : 12345678 for compositeId : com.my.solution.name.def

[2014-08-29 05:55:21 GMT][Level:DEBUG] Connection from OJPPMMJOOHJIH.mt.site (123.131.22.33) , user : 12345678 for compositeId : com.my.solution.name.ghi

[2014-08-29 05:55:22 GMT][Level:DEBUG] Connection from HGJJKHKHKHKJH.mt.site (123.131.22.33) , user : 12345678 for compositeId : com.my.solution.name.jkl

I have replaced the actual values in the logs with dummy ones.

How can I split my log file so that each InputSplit contains the logs of only a single date, and thus one mapper processes all the logs of a single day?
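For reference, the line layout shown above can be picked apart with a regular expression. This is only a minimal sketch in Python, assuming every entry follows exactly the sample format; the names `LOG_LINE` and `parse_line` are made up for illustration and are not part of any Hadoop API.

```python
import re

# Pattern derived from the sample lines only; real logs may contain variants.
LOG_LINE = re.compile(
    r"\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) GMT\]"
    r"\[Level:(?P<level>\w+)\] Connection from (?P<host>\S+) "
    r"\((?P<ip>[\d.]+)\) , user : (?P<user>\S+) "
    r"for compositeId : (?P<composite_id>\S+)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None
```

The `date` field is the value that would decide which day a line belongs to.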

  • Split the file yourself; the framework won't do it for you, as it's a single file. – SMA Dec 11 '14 at 11:58
  • I am not sure how to implement an InputFormat and RecordReader to split the file in the desired way. –  Dec 11 '14 at 12:09
  • AFAIK you can't; you will need to write, say, a shell script and split it manually. – SMA Dec 11 '14 at 12:12
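The manual split suggested in the comments could look roughly like the following. It is a sketch in Python rather than a shell script, and it assumes that every entry starts with `[YYYY-MM-DD` as in the sample; the function name `split_by_date` and the `<date>.log` output naming are hypothetical choices, not anything Hadoop requires.

```python
import os


def split_by_date(log_path, out_dir):
    """Write each day's entries to out_dir/<YYYY-MM-DD>.log and return the days seen."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}    # one open output file per date (assumes a modest number of days)
    current = None  # file of the most recent dated entry
    try:
        with open(log_path, encoding="utf-8") as src:
            for line in src:
                if line.startswith("[") and len(line) >= 11:
                    day = line[1:11]  # the "2014-08-28" part of "[2014-08-28 ..."
                    if day not in handles:
                        path = os.path.join(out_dir, day + ".log")
                        handles[day] = open(path, "w", encoding="utf-8")
                    current = handles[day]
                # Blank lines are dropped; any other undated line is treated
                # as a continuation of the previous entry.
                if current is not None and line.strip():
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

The per-day files can then be used as the job input. To make sure a large file is not cut into several splits, one option is an input format whose `isSplitable` method returns false, so that each file goes to a single mapper.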

0 Answers