Can someone help me with the following problem? I am trying to analyze a security log to find false alerts. The false alerts are the lines containing "TXT was not created", and the true alerts are the lines containing "txt was not created". How can I extract only the "txt was not created" lines from the data source (sample input data given below)?

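from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated token in the line.
        words = line.split()
        for word in words:
            # Python 2 only: decode the byte string, skipping invalid bytes.
            word = unicode(word, "utf-8", errors="ignore")
            yield word, 1

    def reducer(self, key, values):
        # Sum the counts emitted for each word.
        yield key, sum(values)

if __name__ == '__main__':
    MRWordFrequencyCount.run()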

A sample input is given here:

Mon Feb  1 12:13:59 EST 2016 virtual user etransactiondev started to upload file 
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.TXT
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.txt was not created
John Vandenberg
Shiv
  • > "TXT was not created" and true are with "txt was not created". Is there an error or is the difference really just the case of the words 'TXT' and 'txt'? – DAXaholic Apr 28 '16 at 06:18

1 Answer

Can you just check the first word?

word = word.split(' ')
if word[0] == 'TXT':
    pass  # placeholder: handle the matching line here
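In the sample input the marker sits at the end of the line, after the file path, so a check on the first token may never match. As a rough sketch of a case-sensitive alternative, the lines can be classified by their ending. The names `classify`, `FALSE_ALERT` and `TRUE_ALERT` are made up for illustration, and the sketch assumes the marker text is always the last thing on the line:

```python
# Hypothetical helper names; the marker strings come from the question.
FALSE_ALERT = "TXT was not created"  # upper-case extension: false alert
TRUE_ALERT = "txt was not created"   # lower-case extension: true alert


def classify(line):
    """Return "false", "true", or None for a single log line.

    str.endswith is case-sensitive, so "TXT" and "txt" are told apart.
    """
    text = line.rstrip()
    if text.endswith(FALSE_ALERT):
        return "false"
    if text.endswith(TRUE_ALERT):
        return "true"
    return None


# Sample lines copied from the question.
sample = [
    "Mon Feb  1 12:13:59 EST 2016 virtual user etransactiondev started to upload file ",
    "/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/"
    "WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.TXT",
    "/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/"
    "WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.txt was not created",
]

# Keep only the true alerts.
true_alerts = [line for line in sample if classify(line) == "true"]
```

The same test could probably be used inside the `mapper` from the question, yielding the line (or a counter key) only when `classify(line)` returns "true".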
user3145912
  • Thanks Vomit for the answer. Right now I am trying to extract the username from the input file. Could you help me extract the username from an input line such as: Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file. I need to extract etransactiondev. – Shiv May 08 '16 at 04:08
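
For the username question in the comment above, one possible approach is a regular expression keyed on the fixed words around the name. This is only a sketch: `USER_RE` and `extract_username` are hypothetical names, and it assumes every upload line has the form "virtual user <name> started to upload file" and that the username contains no whitespace.

```python
import re

# Assumed line shape: "... virtual user <name> started to upload file".
# \S+ captures the username as a run of non-whitespace characters.
USER_RE = re.compile(r"virtual user (\S+) started to upload file")


def extract_username(line):
    """Return the username from an upload line, or None if it does not match."""
    match = USER_RE.search(line)
    return match.group(1) if match else None


# Sample line copied from the question.
line = "Mon Feb  1 12:13:59 EST 2016 virtual user etransactiondev started to upload file "
username = extract_username(line)
```

Lines that do not match the assumed shape, such as the file path lines, give `None`, so they can be skipped.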