I have a text file from which I want to extract a certain string, which can appear several times. Then I want to print the results.

The string I'm trying to extract is the value of Rule MATCH Name.

Text file example:

201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin SCORE: 140 TYPE: EXE  AutoUpdates https://www.test.com/files:  **Rule MATCH Name**: this_is_test1 SUBSCORE:100
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin SCORE: 140 TYPE: EXE  AutoUpdates https://www.test.com/files:  **Rule MATCH Name**: this_is_test2 SUBSCORE:90 
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/764.bin SCORE: 140 TYPE: EXE  AutoUpdates https://www.test.com/files:  **Rule MATCH Name**: this_is_test3 SUBSCORE:15
karel
bugnet17
    StackOverflow expects you to try to solve your own problem first, as your attempts help us to better understand what you want. Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a [MCVE]. For more information, please see [Ask] and take the [Tour]. – quant Nov 11 '18 at 09:51

2 Answers

0

You can solve this with a regular expression. RegExr is a useful website for building and testing regex patterns.
Once you have a pattern that fits your problem, open the file, read it line by line (for example with readlines()), and use Python's re module to extract the values.

I made a quick solution (not sure if this is the value you are trying to extract):

import re
fl = r'201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test1 SUBSCORE:100 201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test2 SUBSCORE:90 201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/764.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test3 SUBSCORE:15'

# findall returns the captured group (the rule name) for every match in the string
print(re.findall(r'Rule MATCH Name:\s(\w+)\s', fl))
# ['this_is_test1', 'this_is_test2', 'this_is_test3']

If reading from a file:

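import re

found = []
with open('f.txt') as f:
    # Iterate over the file object directly; readlines() would load every line first
    for line in f:
        # Collect the captured rule name from each matching line
        found += re.findall(r'Rule MATCH Name:\s(\w+)\s', line)
print(found) # ['this_is_test1', 'this_is_test2', 'this_is_test3']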
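The sample in the question shows the label wrapped in ** markers, and a trailing \s fails when the name is the last thing on a line. As a rough sketch only, the helper below (extract_rule_names is a hypothetical name, not part of the original answer) assumes the asterisks may or may not be present in the real file and drops the trailing-whitespace requirement:

```python
import re

# Hypothetical helper for illustration; the pattern is an assumption.
# \*{0,2} tolerates the optional ** markers shown in the question's sample,
# and no trailing \s is required, so a name at the end of a line still matches.
RULE_NAME = re.compile(r'\*{0,2}Rule MATCH Name\*{0,2}:\s*(\w+)')

def extract_rule_names(lines):
    """Return every rule name found in an iterable of text lines."""
    names = []
    for line in lines:
        names.extend(RULE_NAME.findall(line))
    return names

sample = [
    "201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test",
    "201809:34:40Z ubuntu: Alert: MODULE: FileScan  **Rule MATCH Name**: this_is_test1 SUBSCORE:100",
    "201809:34:40Z ubuntu: Alert: MODULE: FileScan  Rule MATCH Name: this_is_test2",
]
print(extract_rule_names(sample))  # ['this_is_test1', 'this_is_test2']
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly, e.g. extract_rule_names(f) inside a with open(...) block.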
Dani G
0

This is straightforward with re.search. The following script takes the pattern as its first argument and the file name as its second, and prints every line that matches:

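import re
import sys

# Usage: python script.py PATTERN FILE
pattern = re.compile(sys.argv[1])

# The with block closes the file automatically
with open(sys.argv[2], "r") as file:
    for line in file:
        # Print each line that contains a match; end="" avoids doubling the newline
        if pattern.search(line):
            print(line, end="")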
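The loop above prints the whole matching line, whereas the question asks only for the value after the label. One possible variation, sketched here with an assumed pattern and a hypothetical helper name (first_rule_name), adds a capture group and prints just the captured text:

```python
import re

# Assumed pattern: the capture group isolates the word after the label.
PATTERN = re.compile(r'Rule MATCH Name:\s*(\w+)')

def first_rule_name(line):
    """Return the rule name in the line, or None if the label is absent."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

lines = [
    "201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test",
    "201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin "
    "Rule MATCH Name: this_is_test1 SUBSCORE:100",
]
for line in lines:
    name = first_rule_name(line)
    if name is not None:
        print(name)  # prints this_is_test1 for the second line only
```

re.search stops at the first match in a line, which is enough when each line carries at most one label, as in the sample; re.findall is the better fit if a line can contain several.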
swapnil shashank