Extract a certain string which can appear several times in a file

Question

I have a text file that I want to read and extract a certain string (which can appear several times). Then I want to print the result.

The string I'm trying to extract is the value of Rule MATCH Name.

Text file example:

201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin SCORE: 140 TYPE: EXE  AutoUpdates https://www.test.com/files:  **Rule MATCH Name**: this_is_test1 SUBSCORE:100
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin SCORE: 140 TYPE: EXE  AutoUpdates https://www.test.com/files:  **Rule MATCH Name**: this_is_test2 SUBSCORE:90 
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/764.bin SCORE: 140 TYPE: EXE  AutoUpdates https://www.test.com/files:  **Rule MATCH Name**: this_is_test3 SUBSCORE:15

StackOverflow expects you to try to solve your own problem first, as your attempts help us to better understand what you want. Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a [MCVE]. For more information, please see [Ask] and take the [Tour]. — quant, Nov 11 '18 at 09:51

Dani G · Accepted Answer · 2018-11-11T12:09:09.667

You can solve this with a regular expression. Regexr is a handy website for building and testing patterns.
Once you have a pattern that fits your data, open the file, read it line by line, and use Python's re module to extract the values.

I made a quick solution (not sure if this is the value you are trying to extract):

import re
fl = r'201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test1 SUBSCORE:100 201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test2 SUBSCORE:90 201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/764.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test3 SUBSCORE:15'

re.findall(r'Rule MATCH Name:\s(\w+)\s', fl) 
# ['this_is_test1', 'this_is_test2', 'this_is_test3']

If reading from a file:

import re
with open('f.txt') as f:
    found = []
    for line in f:  # iterate over the file directly, one line at a time
        found += re.findall(r'Rule MATCH Name:\s(\w+)\s', line)
    print(found)  # ['this_is_test1', 'this_is_test2', 'this_is_test3']
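
One caveat with the pattern above is that it requires whitespace after the value, so a value at the very end of a file with no trailing newline would be skipped. The following is a sketch of a variation, not part of the original answer, that drops the trailing `\s`. The helper name `extract_rule_names` and the shortened sample lines are made up for illustration.

```python
import re

# Hypothetical variation: no trailing \s, so a value at the very end
# of the input (no newline after it) still matches.
RULE_NAME = re.compile(r'Rule MATCH Name:\s*(\w+)')

def extract_rule_names(lines):
    """Return every rule name found in an iterable of log lines."""
    names = []
    for line in lines:
        names.extend(RULE_NAME.findall(line))
    return names

# Shortened, made-up sample lines in the same shape as the log above.
sample = [
    "ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test\n",
    "ubuntu: Alert: FILE: /test/76.bin Rule MATCH Name: this_is_test1 SUBSCORE:100\n",
    "ubuntu: Alert: FILE: /test/764.bin Rule MATCH Name: this_is_test3",
]
print(extract_rule_names(sample))
```

Since an open file object is itself an iterable of lines, something like `extract_rule_names(open('f.txt'))` should work the same way on a real file.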

According to your example, how do I read from a file and then print the results? — bugnet17, Nov 11 '18 at 11:19

score 0 · Answer 2 · answered Nov 11 '18 at 11:07


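This is straightforward with the re module's search function. The script below takes the pattern as its first argument and the file name as its second, and prints every line that matches:

import re
import sys

# Usage: python <script> PATTERN FILE
pattern, path = sys.argv[1], sys.argv[2]

with open(path, "r") as file:
    for line in file:
        # Print each line that contains a match for the pattern
        if re.search(pattern, line):
            print(line, end="")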
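
If only the value after `Rule MATCH Name:` is wanted rather than the whole line, one option is to add a capturing group to the pattern and print `group(1)` of the match. This is a minimal sketch that assumes at most one rule name per line. The function name `first_rule_name` and the shortened sample lines are hypothetical.

```python
import re

def first_rule_name(line):
    """Return the rule name on this line, or None if the line has none."""
    match = re.search(r'Rule MATCH Name:\s*(\w+)', line)
    return match.group(1) if match else None

# Made-up sample lines in the same shape as the question's log.
for line in [
    "Info: MODULE: FileScan MESSAGE: Scanning test",
    "Alert: MODULE: FileScan Rule MATCH Name: this_is_test1 SUBSCORE:100",
]:
    name = first_rule_name(line)
    if name is not None:
        print(name)
```

Lines without a rule name return None and are skipped, so only the extracted values are printed.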

swapnil shashank


It prints all lines. I need only the value of Rule MATCH Name. – bugnet17 Nov 11 '18 at 11:30
Do you need the count? As printing the string multiple times won't be a good idea. – swapnil shashank Nov 11 '18 at 11:33
No.. I need the value of "rule match name". for example: Rule MATCH Name: this_is_test1 I'm trying to extract the "this_is_test1" – bugnet17 Nov 11 '18 at 11:37
