I am trying to search an EEPROM dump text file for a certain byte pattern and, once the search hits, skip ahead 4 positions and read the data from there. I have tried the following code to find the pattern:

import re

regexp_list = 'A1 B2'  # a plain string; the parentheses alone did not make it a tuple
with open("dump.txt", 'r') as f:
    line = f.read()
pattern = re.compile(regexp_list)

matches = re.findall(pattern, line)

for match in matches:
    print(match)

This scans the dump for A1 B2 and prints it if found. I need to add more such patterns to the search criteria, for example 'C1 B2' and 'D1 F1'. I tried making regexp_list a list of patterns, but it didn't work.
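One way to search for several markers at once is to join them into a single alternation, since re.compile expects one pattern string rather than a list. The sketch below is only an illustration: `markers` and `build_marker_pattern` are names made up here, and it assumes the bytes of a marker sit on the same line, separated by one or more whitespace characters.

```python
import re

# Hypothetical list of two-byte markers to look for.
markers = ["A1 B2", "C1 B2", "D1 F1"]


def build_marker_pattern(markers):
    """Combine the markers into one regex, allowing any whitespace between bytes."""
    alternatives = [
        r"\s+".join(re.escape(byte) for byte in marker.split())
        for marker in markers
    ]
    # \b keeps a marker from matching inside a longer token such as an address.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


pattern = build_marker_pattern(markers)
sample = "0120   86 1B 00 A1  B2 FF 15 A0  05 C2 D1 E4  00 25 04 00"

for match in pattern.finditer(sample):
    print(match.group())
```

A marker that wraps from the end of one row to the start of the next would not match here, because the address column sits in between.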

That is one part of the problem. Next, once the search hits, I want to skip 4 places and read the data from there on (see below).

Input:

0120   86 1B 00 A1  B2 FF 15 A0  05 C2 D1 E4  00 25 04 00 

Here, when the search finds the A1 B2 pattern, I want to move 4 places ahead, i.e. skip FF 15 A0 05 and save the data C2 D1 E4 from the dump.

Expected Output:

C2 D1 E4

I hope the explanation was clear.

#

Thanks to @kcorlidy

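Here's the final piece of code, which I had to add to remove the addresses in the first column.

newtxt = text.split("A1 B2")[1].split()[4:][:5]

# keep only the byte tokens; this drops the 4-character addresses
# (calling newtxt.remove() while looping over newtxt can skip elements)
newtxt = [i for i in newtxt if len(i) <= 2]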

So the full code looks like this:

import re

with open('dump.txt') as f:
    text = f.read()

regex = r"(A1\s+B2)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})"

for ele in re.findall(regex, text, re.MULTILINE):
    # ele[2] is the third group; drop any 4-character address inside it
    print(" ".join([ok for ok in ele[2].split() if len(ok) == 2]))

# selects the next 5 elements after skipping 4, including the address in the 1st column
newtxt = text.split("A1 B2")[1].split()[4:][:5]
print(" ".join(newtxt))

# keep only the byte tokens (filter instead of removing while iterating)
newtxt = [i for i in newtxt if len(i) <= 2]
print(" ".join(newtxt))

Input:

0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00

Output:

C2 0130 D1 E4 00

C2 D1 E4 00
crazybyPy

1 Answer

A regex can extract the text, but you can also do it by splitting the text.

Regex:

  1. (A1\s+B2) matches A1, one or more whitespace characters, then B2
  2. (\s+\w+){4} skips the next 4 tokens (moves 4 places)
  3. ((\s+\w{2}(\s\w{4})?){3}) captures 3 two-character bytes. Each byte may be followed by an unneeded 4-character address, which is filtered out afterwards before the bytes are joined into one string.
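The three steps above can also be written with re.VERBOSE so that each part carries its own comment. This is a sketch for illustration, not a different method: the group names `marker` and `payload` and the helper `extract` are made up here, and it assumes data bytes are two characters and addresses four.

```python
import re

# Assumption: data bytes are two characters wide, addresses four.
REGEX = re.compile(
    r"""
    (?P<marker>A1\s+B2)                # step 1: the marker bytes
    (?:\s+\w+){4}                      # step 2: skip the next four tokens
    (?P<payload>                       # step 3: capture three bytes,
        (?:\s+\w{2}(?:\s\w{4})?){3}    #   each optionally followed by an address
    )
    """,
    re.VERBOSE,
)


def extract(text):
    """Return the three payload bytes after each marker as a space-joined string."""
    results = []
    for match in REGEX.finditer(text):
        # Drop anything that is not a two-character byte, i.e. the address.
        tokens = [tok for tok in match.group("payload").split() if len(tok) == 2]
        results.append(" ".join(tokens))
    return results


text = """0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00"""

print(extract(text))
```

Named groups avoid having to count parentheses to find the right index in the tuples returned by re.findall.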

Split:

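Note: If you have a very long text or multiple lines, don't use this approach. It only looks at the text after the first match, and the address column of any following line ends up in the list.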

  1. text.split("A1 B2")[1] splits the text into two parts; the part after the marker is the one we need
  2. .split() splits on whitespace and gives the list ['FF', '15', 'A0', '05', 'C2', 'D1', 'E4', '00', '25', '04', '00']
  3. [4:][:3] skips 4 places and selects the first three elements
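For multi-line dumps, a token-based variant of the split approach can drop the address column first and then index into one flat list of bytes. The following is a minimal sketch, assuming every line starts with an address followed by data bytes; `dump_bytes` and `bytes_after` are hypothetical helper names, not part of the code above.

```python
def dump_bytes(text):
    """Flatten a dump into one list of byte tokens, dropping the first column."""
    tokens = []
    for line in text.splitlines():
        fields = line.split()
        tokens.extend(fields[1:])  # fields[0] is assumed to be the address
    return tokens


def bytes_after(tokens, marker, skip=4, count=3):
    """Return `count` tokens located `skip` places after each marker occurrence."""
    n = len(marker)
    results = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == marker:
            start = i + n + skip
            results.append(tokens[start:start + count])
    return results


text = """0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00"""

for found in bytes_after(dump_bytes(text), ["A1", "B2"]):
    print(" ".join(found))
```

Because the addresses are removed before searching, the marker, the skipped bytes and the payload may each wrap onto the next row without special handling.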

Test code:

import re

text = """0120   86 1B 00 A1  B2 FF 15 A0  05 C2 D1 E4  00 25 04 00 
0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00 """
regex = r"(A1\s+B2)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})"

for ele in re.findall(regex,text,re.MULTILINE):
    #remove the string we do not need, such as blankspace, 0123, \n
    print(" ".join([ok for ok in ele[2].split() if len(ok) == 2]))

# str.split() needs the exact separator: "A1  B2" with two spaces, as in the first line
print(text.split("A1  B2")[1].split()[4:][:3])

Output

C2 D1 E4
C2 D1 E4
['C2', 'D1', 'E4']
KC.
  • `text = open('dump.txt').read()`; `regex = r"(A1\s+B2)(\s+\w{2}){4}((\s+\w{2}){3})"`; `print(re.search(regex, text).group(3).lstrip())`; `print(text.split("A1 B2")[1].split()[4:][:3])` – crazybyPy Dec 10 '18 at 08:50
  • I have adapted your code to read a file and search for the string. It works well. Top! However, this only works if there are enough elements to the right on the same row. If I want to continue onto the next row, can I do it? See my example below: – crazybyPy Dec 10 '18 at 08:53
  • 0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2 – crazybyPy Dec 10 '18 at 08:55
  • 0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00 – crazybyPy Dec 10 '18 at 08:55
  • When I need to move four places past A1 B2 and there are not enough elements left in this row, I need to read the next two from the next row. Can we do this? – crazybyPy Dec 10 '18 at 08:57
  • Of course, but I am not sure whether `C2 0130 D1` is what you want. @crazybyPy – KC. Dec 10 '18 at 11:03
  • No. I would want to read C2, D1, E4, which means that I need to take one element from the current row and the next two elements from the next row. "0120" and "0130" are the addresses, which I need to ignore. – crazybyPy Dec 10 '18 at 11:16
  • @crazybyPy I modified my regex for this case. – KC. Dec 10 '18 at 11:46