Best way to find patterns in a string without knowing what I'm looking for?

Question

I have 500x500 bitmaps containing no more than 16 colors that I need to convert to a text file where each color is represented by a character.

I then need to reduce the size of the text file by finding patterns in each line.

I have the characters right now in a 2D array.
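A minimal sketch of the bitmap-to-text step. It assumes the pixels have already been read into a 2D list of color values (RGB tuples here) by some image library. `PALETTE`, `grid_to_lines`, and the first-seen character assignment are hypothetical; the real color-to-character table is whatever the machines expect.

```python
# Hypothetical 16-character alphabet; substitute the characters the machines expect.
PALETTE = "ABCDEFGHIJKLMNOP"


def grid_to_lines(pixels, palette=PALETTE):
    """Map each distinct color in a 2D grid to one character.

    Colors receive characters in order of first appearance (an assumption
    made for illustration). Returns (list of row strings, color->char dict).
    """
    mapping = {}
    lines = []
    for row in pixels:
        chars = []
        for color in row:
            if color not in mapping:
                if len(mapping) == len(palette):
                    raise ValueError("more colors than palette characters")
                mapping[color] = palette[len(mapping)]
            chars.append(mapping[color])
        lines.append("".join(chars))
    return lines, mapping


# Tiny 2x4 grid with two RGB colors standing in for a 500x500 bitmap.
red, white = (255, 0, 0), (255, 255, 255)
lines, mapping = grid_to_lines([[red, white, red, white], [red, red, red, white]])
print(lines)  # ['ABAB', 'AAAB']
```

Each row string can then be compressed on its own, which matches the per-line processing described above.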

For example:

AHAHAH = 3(AH)

HAHAHA = 3(HA)

AAAHHH = 3(A)3(H)

ABYZTT = ABYZ2(T)

AHAHAB = 2(AH)AB
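To pin down what the notation in these examples means, here is a minimal decoder sketch. It assumes literal characters are never digits or parentheses and that groups are never nested (neither case appears in the examples); `decode` is a hypothetical helper name. A decoder like this is also handy for round-trip checking any encoder.

```python
import re

# A group is a count followed by a parenthesized unit, e.g. "3(AH)";
# anything else is a single literal character.
TOKEN = re.compile(r"(\d+)\(([^()]*)\)|(.)")


def decode(encoded):
    """Expand each N(X) group back into X repeated N times."""
    parts = []
    for count, unit, literal in TOKEN.findall(encoded):
        if literal:
            parts.append(literal)
        else:
            parts.append(unit * int(count))
    return "".join(parts)


print(decode("ABYZ2(T)"))  # ABYZTT
```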

I don't think I can use regular expressions because there are so many possible combinations.
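Regular expressions may be more usable here than they look: a back-reference lets a pattern say "some substring, followed by that same substring again" without knowing the substring in advance. A minimal sketch, assuming a greedy choice (leftmost position, shortest repeating unit) is acceptable; `encode` is a hypothetical name, and the output uses the N(X) notation from the examples above.

```python
import re

# (.+?) lazily captures the shortest candidate unit; \1+ then requires one
# or more further copies of exactly that unit immediately after it.
REPEAT = re.compile(r"(.+?)\1+")


def encode(line):
    """Replace each run of a repeated unit with count(unit)."""
    def to_group(match):
        unit = match.group(1)
        count = len(match.group(0)) // len(unit)
        return f"{count}({unit})"

    return REPEAT.sub(to_group, line)


for line in ["AHAHAH", "HAHAHA", "AAAHHH", "ABYZTT", "AHAHAB"]:
    print(line, "=", encode(line))
```

One limitation of this greedy choice: "AABAAB" becomes 2(A)B2(A)B rather than the shorter 2(AAB), because the leading pair of A's is grouped before the longer unit is ever considered.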

I am not even sure where to begin.

Does this answer your question? [Python string pattern recognition/compression](https://stackoverflow.com/questions/1914236/python-string-pattern-recognition-compression) — takendarkk, Mar 16 '21 at 18:35
Thanks @takendarkk I'm reading it now to see if I can adapt some of it — 736f5f6163636f756e74, Mar 16 '21 at 18:50
Do you need to come up with a compression algorithm of your own? Otherwise you could use the zlib module, e.g. `compressed = zlib.compress(yourString.encode())` — Alain T., Mar 16 '21 at 18:57
@AlainT. The output needs to be another .txt in the precise format above with numbers, parentheses, and characters. It will be read by ancient manufacturing machines that I have no wiggle room on. The 'new' machines run on Windows 95 — 736f5f6163636f756e74, Mar 16 '21 at 19:08
I see. You should provide a precise specification (or reference) for the compression/RLE algorithm. — Alain T., Mar 16 '21 at 19:11

736f5f6163636f756e74 · Accepted Answer · 2021-03-18T22:27:56.540

Here is what I did to solve my problem. I haven't thoroughly checked edge cases, but it's working on my test inputs. Maybe it will be helpful for someone in the future. It's Run-Length Encoding, but for groups of characters, not individual characters. From what I read, normal RLE would encode AAAAHAHA as A4H1A1H1A1, whereas I needed to encode 4A2HA.
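For comparison, the classic per-character RLE described above can be written with `itertools.groupby`. This short sketch (the `classic_rle` name is just for illustration) reproduces the A4H1A1H1A1 example.

```python
from itertools import groupby


def classic_rle(text):
    """Classic run-length encoding: each run of one character becomes char + count."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


print(classic_rle("AAAAHAHA"))  # A4H1A1H1A1
```

Because every single-character run still costs a count, alternating text such as HAHA grows to H1A1H1A1, which is why the grouped approach treats HA as one unit instead.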

```python
string = 'AHYAHYAHAHAHAHAHAHAHBBBBBBBTATAZAB*+I'
new_string = ""
i = 1  # length of the candidate repeating unit at the front of `string`
while string:
    # A unit longer than half of what is left cannot repeat, so no group
    # starts here: emit the first character as a literal and move on.
    if i > len(string) // 2:
        new_string += string[0]
        string = string[1:]
        i = 1
        continue
    sub_string1 = string[:i]
    sub_string2 = string[i:i + i]
    if sub_string1 == sub_string2:
        # Count consecutive copies of the unit starting at index 0.
        count = 2
        while string[count * i:(count + 1) * i] == sub_string1:
            count += 1
        new_string += "(" + str(count) + ")" + sub_string1
        string = string[count * i:]  # drop the copies just encoded
        i = 1
    else:
        i += 1

print(new_string)  # (2)AHY(7)AH(7)B(2)TAZAB*+I
```
