I have written a module that should replace runs of a repeated character with a specific replacement character, chosen by how many times the character repeats. For example, if "a" repeats 4 times, the run "aaaa" is replaced with "¤"; both values should be equal to 1 byte. The problem is that once the input file is larger than about 30 KB, the output file ends up larger in bytes than the input after I run the module. I have tried a few word-count programs, and apparently the module is adding characters, but I haven't been able to fix my code. I've tried a few approaches and would appreciate help or ideas about where the extra bytes are coming from.
from itertools import groupby

with open("LICENSE.txt", "r", encoding='utf-8') as rf, open('TESTINGOnline.txt', 'w', encoding='utf-8') as wf:
    s = rf.read()
    ret = ''
    for k, v in groupby(s):
        x = 'a'
        chunk = list(v)
        cnt = len(chunk)
        if k == x and cnt <= 1:
            el = 'ª'.rstrip('\n')
        elif k == x and cnt == 2:
            el = '¨'.rstrip('\n')
        elif k == x and cnt == 3:
            el = ''.rstrip('\n')
        elif k == x and cnt == 4:
            el = '¤'.rstrip('\n')
        elif k == x and cnt == 5:
            el = '¥'.rstrip('\n')
        else:
            el = ''.join(chunk).rstrip('\n')
        ret += el
    wf.write(ret.rstrip('\n'))
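
One way to test the 1-byte assumption is to compare each string's character count with its size once encoded as UTF-8, the encoding both files are opened with. The snippet below is only a minimal sketch: `utf8_len` is a hypothetical helper name that is not part of the module, and the characters checked are the ones that appear in the code above.

```python
# Compare the character count of a string with the number of bytes it
# occupies when encoded as UTF-8.
# utf8_len is a hypothetical helper, introduced only for this check.
def utf8_len(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode('utf-8'))


original = 'aaaa'     # a run of four 'a' characters, as in the example
replacement = '¤'     # the replacement used for a run of four

# Each line prints: the string, its character count, its UTF-8 byte count.
print(repr(original), len(original), utf8_len(original))
print(repr(replacement), len(replacement), utf8_len(replacement))

# The same check for every replacement character used in the module.
for ch in 'ª¨¤¥':
    print(repr(ch), len(ch), utf8_len(ch))
```

ASCII characters such as "a" take 1 byte in UTF-8, while characters in the U+0080 to U+07FF range, which includes "ª", "¨", "¤" and "¥", take 2 bytes each. If that is what is happening here, a single "a" replaced by "ª" grows from 1 byte to 2, and a run of two replaced by "¨" stays at 2 bytes, so only runs of three or more would actually get smaller.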