decode string "2A4CT2A2C..." into "AACCCCTAACC..." from one text file to another

Question

I have a doc.txt that contains something like "2A4CT2A2C...", and I want to decode it to "AACCCCTAACC..." and then write the result to another file, doc1.txt. I have tried:

(origin and destination are the paths of the docs)

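def decode_txt(origin, destination):
    h = open(destination, "w")
    f = open(origin, "r")
    for character in f:   # note: iterating over a file object yields lines, not single characters
        h.write()         # stuck here: not sure what to pass to write()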

but I couldn't work out how to continue.

Create a count string variable. Read a character. If it's a number, append it to the count string. If it's not a number, take the count string and convert it to a number, defaulting to one if it was empty, and then write the letter that you just read as many times as your count says to your output and clear the count string variable. — CherryDT, Oct 15 '22 at 19:20
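A minimal sketch of the approach described in this comment, reading the input one character at a time. The helper name `decode_stream` is hypothetical, and the sketch assumes the input contains only ASCII digits and the characters to repeat. Because it accepts any readable and writable text file objects, `decode_txt` only has to open the two paths and delegate.

```python
import io


def decode_stream(infile, outfile):
    """Decode run-length text such as "2A4CT2A2C" one character at a time."""
    count = ""  # digits seen since the last non-digit character
    # iter(callable, sentinel) keeps calling read(1) until it returns ""
    for ch in iter(lambda: infile.read(1), ""):
        if ch.isdigit():
            count += ch  # accumulate multi-digit counts such as "12"
        else:
            # an empty count string means the character appears once
            outfile.write(ch * (int(count) if count else 1))
            count = ""


def decode_txt(origin, destination):
    with open(origin) as infile, open(destination, "w") as outfile:
        decode_stream(infile, outfile)


# usage with in-memory "files" instead of real paths
out = io.StringIO()
decode_stream(io.StringIO("2A4CT2A2C"), out)
print(out.getvalue())  # AACCCCTAACC
```

Reading with `read(1)` still goes through Python's buffered file layer, so it does not hit the disk once per character, and it never holds the whole input in memory.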

score 0 · Answer 1 · answered Oct 15 '22 at 19:25

The only way I can think of is to break the string up into a list and iterate through it: whenever list[k] is a digit x, insert the next item, list[k + 1], another x - 1 times right after it. You could then iterate over the list again and remove the entries that are digits.
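A rough sketch of this "expand, then delete the digits" idea. Instead of inserting into the list while iterating over it (which shifts the indices), it builds a second list. The function name is hypothetical, and, like the idea it illustrates, it assumes every count is a single digit from 1 to 9; a multi-digit count such as "12A" would be decoded incorrectly.

```python
def decode_single_digit_counts(encoded: str) -> str:
    """Expand each character by the single digit before it, then drop the digits."""
    items = list(encoded)
    expanded = []
    for k, item in enumerate(items):
        expanded.append(item)
        if item.isdigit() and k + 1 < len(items):
            # add (count - 1) extra copies of the next character; the
            # character itself is appended on the next loop iteration
            expanded.extend([items[k + 1]] * (int(item) - 1))
    # second pass: remove the digits, leaving only the repeated characters
    return "".join(c for c in expanded if not c.isdigit())


print(decode_single_digit_counts("2A4CT2A2C"))  # AACCCCTAACC
```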

tdelaney · Accepted Answer · 2022-10-15T20:46:27.220


You have a pattern of zero or more digits followed by a single character, which a regular expression can handle: (\d*) captures zero or more digits, and ([^\d]) captures the single non-digit character that follows and is to be repeated.

import re

def decode_txt(origin, destination):
    with open(origin) as infile:
        text = infile.read()
    with open(destination, "w") as outfile:
        # each match is a (digits, char) pair; empty digits mean a count of 1
        for cnt, char in re.findall(r"(\d*)([^\d])", text):
            outfile.write(char * (int(cnt) if cnt else 1))

test = "2A4CT2A2C"
with open("origin", "w") as f:
    f.write(test)
decode_txt("origin", "destination")
with open("destination") as f:
    result = f.read()
print(result)
assert result == "AACCCCTAACC"
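To see what the pattern yields, here is a small check of `re.findall`: with two capture groups it returns a list of `(count, char)` tuples, with an empty count string when no digits precede the character. It also shows why the class must be `[^\d]` (or the equivalent shorthand `\D`) rather than `[^d]`, which excludes only the letter d.

```python
import re

# \D matches any single non-digit character; it is equivalent to [^\d]
pairs = re.findall(r"(\d*)(\D)", "2A4CT2A2C")
print(pairs)  # [('2', 'A'), ('4', 'C'), ('', 'T'), ('2', 'A'), ('2', 'C')]

# [^d] (no backslash) only excludes the letter "d", so a dangling trailing
# digit is captured as if it were a character to repeat
loose = re.findall(r"(\d*)([^d])", "2A3")
strict = re.findall(r"(\d*)(\D)", "2A3")
print(loose)   # [('2', 'A'), ('', '3')]
print(strict)  # [('2', 'A')]
```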

If you just want string input and output, this reduces to:

import re

text = "2A4CT2A2C"
out = []
for cnt, char in re.findall(r"(\d*)([^\d])", text):
    out.extend(char * (int(cnt) if cnt else 1))
out = "".join(out)

If you have a lot of text, the out list will be large. You could use io.StringIO() to create a file-like buffer instead.
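A minimal sketch of that `io.StringIO()` variant, wrapped in a function named `decode` (a hypothetical name). It uses `re.finditer`, which yields matches lazily, instead of `re.findall`. The buffer still holds the whole decoded output in memory; it only avoids the intermediate list with one entry per output character.

```python
import io
import re


def decode(text: str) -> str:
    """Decode run-length text such as "2A4CT2A2C" into "AACCCCTAACC"."""
    buf = io.StringIO()
    # finditer yields match objects one at a time instead of building a list
    for m in re.finditer(r"(\d*)(\D)", text):
        cnt, char = m.groups()
        buf.write(char * (int(cnt) if cnt else 1))
    return buf.getvalue()


print(decode("2A4CT2A2C"))  # AACCCCTAACC
```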

edited Oct 15 '22 at 20:46

answered Oct 15 '22 at 19:39

tdelaney


Hello, thanks for your reply, it worked. But now I'm trying to do the same with a string, like decode(string). I've used the same code from "for" to the end, but it only prints out the ones with numbers – vanquish Oct 15 '22 at 20:25
The `for` loop consumes the string in `text` - the other stuff to read from a file could just be removed and it should work. As for writing to a string instead of a file, there are a couple of options. I'll post the easiest. – tdelaney Oct 15 '22 at 20:42
it still doesn't work, thanks – vanquish Oct 15 '22 at 21:01
The examples work... perhaps I didn't understand quite what you were asking. You could post your new problem as a new question for additional comment. – tdelaney Oct 15 '22 at 21:04

Xin Cheng · Answer 3 · 2022-10-18T03:36:20.900

A solution without `re` package

This solution does not use a regex to split the text, and it performs well: everything happens in memory in a single pass, so it runs in O(n + m) time and space, where n is the length of the input string and m is the length of the decoded output.

def deco(s: str) -> str:
    L, j = [], len(s)  # j == len(s) means "no count pending"
    for i in range(len(s)):
        if s[i].isdigit():    # record the position of the current number's first digit
            j = min(j, i)
        else:
            if j == len(s):
                L.append(s[i])                 # character without a count
            else:
                L.append(s[i] * int(s[j:i]))   # character preceded by the count s[j:i]
                j = len(s)
    return "".join(L)

Demo

print(deco("2A4CT2A2C"))
print(deco("12A4CT2A2C"))

Output
```
AACCCCTAACC
AAAAAAAAAAAACCCCTAACC
```
