Python - removing repeated letters in a string

Question

Say I have a string whose letters are ordered by the number of times each letter repeats, with ties broken alphabetically.

Example: "BBBAADDC".

There are 3 B's, so they go at the start, 2 A's and 2 D's, so the A's go in front of the D's because they are in alphabetical order, and 1 C. Another example would be CCCCAAABBDDAB.

Note that there can be 4 letters in the middle somewhere (i.e. CCCC), as there could be 2 pairs of 2 letters.

However, let's say I can only have n letters in a row. For example, if n = 3 in the second example, then I would have to omit one "C" from the first substring of 4 C's, because there can only be a maximum of 3 of the same letters in a row.

Another example would be the string "CCCDDDAABC"; if n = 2, I would have to remove one C and one D to get the string "CCDDAABC".

Example input/output:

n=2: Input: AAABBCCCCDE, Output: AABBCCDE
n=4: Input: EEEEEFFFFGGG, Output: EEEEFFFFGGG
n=1: Input: XXYYZZ, Output: XYZ

How can I do this with Python? Thanks in advance!

This is what I have right now, although I'm not sure if it's on the right track. Here, z is the length of the string.

for k in range(z+1):
        if final_string[k] == final_string[k+1] == final_string[k+2] == final_string[k+3]: 
            final_string = final_string.translate({ord(final_string[k]): None})
return final_string

_although I'm not sure if it's on the right track_ Does it produce the results you want? — John Gordon, Mar 06 '21 at 21:05
Nope, definitely not. I either get "string index out of range" or every character in the string that's the same as "final_string[k]" removed. — , Mar 06 '21 at 21:06
I'm not sure what the actual question is. Do you just need to remove extra repeated letters? — John Gordon, Mar 06 '21 at 21:19
Yes, i.e. if there are more than a certain amount of repeated letters, then we omit these letters until there are only n of those letters. — , Mar 06 '21 at 21:20
What about _"so the A's go in front of the D's because they are in alphabetical order, and 1 C"_ ? Are you discarding letters if they're out of order? For clarity, add the expected result for each of the examples you've given. — aneroid, Mar 06 '21 at 21:42
Done! No, I've already sorted all of the letters. Basically, there are 2 parameters I'm sorting by: alphabetical order and length of substring (n). Say the maximum length of a substring is 5, and there are 7 A's, 8 B's, and 4 C's. The string that I would have is "AAAAABBBBBCCCCBBBAA", because once we've taken out the 5 A's and 5 B's at the beginning, there are 2 A's and 3 B's left (and 4 C's). We continue to sort by length and position in the alphabet. — , Mar 06 '21 at 21:51
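
To make the two failure modes mentioned in the comments concrete, here is a small sketch. The sample string and the helper name last_window_ok are illustrative assumptions, not part of the original code. It shows that str.translate() with a {ord(ch): None} table deletes every occurrence of a character, and that reading index k + 3 eventually runs past the end of the string.

```python
s = "CCCCAAB"

# A translate table that maps a character to None deletes *every*
# occurrence of that character, not just one surplus copy.
removed_all = s.translate({ord("C"): None})
print(removed_all)  # AAB

def last_window_ok(text, k):
    # True if text[k + 3] exists, False if it would raise IndexError.
    try:
        text[k + 3]
        return True
    except IndexError:
        return False

print(last_window_ok(s, 3))  # True: index 6 is the last valid index
print(last_window_ok(s, 4))  # False: index 7 is out of range
```

So the loop would need to stop at least 3 positions before the end, and the removal step would need to drop a single character rather than translate the whole string.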

Jacob Lee · Answer 1 · 2021-03-06T21:54:30.613


Here's my solution:

def snip_string(string, n):
    list_string = list(string)
    list_string.sort()
    chars = set(string)
    for char in chars:
        while list_string.count(char) > n:
            list_string.remove(char)
    return ''.join(list_string)

Calling the function with various values for n gives the following output:

>>> string = "AAAABBBCCCDDD"
>>> snip_string(string, 1)
'ABCD'
>>> snip_string(string, 2)
'AABBCCDD'
>>> snip_string(string, 3)
'AAABBBCCCDDD'

Edit

Here is the updated version of my solution, which only removes characters if the group of repeated characters exceeds n.

import itertools

def snip_string(string, n):
    groups = [list(g) for k, g in itertools.groupby(string)]
    string_list = []
    for group in groups:
        while len(group) > n:
            del group[-1]
        string_list.extend(group)
    return ''.join(string_list)

Output:

>>> string = "DDDAABBBBCCABCDE"
>>> snip_string(string, 3)
'DDDAABBBCCABCDE'

edited Mar 06 '21 at 21:54

answered Mar 06 '21 at 21:07


Thanks! However, when I try this with something more complicated, like "DDDAABBBBCCABCDE", it returns "DDAABBCCABCDE" (for n = 3), which I don't want...I only want to remove 1 "B" from the "BBBB" and that's it. Your code removes 2 Bs, which I don't want. How would I go about preventing this? – Mar 06 '21 at 21:43
@OA_Elite Okay, I updated my answer and I believe that it gives the desired output. – Jacob Lee Mar 06 '21 at 21:54
I see, thank you so much (I can't upvote yet :(). I'd give this best answer too, but unfortunately, I can't! Both this and the below work, however. Thanks again!!! – Mar 06 '21 at 22:22

score 1 · Answer 2 · answered Mar 06 '21 at 21:17

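hello = "hello frrriend"


def replacing(text: str) -> str:
    # Build a new string instead of calling str.replace(), which would
    # delete every occurrence of the letter rather than just the repeats.
    # Note: this keeps one letter per run, i.e. the n = 1 case only.
    result = []
    prev = None
    for ch in text:
        if ch != prev:
            result.append(ch)
        prev = ch
    return "".join(result)

replacing(hello)
# 'helo friend'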

It looks a bit primitive, but I think it works. That's what I came up with on the go, anyway. Hope it helps :D

aneroid · Accepted Answer · 2021-03-06T22:38:38.403

Ok, based on your comment, you're either pre-sorting the string or it doesn't need to be sorted by the function you're trying to create. You can do this more easily with itertools.groupby():

import itertools

def max_seq(text, n=1):
    result = []
    for k, g in itertools.groupby(text):
        result.extend(list(g)[:n])
    return ''.join(result)


max_seq('AAABBCCCCDE', 2)
# 'AABBCCDE'
max_seq('EEEEEFFFFGGG', 4)
# 'EEEEFFFFGGG'
max_seq('XXYYZZ')
# 'XYZ'
max_seq('CCCDDDAABC', 2)
# 'CCDDAABC'

Each group g is expanded into a list and then sliced to its first n elements (the [:n] part), so each letter appears at most n times in a row. If the same letter appears again later in the string, it's treated as an independent sequence when counting n in a row.
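
To see what the loop is working with, it can help to look at the groups themselves. A minimal sketch using one of the question's examples (the helper name show_runs is made up for illustration):

```python
import itertools

def show_runs(text):
    # Return (letter, run_length) for each run of consecutive equal letters.
    return [(k, len(list(g))) for k, g in itertools.groupby(text)]

runs = show_runs('CCCDDDAABC')
print(runs)
# [('C', 3), ('D', 3), ('A', 2), ('B', 1), ('C', 1)]
```

The final C is its own run of length 1, so it is not affected by the cap applied to the first run of C's.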

Edit: Here's a shorter version, which may also perform better for very long strings. And while we're using itertools, this one additionally uses itertools.chain.from_iterable() to flatten the groups into a single stream of letters. Since sequences is a generator expression and chain.from_iterable() returns a lazy iterator, nothing is evaluated until the join on the last line:

import itertools

def max_seq(text, n=1):
    sequences = (list(g)[:n] for _, g in itertools.groupby(text))
    letters = itertools.chain.from_iterable(sequences)
    return ''.join(letters)
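
If building a list for every group is a concern, one possible variation (not benchmarked here) is to take the first n items of each group with itertools.islice() instead of list(g)[:n]. The name max_seq_lazy is hypothetical.

```python
import itertools

def max_seq_lazy(text, n=1):
    # islice stops after n items, so long runs are never copied into a list.
    # chain.from_iterable drains each slice before groupby advances to the
    # next run, which is what keeps each group iterator valid.
    capped = (itertools.islice(g, n) for _, g in itertools.groupby(text))
    return ''.join(itertools.chain.from_iterable(capped))

print(max_seq_lazy('AAABBCCCCDE', 2))  # AABBCCDE
print(max_seq_lazy('CCCDDDAABC', 2))   # CCDDAABC
```

This relies on the slices being consumed in order. Collecting the islice objects into a list first and reading them later would not work, because groupby invalidates a group as soon as it moves on to the next one.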

score 1 · Answer 4 · answered Mar 06 '21 at 22:40

from itertools import groupby
n = 2
def rem(string):
    out = "".join(["".join(list(g)[:n]) for _, g in groupby(string)])
    print(out)

So this is the complete code for your question.

s = "AABBCCDDEEE"
s2 = "AAAABBBDDDDDDD"
s3 = "CCCCAAABBDDABBB"
s4 = "AAAAAAAA"
z = "AAABBCCCCDE"

Calling rem() on each of the strings above prints the following:

AABBCCDDEE
AABBDD
CCAABBDDABB
AA
AABBCCDE
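
Since rem() above reads n from a global and prints its result, a variant that takes n as an argument and returns the string is easier to check. A sketch, with the hypothetical name cap_runs, verified against the test strings above for n = 2:

```python
from itertools import groupby

def cap_runs(string, n):
    # Keep at most n letters from each run of consecutive equal letters.
    return "".join("".join(list(g)[:n]) for _, g in groupby(string))

# Inputs and expected outputs taken from the test listing above (n = 2).
tests = {
    "AABBCCDDEEE": "AABBCCDDEE",
    "AAAABBBDDDDDDD": "AABBDD",
    "CCCCAAABBDDABBB": "CCAABBDDABB",
    "AAAAAAAA": "AA",
    "AAABBCCCCDE": "AABBCCDE",
}
for text, expected in tests.items():
    assert cap_runs(text, 2) == expected
```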
