4

I would like to remove each character from a string s1 exactly as many times as it appears in another string s2. For example, if s1 = "AAABBBCCCCCCD" and s2 = "ABBCCC", then the result should be s = "AABCCCD". (The order of the characters in the resulting string is actually irrelevant, but it's a plus if it can be preserved.)

The following rather crude code can do this:

def reduce_string(s1, s2):
    s = s1
    for c in s2:
        if c in s:
            s = s.replace(c, "", 1)
    return s

# examples
print(reduce_string("AAABBBCCCCCCD", "ABBCCC"))   # AABCCCD
print(reduce_string("AAABBBCCCCCCD", "ABBCCCE"))  # AABCCCD (the unmatched "E" is ignored)

My question is, can the same be achieved by clever use of some built-in function or at least in a more elegant way? Thank you for all your answers!

schotti
  • 1
    I will gladly try and answer the "why question": The context is randomisation in clinical studies. String s1 = "AAABBBCCCCCC", for example, represents 12 individuals who are to be allocated to 3 study arms A, B, and C, with the control arm C being twice as large as the two intervention arms A and B. The characters in s1 will be randomly permuted to generate possible allocation schemes. Now, s2 = "ABBCCC" stands for a batch of participants that were allocated before. Instead of permuting all participants' allocations I would then like to only permute those of the not yet allocated individuals. – schotti Nov 28 '21 at 22:31
  • Does this answer your question? [Testing whether a string has repeated characters](https://stackoverflow.com/questions/32090058/testing-whether-a-string-has-repeated-characters) – bad_coder Nov 29 '21 at 00:40
  • @bad_coder: No, it doesn't, and to be honest, I don't understand why you thought it might. What exactly is not clear about my question? – schotti Nov 29 '21 at 11:33
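
For the randomisation use case described in the first comment above, a minimal sketch might look like the following. It assumes the standard library's collections.Counter and random modules; the names full_scheme, already_allocated, open_slots and next_batch, as well as the seed, are illustrative and not taken from the question.

```python
import random
from collections import Counter

full_scheme = "AAABBBCCCCCC"   # 12 planned allocations to arms A, B and C
already_allocated = "ABBCCC"   # batch that was allocated earlier

# Allocations still open: the multiset difference of the two strings.
open_slots = list((Counter(full_scheme) - Counter(already_allocated)).elements())

# Seeded generator so the sketch is reproducible; the seed value is arbitrary.
rng = random.Random(2021)
rng.shuffle(open_slots)

next_batch = "".join(open_slots)
print(next_batch)  # some permutation of "AABCCC"
```

Only the open allocations are permuted, so the earlier batch stays untouched and the overall arm sizes still match the planned scheme.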

2 Answers

6

You can use Counter objects. Subtract one from the other and join the remaining elements together.

from collections import Counter

s1 = "AAABBBCCCCCCD"
s2 = "ABBCCC"

counter = Counter(s1)
counter.subtract(Counter(s2))

result = ''.join(counter.elements())
print(result)  # prints AABCCCD

As a one-liner:

print(''.join((Counter(s1) - Counter(s2)).elements()))
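
The two variants above subtract in slightly different ways, although they produce the same string. A small sketch of the difference, based on the documented behaviour of collections.Counter (the names in_place and via_operator are only illustrative):

```python
from collections import Counter

s1 = "AAABBBCCCCCCD"
s2 = "ABBCCCE"  # contains an "E" that s1 does not have

# subtract() modifies the counter in place and keeps zero and negative counts.
in_place = Counter(s1)
in_place.subtract(Counter(s2))

# The - operator returns a new Counter that keeps only positive counts.
via_operator = Counter(s1) - Counter(s2)

# elements() skips counts below one, so both variants give the same string.
result_in_place = "".join(in_place.elements())
result_operator = "".join(via_operator.elements())

print(in_place["E"], "E" in via_operator)  # -1 False
print(result_in_place, result_operator)    # AABCCCD AABCCCD
```

Either form works for this task; the operator form is simply shorter when the intermediate counter is not needed.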
flakes
  • This will change the order, so if you have `s1 = "AAABBDCCCCCCD"`, the output will be `AACCCDD ` – Levi Nov 28 '21 at 22:32
  • 1
    @Levi if OP updates the question, I'll edit to satisfy that. For OP: basically, you would put only the second string in a Counter, loop over the original string, and decrement the counter for each letter that is dropped. – flakes Nov 28 '21 at 22:35
  • 3
    Thank you all, Loïc, flakes and Levi! I added a little context that should also clarify that the order does not matter for my particular purpose. – schotti Nov 28 '21 at 22:37
  • @flakes right, OP clarified it in the meantime – Levi Nov 28 '21 at 22:46
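
The order-preserving variant outlined in the comments above could be sketched roughly as follows. This is one reading of that description, not code from the answer; the function name reduce_string_ordered is hypothetical.

```python
from collections import Counter


def reduce_string_ordered(s1, s2):
    """Drop each character of s2 once from s1, keeping the order of s1."""
    to_remove = Counter(s2)  # how many copies of each letter still need dropping
    kept = []
    for c in s1:
        if to_remove[c] > 0:
            to_remove[c] -= 1  # drop this occurrence
        else:
            kept.append(c)     # nothing left to drop for this letter
    return "".join(kept)


print(reduce_string_ordered("AAABBDCCCCCCD", "ABBCCC"))   # AADCCCD
print(reduce_string_ordered("AAABBBCCCCCCD", "ABBCCCE"))  # AABCCCD
```

Looking up a missing key in a Counter returns 0 without raising, so letters that appear only in s1 need no special handling.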
1

The itertools module has a filterfalse function that is worth a look; see its documentation for details.

The filterfalse function returns the elements of an iterable for which the predicate evaluates to False.

So, one possible solution might be:

import itertools

def reduce_string(s1, s2):

    def predicate(letter, param=list(s2)):
        if letter in param:
            param.remove(letter)
            return True
        return False

    result = itertools.filterfalse(predicate, s1)

    return ''.join(result)

print(reduce_string("AAABBBCCCCCCD", "ABBCCC"))   # AABCCCD
print(reduce_string("AAABBBCCCCCCD", "ABBCCCE"))  # AABCCCD

Note, however, that my predicate function is a little tricky: it has a side effect, because it modifies a list built from the second string.

The default value of the keyword argument param is evaluated once, when the function predicate is defined inside the reduce_string scope, and it is a list object.

Because the reference stays the same and only the elements inside param change, the predicate can remove each letter from that list as soon as it has been matched, so later comparisons see the updated list.

Now, the question remains: is there a more elegant way to define the predicate function?
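
One possible direction, offered as a sketch rather than a definitive answer, is to let the predicate close over a local variable instead of relying on a default argument, and to track the letters with a collections.Counter instead of a list. The names reduce_string_closure, remaining and should_drop are hypothetical.

```python
from collections import Counter
from itertools import filterfalse


def reduce_string_closure(s1, s2):
    # Local state shared with the nested predicate through its closure.
    remaining = Counter(s2)

    def should_drop(letter):
        # True means "filter this letter out"; each drop uses up one copy.
        if remaining[letter] > 0:
            remaining[letter] -= 1
            return True
        return False

    return "".join(filterfalse(should_drop, s1))


print(reduce_string_closure("AAABBBCCCCCCD", "ABBCCC"))   # AABCCCD
print(reduce_string_closure("AAABBBCCCCCCD", "ABBCCCE"))  # AABCCCD
```

The predicate still has a side effect, but the state it mutates is now an ordinary local variable, and the Counter lookup avoids the repeated linear scans of `letter in param` and `param.remove(letter)`.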