
I am trying to make all possible substitutions between a reference and a test sequence. The sequences will always be the same length and the goal is to substitute Test characters with those of Ref.

Ref= "AAAAAAAAA"
Test="AAATAATTA"

Desired output:

AAATAATTA, AAATAAAAA, AAATAATAA, AAATAATTA, AAAAAATTA, AAAAAATAA, AAAAAAATA
David Buck
Patrickc01
  • Does this mean that the i-th character in the new_string can be either `Ref[i]` or `Test[i]`, for each i in the length of the string? Thus creating `2^len - 1` possible new_strings, where `len = len(Ref)`? However, you only want these substitutions when `Ref[i] != Test[i]`? – DarrylG Apr 28 '20 at 15:11

3 Answers


You can use itertools.product for this if you zip the two strings together (turning them into a sequence of 2-tuples, one per position, for product to find combinations of). You then probably want to deduplicate the results with a set. All together it looks like this:

>>> from itertools import product
>>> {''.join(t) for t in product(*zip(Ref, Test))}
{'AAAAAAAAA', 'AAAAAATAA', 'AAAAAAATA', 'AAATAATTA', 'AAATAATAA', 'AAATAAAAA', 'AAATAAATA', 'AAAAAATTA'}

To break that down a little further, since it looks a bit like line noise if you aren't familiar with the functions in question...

Here's the zip that turns our two strings into an iteration of pairs (wrapping it in a list comprehension for easy printing, but we'll remove that in the next stage):

>>> [t for t in zip(Ref, Test)]
[('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'T'), ('A', 'A'), ('A', 'A'), ('A', 'T'), ('A', 'T'), ('A', 'A')]

The product function takes an arbitrary number of iterables as arguments; we want to feed it all of our 2-tuples as separate arguments using *:

>>> [t for t in product(*zip(Ref, Test))]
[('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'), ('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'), ... (a whole lot of tuples)

Use join to turn those tuples back into strings:

>>> [''.join(t) for t in product(*zip(Ref, Test))]
['AAAAAAAAA', 'AAAAAAAAA', 'AAAAAAATA', 'AAAAAAATA', ... (still a whole lot of strings)

And by making this a set comprehension ({}) instead of a list comprehension ([]), we get just the unique elements.
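For reuse, the one-liner above can be wrapped in a small function. This is only a sketch: the name `all_substitutions` and the `include_ref` flag are hypothetical and not part of the original answer. The flag is there in case the unchanged reference string should be left out of the result.

```python
from itertools import product


def all_substitutions(ref: str, test: str, include_ref: bool = True) -> set:
    """Return every string whose i-th character is ref[i] or test[i]."""
    if len(ref) != len(test):
        raise ValueError("ref and test must have the same length")
    # product picks one character from each (ref[i], test[i]) pair;
    # the set comprehension collapses the duplicates.
    results = {"".join(chars) for chars in product(*zip(ref, test))}
    if not include_ref:
        results.discard(ref)  # hypothetical option: drop the untouched Ref
    return results


variants = all_substitutions("AAAAAAAAA", "AAATAATTA")
print(len(variants))    # 8: three differing positions give 2**3 strings
print(sorted(variants))
```

With `include_ref=False` the same call returns the seven strings that differ from Ref.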

Samwise

If you want to avoid itertools (product generates many copies of identical strings in your case, because every position where the two sequences agree still doubles the number of tuples), you can implement your own solution with recursion and generators. I would expect this to perform better for that reason when the sequences are very long. For short sequences, the itertools solution is simpler and is the better choice.

def take_some(to: str, from_: str):
    assert len(to) == len(from_)  # your precondition
    if to == from_:  # nothing left to substitute ('' == '' in the worst case)
        yield from_
        return
    for i, (l, r) in enumerate(zip(to, from_)):
        if l != r:
            # first mismatch: recurse on the remainder, then emit each
            # result twice, once keeping `l` and once substituting `r`
            rest = take_some(to[i+1:], from_[i+1:])
            for res in rest:
                yield to[:i+1] + res
                yield to[:i] + r + res
            return

Giving

In [2]: list(take_some("AAAAAAAAA", "AAATAATTA"))                                     
['AAAAAAAAA',
 'AAATAAAAA',
 'AAAAAATAA',
 'AAATAATAA',
 'AAAAAAATA',
 'AAATAAATA',
 'AAAAAATTA',
 'AAATAATTA']

Note that this does contain the original Ref string; you can remove it from the result at the end if you really don't want it included.
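The duplication concern can be made concrete, and it can also be avoided without recursion. The sketch below is an alternative, not the approach of this answer: it collapses each position to its distinct characters before calling product, so positions where the two sequences agree contribute only one choice. The variable names are illustrative.

```python
from itertools import product

ref, test = "AAAAAAAAA", "AAATAATTA"

# Plain product over all 9 pairs: 2**9 tuples, most of them duplicates.
raw = ["".join(t) for t in product(*zip(ref, test))]

# Collapse each pair to its distinct characters first; dict.fromkeys keeps
# the original order (ref character first) while dropping repeats.
choices = [tuple(dict.fromkeys(pair)) for pair in zip(ref, test)]
unique = ["".join(t) for t in product(*choices)]

print(len(raw), len(set(raw)), len(unique))   # 512 8 8
```

Only the three mismatched positions offer two choices here, so product yields exactly the 8 distinct strings with no set needed afterwards.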

modesitt

itertools.combinations can be used to generate the combinations of positions to substitute; its second argument controls the size of each tuple.

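import itertools

REF = "AAAAAAAAA"
poses = (3, 6, 7)  # positions where Test differs from REF
# choose every non-empty subset of those positions, smallest first
for i in range(1, len(poses) + 1):
    tmp = itertools.combinations(poses, i)
    for j in tmp:
        result = REF
        print(j)
        for k in j:
            # substitute the Test character ('T') at position k
            result = result[:k] + 'T' + result[k+1:]
        print(result)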

Result:

(3,)
AAATAAAAA
(6,)
AAAAAATAA
(7,)
AAAAAAATA
(3, 6)
AAATAATAA
(3, 7)
AAATAAATA
(6, 7)
AAAAAATTA
(3, 6, 7)
AAATAATTA
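The positions and the substituted character are hard-coded above. A possible generalization, assuming both are to be derived from the two sequences, is sketched below; the function name `substitutions_by_position` is hypothetical.

```python
from itertools import combinations


def substitutions_by_position(ref: str, test: str):
    """Yield ref with each non-empty subset of mismatched positions taken from test."""
    # Indices where the two sequences disagree; [3, 6, 7] for the example.
    positions = [i for i, (r, t) in enumerate(zip(ref, test)) if r != t]
    for size in range(1, len(positions) + 1):
        for chosen in combinations(positions, size):
            chars = list(ref)
            for k in chosen:
                chars[k] = test[k]  # take the Test character at position k
            yield "".join(chars)


print(list(substitutions_by_position("AAAAAAAAA", "AAATAATTA")))
```

Because the subset sizes start at 1, the unchanged Ref string is never produced, and the strings come out in the same order as the result listed above.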
Boying