1

Before I carry on to the problem, I should note that I know there are much easier ways to solve this problem without using divide and conquer; however, the point of solving this problem under this restriction is that I actually want to learn how to tackle problems with divide and conquer. I am good at recognizing correct solutions, but implementing my own D&C strategy is not a skill I currently have.

The problem is this: given a string, find the longest substring that does not contain the letter 'y'. For example, longestNoY("abydefyhi") should return "def".

My first approach to tackle this problem was to determine the base cases. If we had a string of length 2, we would want to return the non-y components (or empty string if both characters were 'y'). If we had a string of length 1, we would return it if it is not a 'y'.

So the first part should look like this:

def longestNoY(string, start, end):
     #Conquer
     if start == end:
          if string == 'y': return ''
          return string
     if start + 1 == end:
          if string == "yy": return ''
          if string[0] == 'y': return string[1]
          return string[0]
     ....

Next, I knew that I would need to recursively call the function for each half of the parent string. I also knew that I wanted the function to return the longer of the two children, except if the sum of the lengths of the two children was equal to the length of the parent, then the function should return the parent because there were no 'y's in the children.

    #Divide and Partial Implementation of Rejoin
    ....
    middle = (start + end) // 2
    leftString = longestNoY(string, start, middle)
    rightString = longestNoY(string, middle, end)
    if len(leftString) + len(rightString) == len(string): return string
    ....

The part I am having trouble with now would best be explained with an example:

0 1 2     3 4   5 6   7 8 
a b   y   d e | f y   h i
a b   y | d e | f y | h i
a b | y | d e | f y | h i 

The longest substring in the left side is either "ab" or "de", but we know that "de" is adjacent to an 'f' which would make "def" the longest. I don't know exactly how to carry on with this problem. Please do not give me a program to solve this problem.

DrJessop
  • 462
  • 6
  • 26

4 Answers4

2

It is possible. But then you each time need to return four values: the longest subsequence that starts at the left end of the "slice" (this can be zero), the longest subsequence "in the middle", the longest subsequence that ends at the right end of the "slice" (this can be zero as well), and if the string is just a sequence of non-Y characters (a boolean). The fourth element can in fact just be derived by checking if one of the elements in the first three is equal to the length, but this is probably easier to implement.

Why is this important? Because a sequence of non-ys can pass "through" a divide. For example:

abcdeYfghi   jklYmnopqr

here if we split it in the middle (or any other way that is not "constant" and "rest").

So here recursively we have several cases:

  1. the empty string returns (0, 0, 0, True),
  2. the non-empty string other than Y, we return (1, 1, 1, True);
  3. for the singleton string Y we return (0, 0, 0, False);
  4. the recursive case that divides the string in two, and applies "merge" logic afterwards on the results.

The "merge logic" is rather complex, especially since it is possible that both "subslices" only contain non-Y strings. After slicing we thus obtain two triples (a0, a1, a2, a3) and (b0, b1, b2, b3), and we produce a 3-tuple (c0, c1, c2, c3).

If a3 = True and b3 = True, then of course that means that the current slice contains no Y's as well. So we can derive that:

c3 = a3 and b3

given a3 holds, then it holds that c0 = a0 + b0 since then a0 has no Y's and thus the left "sequence" is the same as the entire length of the subsequence plus the left subsequence of the right part. If a3 does not hold, c0 is just a0.

Given b3 holds, then it holds that c2 = a2 + b2 for the same reasoning as the one above, if not, then a2 = b2.

Now the element in the middle is the maximum of three elements:

  1. the element in the middle of the left slice a1;
  2. the element in the middle of the right slice b1; and
  3. the sum of a2 and b0 since there can be overlap and then this is the sum of the two.

We thus return the maximum of the tree.

So in Python, this looks like:

def longestNoY(string, start, end):
    if start == end:
        return (0, 0, 0, True)
    elif start+1 == end:
        if string[start] == 'y':
            return (0, 0, 0, False)
        else:
            return (1, 1, 1, True)
    else:
        mid = (start + end)//2
        a0, a1, a2, a3 = longestNoY(string, start, mid)
        b0, b1, b2, b3 = longestNoY(string, mid, end)
        c3 = a3 and b3
        c0 = a0 + a3 * b0
        c2 = b2 + b3 * a2
        c1 = max(a1, b1, a2 + b0)
        return (c0, c1, c2, c3)

The final result is the maximum of the first three items in the tuple.

For the given sample string, we thus obtain:

             (1, 1, 1, True) a
             (1, 1, 1, True) b
         (2, 2, 2, True) ab
             (0, 0, 0, False) y
             (1, 1, 1, True) d
         (0, 1, 1, False) yd
     (2, 2, 1, False) abyd
             (1, 1, 1, True) e
             (1, 1, 1, True) f
         (2, 2, 2, True) ef
             (0, 0, 0, False) y
                 (1, 1, 1, True) h
                 (1, 1, 1, True) i
             (2, 2, 2, True) hi
         (0, 2, 2, False) yhi
     (2, 2, 2, False) efyhi
 (2, 3, 2, False) abydefyhi
(2, 3, 2, False)

but that being said, it looks to me as an unnecessary complicated procedure to construct something that, in terms of time complexity, is the same as traversal, but typically more expensive (function calls, constructing new objects, etc.). Especially since linear traversal is just:

def longestNoY(string):
    mx = 0
    cur = 0
    for c in string:
        if c == 'y':
            mx = max(mx, cur)
            cur = 0
        else:
            cur += 1
    return mx

There is however an advantage here is that the above described algorithm can be used for parallelization. If for example the string is huge, the above can be used such that every core can count this. In that case it is however likely beneficial to use an iterative level on the "core" level, and only use the above to "distribute" work and "collect" results.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • Not sure what these statements `c3 = a3 and b3 c0 = a0 + a3 * b0 c2 = b2 + b3 * a2` are actually accomplishing. I cannot wrap my head around these somehow :) – SomeDude Nov 07 '18 at 21:05
  • @SomeDude: a `bool` is just `1` (`True`) or `0` (`False`). You can thus perform `a0 + a3 * b0` to express `a0` for `a3 == False`, and `a0 + b0` for `a3 == True`. Furthermore in Python, one uses `and` and `or` for boolean logic (and actually these are a bit more "generic" than their Java equivalents), since for example `[] or [1]` returns `[1]` :) – Willem Van Onsem Nov 07 '18 at 21:07
2

This can be easily solved by just traversing through the string. But I know you want to learn Divide Conquer.

To me, this is not a good problem to solve using Divide Conquer.

What @WillemVanOnsem suggested by recursion has essentially the same effect as when you traverse linearly.

But if you do want to do it in Divide & Conquer fashion, you need to consider the substring that crosses the mid point i.e. start <= i <= mid < j <= end - but that would be overkill.

SomeDude
  • 13,876
  • 5
  • 21
  • 44
  • 1
    You beat me to it. Sometimes the most important thing to learn about a tool is when ***not*** to use it. This is definitely a problem where divide-and-conquer is not the right tool. – user3386109 Nov 07 '18 at 20:50
  • I already know I should not be using this method to solve the problem. Of course anybody who has been programming for a few days could just use a straightforward greedy algorithm, but I am just trying to learn the divide and conquer paradigm. – DrJessop Nov 07 '18 at 20:58
  • @DrJessop I know you want to learn Divide Conquer, but this problem doesn't seem to offer a good learning experience. For example try solving [maximum sum sub array problem](https://en.wikipedia.org/wiki/Maximum_subarray_problem) using Divide Conquer which is actually a bit similar to your problem but is not in that it does help in evaluating the left, right, cross pieces separately. – SomeDude Nov 07 '18 at 21:01
  • @SomeDude To be honest, this was a practice midterm question, and the professor is famous for giving clunky problems where you are forced to make an algorithm under the restriction that it has to be from a particular family, even if that family is not suited for that particular problem. Thanks for your link, though. – DrJessop Nov 07 '18 at 21:04
  • @DrJessop: actuallly this *can* be useful, since it is related to *parallel* programming. For example a sum over gigantic arrays can be calculated efficiently with a d&c algorithm. – Willem Van Onsem Nov 07 '18 at 21:20
1

I think the best way to put the problem is to find the positions just before and just after y, not being y. This way you will find left and right ends of intervals. I do not give you the code, since you specifically asked as not to solve the problem for you, just point to the right direction, so:

  • In trivial cases (length of interval is 0) determine whether the item you have is a valid left end or right and of an interval
  • In non-trivial cases always halve the set to left and right (no problem if the number of items is odd, just put the middle somewhere) and issue the divide and conquer for them as well
  • In non-trivial cases always consider the best interval the left and right sub-problem gives you
  • In non-trivial cases make sure that if an interval happens to start in the left and end in the right, you take that into account
  • from such intervals, the one which has a greater length is better

These are the ideas you need to employ in order to implement the divide and conquer you desire. Happy coding!

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
0

Now that I actually had time to study, I decided to come back to this problem and came up with a very readable solution. Here it is:

def across(string, start, end, middle):
    startL = middle
    bestL = ''
    while(startL >= start and string[startL] != 'y'):
        bestL = string[startL] + bestL
        startL -= 1
    startR = middle + 1
    bestR = ''
    while(startR <= end and string[startR] != 'y'):
        bestR = bestR + string[startR]
        startR += 1
    return bestL + bestR

def longestNoY(string, start, end):
    if(start > end):
        return ''
    if(start == end):
        if(string[start] == 'y'):
            return ''
        return string[start]
    middle = (start + end) // 2
    leftString = longestNoY(string, start, middle)
    rightString = longestNoY(string, middle + 1, end)
    acrossString = across(string, start, end, middle)
    return max(leftString, rightString, acrossString, key=len)
DrJessop
  • 462
  • 6
  • 26