
I was solving the following job interview question and solved most of it but failed at the last requirement.

Q: Build a data structure which supports the following functions:

Init - Initialise an empty DS. O(1) time complexity.

SetPositiveInDay(d,x) - Record in the DS that on day d exactly x new people were infected with COVID-19. O(log n) time complexity.

WorstBefore(d) - Among the days inserted into the DS that are smaller than d, return the last (largest) one that has more newly infected people than day d. O(log n) time complexity.

For example:

Init()
SetPositiveInDay(1,10)
SetPositiveInDay(2,20)
SetPositiveInDay(3,15)
SetPositiveInDay(5,17)
SetPositiveInDay(23,180)
SetPositiveInDay(8,13)
SetPositiveInDay(13,18)
WorstBefore(13) // Returns day #2
SetPositiveInDay(10,19)
WorstBefore(13) // Returns day #10

Important note: you can't assume that days will be entered in order, nor that there won't be "gaps" between days (some days may not be saved in the DS while later days are).
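To pin down the semantics before worrying about complexity, here is a naive O(n)-per-query reference in Python. It assumes, as in the example, that WorstBefore is only called for a day that has already been inserted; the class and method names are hypothetical:

```python
class NaiveCovidDS:
    """O(n) reference for WorstBefore; only meant to pin down the semantics."""

    def __init__(self):
        self.sick = {}  # day -> number of newly infected people

    def set_positive_in_day(self, d, x):
        self.sick[d] = x

    def worst_before(self, d):
        """Largest day < d with strictly more infections than day d, or None."""
        x = self.sick[d]  # assumes day d was inserted
        candidates = [day for day, s in self.sick.items() if day < d and s > x]
        return max(candidates, default=None)


ds = NaiveCovidDS()
for day, count in [(1, 10), (2, 20), (3, 15), (5, 17), (23, 180), (8, 13), (13, 18)]:
    ds.set_positive_in_day(day, count)
print(ds.worst_before(13))  # 2
ds.set_positive_in_day(10, 19)
print(ds.worst_before(13))  # 10
```

A baseline like this is handy for randomized testing of a tree-based version.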


What I did

I used an AVL tree (a 2-3 tree would work too). For each node I store:

Sick - number of newly infected people on that day.

maxLeftSick - maximum number of newly infected people in the left child's subtree.

maxRightSick - maximum number of newly infected people in the right child's subtree.

When inserting a new node, I made sure that no data is lost during rotations, and I updated maxLeftSick and maxRightSick for every node on the path from the new node up to the root.
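A minimal Python sketch of such a node and of the per-node update. It assumes that the two fields hold the maximum over the whole child subtree and that -1 marks a missing child, which is the same convention the first answer below uses. Node, subtree_max, update and rotate_right are hypothetical names:

```python
class Node:
    def __init__(self, day, sick):
        self.day = day
        self.sick = sick
        self.left = None
        self.right = None
        self.max_left_sick = -1   # max sick in left subtree, -1 if empty
        self.max_right_sick = -1  # max sick in right subtree, -1 if empty


def subtree_max(node):
    """Largest sick value in the subtree rooted at node (-1 for an empty subtree)."""
    if node is None:
        return -1
    return max(node.sick, node.max_left_sick, node.max_right_sick)


def update(node):
    """Recompute node's two fields from its children, which must already be correct."""
    node.max_left_sick = subtree_max(node.left)
    node.max_right_sick = subtree_max(node.right)


def rotate_right(y):
    """Right rotation around y; returns the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)  # y is now the lower node, so fix it first
    update(x)
    return x
```

After a plain insertion, calling update on each node from the new node's parent up to the root keeps every node correct. In a rotation, only the two nodes that change places need to be recomputed, starting with the lower one.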

However, I did not manage to implement WorstBefore(d).

  • Please leave any ideas on how to improve my question. I saved maxLeftSick and maxRightSick because I believe this is the key to solving the third requirement (maybe I am wrong?) –  Dec 19 '20 at 09:33
  • Is my question clear? I don't know why I don't get any comments on this one –  Dec 19 '20 at 19:26
  • Because this is a hard one. – Junmin Hao Dec 20 '20 at 03:36
  • Did the interviewer tell you the required time complexities, or did you add them to this post on your own? – Shridhar R Kulkarni Dec 20 '20 at 03:38
  • @ShridharRKulkarni he told me that –  Dec 20 '20 at 10:20
  • @MrCalc I'm curious: did you encounter this question in a real interview, or did you find it in some collection (if so, which one)? Was the hint about `maxLeftSick` and `maxRightSick` given? Without the hint, this seems way too hard to solve in the heat of an interview. – Mo B. Dec 20 '20 at 11:55
  • @MoB. real 2 hour interview, I wasn't given a hint but I thought of using maxLeftSick and maxRightSick on my own (was unsure if this is true at all) –  Dec 20 '20 at 12:35

2 Answers


Where to search?

First, find the node (call it node) corresponding to d in the tree, which is ordered by day, and let x = Sick(node). This can be done in O(log n).

If maxLeftSick(node) > x, the solution must be in the left subtree of node. Search for the solution there and return the answer. This can be done in O(log n) - see below.

Otherwise, walk up the tree from node towards the root until you find the first ancestor nextPredecessor with the following property (this takes O(log n)):

  • nextPredecessor is smaller than node (i.e. node lies in its right subtree),
  • and either
    1. Sick(nextPredecessor) > x or
    2. maxLeftSick(nextPredecessor) > x.

If no such node exists, there is no solution. In case 1, just return nextPredecessor: all remaining candidates lie in its left subtree and therefore have smaller days.

In case 2, we know that the solution must be in the left subtree of nextPredecessor, so search there and return the answer. Again, this takes O(log n) - see below.


Note that there is no need to search the right subtree of nextPredecessor: the only nodes in that subtree that are smaller than node are node's own left subtree and the smaller ancestors passed on the way up (together with their left subtrees), and all of these have already been excluded.

Note also that it is not necessary to traverse further up the tree than nextPredecessor since those nodes are even smaller, and we are looking for the largest node satisfying all constraints.
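The pseudocode below calls a helper findNextPredecessor that is not spelled out. A minimal Python sketch, assuming each node stores a parent pointer (the answer does not say how the upward walk is done; a root-to-node path recorded during the initial find would work as well). Node and find_next_predecessor are hypothetical names:

```python
class Node:
    def __init__(self, day, sick, parent=None):
        self.day, self.sick, self.parent = day, sick, parent
        self.left = self.right = None
        self.max_left_sick = -1   # max sick in left subtree, -1 if no left child
        self.max_right_sick = -1  # max sick in right subtree, -1 if no right child


def find_next_predecessor(node):
    """Walk up from node and return the first ancestor that is smaller than node
    (node lies in its right subtree) and has sick > node.sick or
    max_left_sick > node.sick. Returns None if there is no such ancestor."""
    x = node.sick
    child, current = node, node.parent
    while current is not None:
        if current.right is child and (current.sick > x or current.max_left_sick > x):
            return current
        child, current = current, current.parent
    return None
```

Ancestors reached from their left child are later than node, so the `current.right is child` test skips them.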


How to search?

OK, so how do we search for the solution in a subtree? Finding the largest day within the subtree rooted at q whose infection number is larger than x is simple using the maxLeftSick and maxRightSick information:

  1. If q has a right child and maxRightSick(q) > x, search in the right subtree of q.
  2. Otherwise, if Sick(q) > x, return Day(q).
  3. Otherwise, if q has a left child and maxLeftSick(q) > x, search in the left subtree of q.
  4. Otherwise, there is no solution within the subtree rooted at q.

We are effectively using maxLeftSick and maxRightSick to prune the search tree to include only "worse" nodes, and within that pruned tree we take the rightmost node, i.e. the one with the largest day.

This algorithm runs in O(log n), where n is the total number of nodes: whenever it descends into a subtree, the max fields guarantee that a solution exists there, so it never backtracks and the number of steps is bounded by the height of the tree.
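An iterative Python sketch of this search, checking the right subtree first, then q itself, then the left subtree. Node and max_of are hypothetical helpers, and -1 stands for a missing child, as in the pseudocode below:

```python
class Node:
    def __init__(self, day, sick, left=None, right=None):
        self.day, self.sick, self.left, self.right = day, sick, left, right
        # Max sick value in each child subtree, -1 if that child is missing.
        self.max_left_sick = max_of(left)
        self.max_right_sick = max_of(right)


def max_of(node):
    """Largest sick value in the subtree rooted at node, -1 if it is empty."""
    return -1 if node is None else max(node.sick, node.max_left_sick, node.max_right_sick)


def find_last_worse_than(q, x):
    """Largest day in the subtree rooted at q whose sick value exceeds x, or -1."""
    while q is not None:
        if q.max_right_sick > x:    # a match exists on the right, and it is later
            q = q.right
        elif q.sick > x:            # nothing later qualifies, so q itself wins
            return q.day
        elif q.max_left_sick > x:   # only earlier days remain
            q = q.left
        else:
            return -1
    return -1
```

The middle branch also covers a node that has a right child whose subtree contains no match; in that case q itself is the answer.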

Pseudocode

Here is the pseudocode (assuming maxLeftSick and maxRightSick return -1 if no corresponding child node exists):


// Returns the largest day smaller than d such that its
// infection number is larger than the infection number on day d.
// Returns -1 if no such day exists (or if d is not in the tree).
int WorstBefore(int d) {
    node = find(d);
    if (node == null) return -1;

    // try to find the solution in the left subtree
    if (maxLeftSick(node) > Sick(node)) {
        return FindLastWorseThan(node -> left, Sick(node));
    }
    // move up towards root until we find the first node
    // that is smaller than `node` and such that
    // Sick(nextPredecessor) > Sick(node) or
    // maxLeftSick(nextPredecessor) > Sick(node).
    nextPredecessor = findNextPredecessor(node);
    if (nextPredecessor == null) return -1;

    // Case 1
    if (Sick(nextPredecessor) > Sick(node)) return Day(nextPredecessor);

    // Case 2: maxLeftSick(nextPredecessor) > Sick(node)
    return FindLastWorseThan(nextPredecessor -> left, Sick(node));
}

// Finds the latest day within the subtree rooted at q where
// the infection number is larger than x. Runs in O(log(size(q))).
int FindLastWorseThan(Node q, int x) {
    // later days (right subtree) first, then q itself, then earlier days
    if (maxRightSick(q) > x) return FindLastWorseThan(q -> right, x);
    if (Sick(q) > x) return Day(q);
    if (maxLeftSick(q) > x) return FindLastWorseThan(q -> left, x);
    return -1;
}
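A runnable Python sketch of this pseudocode. It makes two assumptions not stated in the answer: nodes carry parent pointers so that the upward walk is possible, and insertion uses a plain unbalanced BST instead of an AVL tree. Only the query logic is demonstrated, not the O(log n) bound. find_last_worse_than checks the right subtree, then the node itself, then the left subtree. CovidTree, set_positive_in_day and the other names are hypothetical:

```python
class Node:
    def __init__(self, day, sick, parent=None):
        self.day, self.sick, self.parent = day, sick, parent
        self.left = self.right = None
        self.max_left_sick = -1   # max sick in left subtree, -1 if no left child
        self.max_right_sick = -1  # max sick in right subtree, -1 if no right child


def subtree_max(node):
    """Largest sick value in the subtree rooted at node, -1 if it is empty."""
    return -1 if node is None else max(node.sick, node.max_left_sick, node.max_right_sick)


def find_last_worse_than(q, x):
    """Largest day in the subtree rooted at q with sick > x, or -1."""
    while q is not None:
        if q.max_right_sick > x:
            q = q.right
        elif q.sick > x:
            return q.day
        elif q.max_left_sick > x:
            q = q.left
        else:
            return -1
    return -1


class CovidTree:
    """Plain, unbalanced BST keyed by day, with parent pointers (illustration only).
    An AVL tree would add rotations that also refresh the max fields;
    worst_before itself would not change."""

    def __init__(self):
        self.root = None

    def set_positive_in_day(self, d, x):
        parent, node = None, self.root
        while node is not None and node.day != d:
            parent, node = node, (node.left if d < node.day else node.right)
        if node is not None:           # day already present: overwrite
            node.sick = x
        else:
            node = Node(d, x, parent)
            if parent is None:
                self.root = node
            elif d < parent.day:
                parent.left = node
            else:
                parent.right = node
        p = node.parent                # refresh max fields up to the root
        while p is not None:
            p.max_left_sick = subtree_max(p.left)
            p.max_right_sick = subtree_max(p.right)
            p = p.parent

    def find(self, d):
        node = self.root
        while node is not None and node.day != d:
            node = node.left if d < node.day else node.right
        return node

    def worst_before(self, d):
        node = self.find(d)
        if node is None:
            return -1
        x = node.sick
        if node.max_left_sick > x:     # answer is in node's left subtree
            return find_last_worse_than(node.left, x)
        child, current = node, node.parent  # findNextPredecessor, inlined
        while current is not None:
            if current.right is child:      # current.day < d
                if current.sick > x:        # Case 1
                    return current.day
                if current.max_left_sick > x:  # Case 2
                    return find_last_worse_than(current.left, x)
            child, current = current, current.parent
        return -1


tree = CovidTree()
for d, x in [(1, 10), (2, 20), (3, 15), (5, 17), (23, 180), (8, 13), (13, 18)]:
    tree.set_positive_in_day(d, x)
print(tree.worst_before(13))  # 2
tree.set_positive_in_day(10, 19)
print(tree.worst_before(13))  # 10
```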
Mo B.
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/226220/discussion-on-answer-by-mo-b-job-interview-question-using-trees-what-data-to-s). – Samuel Liew Dec 21 '20 at 01:49
  • @MrCalc why do you think so? – Mo B. Dec 25 '20 at 08:36
  • your solution won't work; consider the following AVL tree: https://i.ibb.co/f0kB7gf/Screen-Shot-2020-12-25-at-11-46-12-AM.png If you start from 18, you stop at 16 in case it has more sick people and never visit the other nodes. –  Dec 25 '20 at 09:45
  • @MrCalc it does work. Add sick numbers to the relevant nodes and I'll show you. But please stop posting comments here. Instead, continue the discussion here: https://chat.stackoverflow.com/rooms/226220/discussion-on-answer-by-mo-b-job-interview-question-using-trees-what-data-to-s – Mo B. Dec 25 '20 at 09:52

First of all, your chosen data structure looks fine to me. You did not mention it explicitly, but I assume that the "key" you use in the AVL tree is the day number, i.e. an in-order traversal of the tree would list the nodes in their chronological order.

I would suggest just one cosmetic change: store the maximum sick value in the node itself. Instead of keeping two similar pieces of information (maxLeftSick and maxRightSick) in one node, move them to the child nodes, so that your node.maxLeftSick is stored as node.left.maxSick, and node.maxRightSick as node.right.maxSick. Of course there is nothing to store when that child does not exist, but then we don't need that information either. In your structure maxLeftSick would be 0 when left is not defined; in my proposed structure that 0 follows naturally from the fact that there is no left child.

In my proposal, the root node would also have a maxSick value that has no counterpart in yours: the maximum of root.sick, root.maxLeftSick and root.maxRightSick. This value is not really used, but it keeps the structure consistent throughout the tree.

So you would store a single maxSick, which also takes the current node's own sick value into account. The processing you do during rotations needs to change accordingly, but does not become more complex.
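A minimal sketch of this single-field variant, including a left rotation. It shows that keeping max_sick correct during a rotation only needs two recomputations: first the node that moves down, then its new parent. Node, update and rotate_left are hypothetical names, and 0 stands for an empty subtree, as described above:

```python
class Node:
    def __init__(self, day, sick, left=None, right=None):
        self.day, self.sick = day, sick
        self.left, self.right = left, right
        update(self)  # children are already built, so this is correct


def max_sick(node):
    """Max sick value in a (possibly empty) subtree; 0 stands in for 'no child'."""
    return 0 if node is None else node.max_sick


def update(node):
    """Recompute node.max_sick, assuming its children are already correct."""
    node.max_sick = max(node.sick, max_sick(node.left), max_sick(node.right))


def rotate_left(x):
    """Left rotation around x; returns the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)  # x moved down, so recompute it before its new parent y
    update(y)
    return y
```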

I will assume that your AVL tree does not keep parent pointers. So create a find method that returns the path from the root to the node being searched for. For instance, in Python it could look like this:

def find(self, day):
    node = self.root
    path = []  # the nodes visited on the way down, root first
    while node:
        path.append(node)
        if node.day == day:  # bingo
            return path
        if day < node.day:
            node = node.left
        else:
            node = node.right
    return None  # day is not in the tree

Then the worstBefore method could look like this:

def worstBefore(self, day):
    path = self.find(day)
    if not path:
        return  # day not found
    # get number of sick people on that day:
    sick = path[-1].sick
    # look for recent day with greater number of sick
    while path:
        node = path.pop()  # walk upward, starting with found node
        if node.day > day:
            # node is later than day; the earlier days in its left subtree
            # were already covered by the nodes below it on the path
            continue
        if node.day < day and node.sick > sick:
            return node.day
        if node.left and node.left.maxSick > sick:
            # we will find the result in this subtree
            node = node.left
            while True:
                if node.right and node.right.maxSick > sick:
                    node = node.right
                elif node.sick > sick:  # bingo
                    return node.day
                else:
                    node = node.left

So the path returned by the find method will be used to get the parents of a node when you need to backtrack upwards in the tree along that path.

If along that path you find a node whose day is not later than the given day and whose left child has a greater maxSick, then you know that the targeted node must be in that left subtree. It is then a matter of walking down that subtree in a controlled way: choose the right child while its maxSick is still greater; otherwise check the current node's sick value and return that node if its value is greater; otherwise go left, and repeat.

As long as there is no such left subtree, go up along the path. If a parent is a match, return it (make sure to verify that its day number is smaller). Ancestors with a later day must be skipped altogether, because their left subtree may contain days after the given day. Keep checking for left subtrees that have a larger maxSick.

This runs in O(log n): you first walk zero or more steps upward and then zero or more steps downward in a single left subtree, and each walk is bounded by the height of the tree.
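A self-contained sketch of the whole path-based approach on a plain, unbalanced BST. This is an assumption made to keep the example short; the AVL rotations are omitted, as in the repl.it demo mentioned below. The query skips ancestors whose day is greater than d before looking at their left subtree, because such a subtree can contain later days. For example, with day 20 at the root, day 10 as its left child and day 15 as 10's right child, querying day 10 would otherwise return 15. Tree, Node and set_positive_in_day are hypothetical names:

```python
class Node:
    def __init__(self, day, sick):
        self.day, self.sick, self.max_sick = day, sick, sick
        self.left = self.right = None


class Tree:
    """Unbalanced BST keyed by day; each node's max_sick covers its whole subtree.
    An AVL version would add rotations; the query below would not change."""

    def __init__(self):
        self.root = None

    def find(self, day):
        """Return the root-to-node path for day, or None if day is absent."""
        node, path = self.root, []
        while node:
            path.append(node)
            if node.day == day:
                return path
            node = node.left if day < node.day else node.right
        return None

    def set_positive_in_day(self, day, sick):
        node, path = self.root, []
        while node and node.day != day:
            path.append(node)
            node = node.left if day < node.day else node.right
        if node:                      # day already present: overwrite
            node.sick = sick
        else:
            node = Node(day, sick)
            if not path:
                self.root = node
            elif day < path[-1].day:
                path[-1].left = node
            else:
                path[-1].right = node
        # Recompute max_sick bottom-up: the node itself, then its ancestors.
        for n in [node] + path[::-1]:
            n.max_sick = max(n.sick,
                             n.left.max_sick if n.left else 0,
                             n.right.max_sick if n.right else 0)

    def worst_before(self, day):
        path = self.find(day)
        if not path:
            return None
        sick = path[-1].sick
        while path:
            node = path.pop()         # walk upward from the found node
            if node.day > day:
                continue              # earlier days below it were already covered
            if node.day < day and node.sick > sick:
                return node.day
            if node.left and node.left.max_sick > sick:
                node = node.left      # a match is guaranteed in this subtree
                while True:
                    if node.right and node.right.max_sick > sick:
                        node = node.right
                    elif node.sick > sick:
                        return node.day
                    else:
                        node = node.left
        return None
```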

You can see your example scenario run on repl.it. There I focussed on this question, and didn't implement the rotations.

trincot
  • Hi, maxSick alone isn't enough; you should use both max left and max right as I suggested (they aren't the same, contrary to what you claimed) –  Dec 25 '20 at 09:19
  • Did you find a problem with the implementation? Can you provide an example for which the output is wrong? NB/ I am not claiming they are the same, just that you could store that information one level down in the tree, compared to how you did it. – trincot Dec 25 '20 at 09:55
  • I added a bit more explanation of the idea to use `maxSick` instead of `maxLeftSick` and `maxRightSick`. Let me know if this clarifies it, or whether you still see a problem. – trincot Dec 25 '20 at 12:46