Is there any way to efficiently reconstruct a collection based on a sequence of inserts/removals?

Question

Note: the below code happens to be C#, but really an answer in any language would be helpful for me.

Suppose rather than an actual collection (e.g., a List<T>), I have a sequence of operations, each looking something like this:

struct ListOperation<T>
{
    public enum OperationType { Insert, Remove }

    public OperationType Type;
    public T Element; // irrelevant for OperationType.Remove
    public int Index;
}

Is there some way to efficiently "reconstruct" a collection based on a sequence of such operations?

In particular, I'm looking to avoid the obvious (inefficient) implementation of basically just creating a List<T> and calling Insert and RemoveAt—both O(N) operations—for every element.
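For reference, the naive replay described above can be sketched as follows. This is only a baseline to pin down what the operations mean, written in Python since any language is acceptable here. Representing each operation as a `(kind, index, element)` tuple is an assumption made for the sketch; it stands in for `ListOperation<T>`.

```python
def replay_naive(ops):
    """Replay (kind, index, element) operations on a plain list.

    Each insert or removal may shift up to len(result) items,
    so the whole replay is O(n^2) in the worst case.
    """
    result = []
    for kind, index, element in ops:
        if kind == "insert":
            result.insert(index, element)   # shifts the tail right
        elif kind == "remove":
            del result[index]               # shifts the tail left; element is ignored
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")
    return result


ops = [
    ("insert", 0, "a"),
    ("insert", 0, "b"),
    ("insert", 1, "c"),
    ("remove", 0, None),
]
final = replay_naive(ops)
```

Any faster approach should produce the same list as this baseline for the same sequence of operations.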

Update: Let's say the "sequence" of operations is in fact a concrete collection whose count is known and which is randomly accessible by index (so, like a ListOperation<T>[], for example). Let's also say the actual count of the resulting collection is already known (but really, that would be trivial to figure out in O(N) anyway, by counting insertions and removals). Any other ideas?

Is it possible to read from end to front or only from front to end? — Rune FS, Feb 08 '11 at 07:29
And btw, inserting or removing might be O(1); it's the worst case that's O(n), but I guess you knew that — Rune FS, Feb 08 '11 at 07:33
This is an *awesome* question. I'm out of +votes for today, but I'll be sure to upvote it as soon as tomorrow rolls around. — templatetypedef, Feb 08 '11 at 07:48
Interesting. The main difficulty stems from the fact that the `Index` is relative to the current state of the list at the point the operation (Insert/Remove) was created. — Matthieu M., Feb 08 '11 at 09:43
@Rune FS: The operations themselves will be readable in either direction. And yeah, I am aware that inserting/removing *could* be fast; it's the O(n) worst case you mentioned that I'm trying to avoid. @templatetypedef: Thanks ;) I came upon this challenge while working on a project yesterday and personally found it really interesting—figured it'd be a good idea to bring it to SO! And @Matthieu M.: Yes, that does seem to be the big complication. I'm hoping there is a clever way to address it, however... anyway, it looks like I've already got a couple of ideas to play around with. — Dan Tao, Feb 08 '11 at 13:32
Related: http://stackoverflow.com/questions/3071497/list-or-container-o1-ish-insertion-deletion-performance-with-array-semantics/ — , Feb 08 '11 at 16:32
@Dan Tao, sorry, my mistake, I think you just want add. And after a day I see no comment (on any of the answers here), so I think you are not following the question. — Saeed Amiri, Feb 09 '11 at 13:38

score 6 · Answer 1 · answered Feb 08 '11 at 07:44

I think you can do this in O(n lg n) by using an indexed balanced binary tree (a binary tree where each node stores the number of nodes to its left and right). With this structure, you can get worst-case O(lg n) insertion or deletion at any point by walking the tree to find the position at which the new element belongs, then doing whatever fixup is necessary to maintain the balance condition (for example, if it's a red-black tree, you'd do a red-black tree fixup).

Given this setup, you could replay all the operations into a tree structure like this in O(n lg n) because each individual operation takes at most O(lg n) to complete. Once you have the tree, you can then do an inorder traversal of the elements to get them back in the proper order, and can append all the values to a resulting list in O(n) time, for a net of O(n lg n).
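A rough sketch of the replay-into-an-indexed-tree idea follows. To keep it short it uses a treap (random priorities) with subtree sizes rather than a red-black tree, so the O(lg n) bound per operation is expected rather than worst-case; that substitution, the `(kind, index, element)` tuple format, and all function names are assumptions of this sketch, not part of the answer above.

```python
import random


class Node:
    __slots__ = ("value", "priority", "size", "left", "right")

    def __init__(self, value):
        self.value = value
        self.priority = random.random()  # random heap priority keeps the tree balanced in expectation
        self.size = 1                    # number of nodes in this subtree
        self.left = None
        self.right = None


def size(node):
    return node.size if node else 0


def update(node):
    node.size = 1 + size(node.left) + size(node.right)


def split(node, k):
    """Split a tree into (first k elements, remaining elements) by position."""
    if node is None:
        return None, None
    if size(node.left) >= k:
        left, node.left = split(node.left, k)
        update(node)
        return left, node
    node.right, right = split(node.right, k - size(node.left) - 1)
    update(node)
    return node, right


def merge(a, b):
    """Concatenate two trees; every element of a precedes every element of b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        update(a)
        return a
    b.left = merge(a, b.left)
    update(b)
    return b


def replay(ops):
    """Replay (kind, index, element) operations, then read the result in order."""
    root = None
    for kind, index, element in ops:
        limit = size(root) if kind == "insert" else size(root) - 1
        if not 0 <= index <= limit:
            raise IndexError(f"{kind} index {index} out of range")
        left, right = split(root, index)
        if kind == "insert":
            root = merge(merge(left, Node(element)), right)
        else:
            _, right = split(right, 1)  # discard the element at position `index`
            root = merge(left, right)
    # Iterative inorder traversal: O(n), no comparisons on the values.
    out, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.value)
        node = node.right
    return out
```

Note that nothing here compares element values; positions are found purely from the stored subtree sizes, which is why the structure can hold an unsorted sequence.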

I'm going to think about this more and see if I can come up with a way of doing this in linear time. In the meantime, this at least shows that it's possible to do this in subquadratic time.

templatetypedef

Don't aim for linear time over all items. If you could do this, you could insert the items in linear time and then remove them with removeat(1), removeat(2), ..., outputting (printing) each item as it is removed. That would output a sorted list in linear time, which is impossible for comparison-based methods, and your insert must use some sort of comparison to fill the balanced binary tree. – Saeed Amiri Feb 08 '11 at 08:01
@Saeed- I don't follow what you're saying. This doesn't break the O(n lg n) barrier for sorting, and it's not trying to sort anything. Can you clarify your concern? – templatetypedef Feb 08 '11 at 08:11
Could you explain how you would insert items in O(log(n)) and balance the tree in O(log(n)) without comparisons? If you use comparisons, you can output your items in sorted order, and if you have some O(x) way to insert, delete, and rebalance (such that the sum of O(x) over n items is O(n)), you could sort them in O(n) by removing the items. I'm not saying you sort them; I'm saying that constructing a balanced tree needs a comparison method, and any method you use could also be used for sorting. – Saeed Amiri Feb 08 '11 at 08:17
@Saeed- Ah, I see what you're saying. You don't actually need a comparison to balance the tree; if you think about it, any tree can be a red/black tree even if the nodes themselves aren't in any particular sorted order. The idea is to take the *shape* of a red/black tree and use it to store a sequence of (not sorted) values in order by storing them in such a way that an inorder traversal visits them in the desired order. CLRS has a discussion of this. I'm not planning on getting this system to work in O(n) time, by the way... I'd expect that would require something new. – templatetypedef Feb 08 '11 at 08:37
@templatetypedef: I'd prefer a modified SkipList at this point, this alleviates the reorganization issue by going the randomized road. – Matthieu M. Feb 08 '11 at 09:49
There is a name for this structure: Order Statistics Tree. – Feb 08 '11 at 16:36

score 1 · Answer 2 · answered Feb 21 '11 at 12:04

I have a hunch that there might be an O(n) algorithm.

Step 1:

Radix-sort the operations digit by digit on the index. This takes O(n) time for fixed-width indices, and it is a stable sort if done from the LSB side.

Step 2:

Let's say there are operations with index i, and no operation with a smaller index remains to be done. We can then replay the operations at index i in the correct order. Exactly what the 'insert' and 'remove' operations do is not clear to me. The worst case is O(n lg n) with the binary-tree idea, but maybe the replaying can be done in O(n) because it is local.

Step 3:

Lift step 2 to an inductive argument as a proof of correctness. After the steps at index i there is an invariant to be maintained and a shorter list of operations, so by induction, ... (details) ...
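Only Step 1 is concrete enough to sketch. The following is a minimal stable LSD radix sort of the operations by index, assuming non-negative indices and the hypothetical `(kind, index, element)` tuple format; the function name and the `base` parameter are likewise made up for illustration. It does not show that Steps 2 and 3 lead to an O(n) algorithm, which the answer leaves open.

```python
def radix_sort_by_index(ops, base=256):
    """Stable LSD radix sort of (kind, index, element) tuples by index.

    Each pass distributes the operations into `base` buckets by one digit,
    least significant digit first. Appending to buckets in input order keeps
    every pass stable, so operations with equal indices stay in their
    original relative order.
    """
    result = list(ops)
    if not result:
        return result
    max_index = max(op[1] for op in result)
    shift = 1
    while max_index // shift > 0:          # one pass per digit of the largest index
        buckets = [[] for _ in range(base)]
        for op in result:
            buckets[(op[1] // shift) % base].append(op)
        result = [op for bucket in buckets for op in bucket]
        shift *= base
    return result


ops = [
    ("insert", 2, "x"),
    ("insert", 0, "a"),
    ("remove", 2, None),
    ("insert", 0, "b"),
    ("insert", 300, "z"),
    ("insert", 1, "c"),
]
by_index = radix_sort_by_index(ops)
```

The number of passes is the number of base-`base` digits in the largest index, so the running time is linear in the number of operations only when that digit count is bounded by a constant.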
