
I have a problem understanding the node structure used to recover the complete longest increasing subsequence (LIS) in the paper "Heaviest Increasing/Common Subsequence Problems" by Jacobson and Vo.

Here is the pseudo code from the paper:

[pseudocode of the LIS algorithm from the paper]

What is meant by

node is an auxiliary array that, for each element in L, contains a record of an element that precedes this element in an increasing subsequence. The function newnode() constructs such records and links them into a directed graph. At the end of the algorithm, we can search from the maximal element of L to recover an LIS of sigma.

? How would you implement this structure?
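
From that description, my best guess is a record roughly like the following (my own sketch in Python, not code from the paper):

# My guess at what newnode() constructs: a record that stores one element of
# the sequence and a link to the record of an element that precedes it in an
# increasing subsequence (None playing the role of the nil predecessor).
class Node:
    def __init__(self, element, previous=None):
        self.element = element
        self.previous = previous


def newnode(element, previous=None):
    return Node(element, previous)


def recover_lis(last):
    # Follow the predecessor links, starting from the record of the maximal
    # element of L, to read off an LIS in reverse.
    result = []
    while last is not None:
        result.append(last.element)
        last = last.previous
    result.reverse()
    return result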

Do I have to construct a directed graph with all the elements of the sequence as vertices (plus a nil vertex) and edges "\sigma_i -> s", and then search for the longest path starting at the maximal element of L (and ending at nil)? Isn't there a more efficient way to get the complete LIS?


My second question: is this algorithm as fast as the algorithm described on Wikipedia? If not, can I modify the Wikipedia algorithm to calculate heaviest common subsequences as described in the paper?

zw324

1 Answer


I'd implement the list L with an array and the graph as singly-linked lists with structure sharing (each node points to its predecessor). To wit, in generic C++:

#include <algorithm>
#include <iostream>
#include <memory>
#include <utility>
#include <vector>

template <typename T>
struct Node {
  explicit Node(const T& e) : element(e) {}

  T element;                 // the sequence element stored in this record
  std::size_t index = 0;     // length (minus one) of the best increasing subsequence ending here
  Node* previous = nullptr;  // predecessor in that subsequence, or null
};

template <typename T, typename Compare>
std::vector<T> LongestIncreasingSubsequence(const std::vector<T>& elements,
                                            Compare compare) {
  if (elements.empty()) {
    return {};
  }
  // node_ownership keeps every node alive; tableau[k] points to the node that
  // currently ends the best (smallest-tail) increasing subsequence of length k + 1.
  std::vector<std::unique_ptr<Node<T>>> node_ownership;
  node_ownership.reserve(elements.size());
  std::vector<Node<T>*> tableau;
  for (const T& element : elements) {
    auto node = std::make_unique<Node<T>>(element);
    // Binary-search the tableau for the first entry whose element is not less
    // than the new one.
    auto it = std::lower_bound(tableau.begin(), tableau.end(), node.get(),
                               [&](const Node<T>* a, const Node<T>* b) {
                                 return compare(a->element, b->element);
                               });
    if (it != tableau.begin()) {
      // The entry to the left ends a shorter subsequence that the new element
      // extends; record it as this node's predecessor.
      auto previous = it[-1];
      node->index = previous->index + 1;
      node->previous = previous;
    }
    if (it != tableau.end()) {
      // Overwrite the entry in place: this simulates the paper's delete(L, t)
      // followed by insert(L, sigma_i).
      *it = node.get();
    } else {
      tableau.push_back(node.get());
    }
    node_ownership.push_back(std::move(node));
  }
  // The longest increasing subsequence ends at the node with the largest index.
  Node<T>* longest = *std::max_element(
      tableau.begin(), tableau.end(),
      [](Node<T>* a, Node<T>* b) { return a->index < b->index; });
  std::vector<T> result(longest->index + 1);
  // Walk the predecessor links to fill in the result from back to front.
  for (; longest != nullptr; longest = longest->previous) {
    result.at(longest->index) = longest->element;
  }
  return result;
}

int main() {
  for (int x : LongestIncreasingSubsequence(
           std::vector<int>{3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3},
           std::less<int>())) {
    std::cout << x << '\n';
  }
}

If you're fortunate enough to be working in a language with garbage collection, you can ignore the business with node_ownership and std::move.

Here's a Python version.

import bisect


def longest_increasing_subsequence(elements):
    elements = list(elements)
    if not elements:
        return []
    # Build the tableau
    tableau_elements = []
    tableau_indexes = []
    predecessors = []
    for i, element in enumerate(elements):
        j = bisect.bisect_left(tableau_elements, element)
        predecessors.append(tableau_indexes[j - 1] if j > 0 else None)
        if j < len(tableau_elements):
            tableau_elements[j] = element
            tableau_indexes[j] = i
        else:
            tableau_elements.append(element)
            tableau_indexes.append(i)
    # Find the subsequence lengths
    lengths = []
    for i, predecessor in enumerate(predecessors):
        lengths.append(1 if predecessor is None else lengths[predecessor] + 1)
    # Extract the best subsequence
    best = max(range(len(lengths)), key=lambda i: lengths[i])
    subsequence = []
    while best is not None:
        subsequence.append(elements[best])
        best = predecessors[best]
    subsequence.reverse()
    return subsequence


print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]))
David Eisenstat
  • Thanks! I have never used C++, so I have some questions: a) Why `*it = node.get()`? `it` isn't used after this statement. b) Where are the `delete(L,t)` and `insert(L,sigma_i)` in your code? c) I don't see the _auxiliary_ array. I guess your algorithm differs a bit from the one in the paper. Am I right? – Popov Florino Sep 12 '19 at 07:27
  • a) and b) have the same answer: `*it = node.get()` overwrites the existing entry with the new node (simulating delete followed by insert without moving all of the array elements). c) The auxiliary array is contained in the `Node` structures. You could use integer indexes instead of pointers, but pointers were more convenient. – David Eisenstat Sep 12 '19 at 12:04
  • @PopovFlorino I added a Python version that hews a little more closely to the described algorithm. – David Eisenstat Sep 12 '19 at 12:41
  • Ah, now I understand your C++ code: it's pointer dereferencing, and `tableau` is an array of pointers. – Popov Florino Sep 12 '19 at 13:05
  • And thanks for the Python code! But I still don't get why in the paper the auxiliary `node` array is indexed with the element values sigma_i. Is it just inaccuracy? – Popov Florino Sep 12 '19 at 13:09
  • @PopovFlorino It works if the elements are distinct, I guess. A lot of pseudocode in papers isn't all that good, frankly. – David Eisenstat Sep 12 '19 at 13:21
  • And what do you think about my second question? I guess this algorithm has the same complexity as the one on Wikipedia, since `bisect` and `std::lower_bound` are O(log n)... – Popov Florino Sep 12 '19 at 13:50
  • @PopovFlorino Yes, it's O(n log n). If your keys are integers, you could drop that a bit by using a trie or something in the van Emde Boas tree family. – David Eisenstat Sep 12 '19 at 14:05
  • OK, one last question :) I want to implement the HCS algorithm (Fig. 7) on page 15 (65) of the paper. (It is based on the LIS algorithm.) Which way of implementing the auxiliary array would you recommend: including it in the node structure or using additional arrays? (Note that in the HCS algorithm more than one deletion in the `tableau` array is possible.) – Popov Florino Sep 12 '19 at 14:26
  • I meant “including it in the tableau-structure”. – Popov Florino Sep 12 '19 at 15:16
  • @PopovFlorino With multiple deletions possible, you probably want to switch to a binary search tree. – David Eisenstat Sep 12 '19 at 18:15
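
For what that last suggestion could look like in practice, here is a rough sketch of a tableau backed by an ordered structure supporting predecessor queries, insertion, and repeated deletion. It uses the third-party sortedcontainers library as a stand-in for a balanced binary search tree and assumes distinct keys; the HCS-specific bookkeeping from Fig. 7 is not shown.

# Sketch only: an ordered tableau with the operations the HCS algorithm needs
# (predecessor lookup, insertion, and possibly several deletions per step).
# sortedcontainers.SortedList stands in for a balanced binary search tree;
# keys are assumed to be distinct.
from sortedcontainers import SortedList


class OrderedTableau:
    def __init__(self):
        self._keys = SortedList()  # keys in sorted order, O(log n) search
        self._entries = {}         # key -> whatever record (node, weight, ...) is stored

    def predecessor(self, key):
        # Entry with the largest key strictly smaller than key, or None.
        i = self._keys.bisect_left(key)
        return self._entries[self._keys[i - 1]] if i > 0 else None

    def insert(self, key, entry):
        self._keys.add(key)
        self._entries[key] = entry

    def delete(self, key):
        self._keys.remove(key)
        del self._entries[key]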