Performance issue finding LCS with Haskell

Question

This is a classic programming problem https://en.wikipedia.org/wiki/Longest_common_subsequence_problem

The JS implementation passes all the tests but the Haskell one consumes too much memory and gets killed.

What am I doing wrong?

-- TOP TO BOTTOM
commonChild s1 s2 = L.foldl g l1 l ! n
    where
    n  = length s1
    l1 = arr $ replicate (n + 1) 0
    l  = [ [(x,i,y,j) | (y,j) <- zip s2 [1..]] 
           | (x,i) <- zip s1 [1..]]
    g a = L.foldl (\a' (x,i,y,j) -> let x' = if x == y
                                             then 1 + a ! (j - 1) 
                                             else max (a ! j) (a' ! (j - 1)) 
                                    in a' // [(j,x')]) 
                  l1

arr l = array (0,length l-1) $ zip [0..] l

function lcs(a,b) {
  let n = a.length
  let a1 = []
  for (let i = 0; i <= n; i++) {
    a1.push(0)
  }
  for (let i = 0; i < b.length; i++) {
    let a2 = [0]
    for (let j = 0; j < n; j++) {
      let x = b[i] == a[j] ? 1 + a1[j] : Math.max(a1[j+1],a2[j])
      a2.push(x)
    }
    a1 = a2
  }
  return a1[n]
}

console.log(lcs("SHINCHAN","NOHARAAA"))

https://repl.it/@leonbobster/LCS#main.hs

https://www.hackerrank.com/challenges/common-child/problem

"consumes too much memory" => probably too lazy. My first guess would be "replace `foldl` with `foldl'`"... — MathematicalOrchid, Mar 18 '21 at 10:06
You should provide a full example which one can cut&paste, compile, and observe the slow performance (say at least a few seconds of computation). I spent a few minutes making this compile, and I added your test `lcs("SHINCHAN","NOHARAAA")`, but that ran in a few milliseconds so I couldn't see what's the problem, so I stopped. — chi, Mar 18 '21 at 10:41
@chi I appreciate you time ) here you can find the sample https://gitlab.com/-/snippets/2092238/raw/master/test.txt — Leonid Bobylev, Mar 18 '21 at 11:29
also I create a repl https://repl.it/@leonbobster/LCS#main.hs — Leonid Bobylev, Mar 18 '21 at 11:45

DDub · Answer 1 · 2021-03-18T19:29:10.790

5

Your use of // from Data.Array is really killing your performance. If you read the docs, it says it "Constructs an array identical to the first argument except that it has been updated by the associations in the right argument", which means that every time you call this, you're constructing a brand new array. This is very different from your js implementation, which simply appends.

You may think that arrays are the obvious choice for getting a performance boost, but this is one of those times where regular old lists will do just fine. Rather than generate a new array on every iteration of your fold, each with one new element over the previous, you can just cons onto a list. Consider the following definition of your sub-function g:

g a = arr . reverse . L.foldl (inner a) [0]
inner a a'@(z:_) (x,i,y,j) =
  let x' = if x == y
            then 1 + a ! (j - 1)
            else max (a ! j) z
  in x':a'

Note: The changes I made above were all about choosing a better data structure, but see @chi's answer for more ways to improve performance having to do with negotiating laziness/strictness and doing GHC-specific things.

edited Mar 18 '21 at 19:29

answered Mar 18 '21 at 15:31

DDub

3,884
1
5
12

Brilliant! It is much faster, but still slower than js and fails some performance tests. But it doesn't get killed and returns the correct result. Thank you. – Leonid Bobylev Mar 18 '21 at 16:09
https://www.hackerrank.com/challenges/common-child/problem – Leonid Bobylev Mar 18 '21 at 16:10
The most obvious initial improvement would be to quit using `reverse`, one way or another. That's often a perf killer. Next I'd probably unbox those arrays. But I'm also rather curious whether the algorithm could be improved. – dfeuer Mar 18 '21 at 21:46
@dfeuer yep I tried to implement the the function with "from bottom to top" approach to avoid reversing, no luck so far https://gitlab.com/-/snippets/2092238#LC20 – Leonid Bobylev Mar 19 '21 at 12:20
I added a link on wikipedia article about the LCS – Leonid Bobylev Mar 19 '21 at 13:25

chi · Answer 2 · 2021-03-18T18:57:36.433

I slightly modified your code by

adding type signatures
using foldl'
using bang patterns to force strictness
compiling with -O2 (avoid GHCi)

Here's the modified code (with the long test strings removed):

{-# LANGUAGE BangPatterns #-}
{-# OPTIONS -Wall -O2 #-}
module Main where


import qualified Data.List as L 
import Data.Array

commonChild :: Eq a => [a] -> [a] -> Int
commonChild s1 s2 = L.foldl' g l1 l ! n
    where
        n = length s1
        l1 = arr $ replicate (n + 1) 0
        l = [[(x,i,y,j) | (y,j) <- zip s2 [1..]] | (x,i) <- zip s1 [(1::Int)..]]
        g a = L.foldl' (\ !a' (!x,!_i,!y,!j) -> let 
           ! x' = if x == y
               then 1 + a ! (j - 1) 
               else max (a ! j) (a' ! (j - 1)) in a' // [(j,x')]) l1 
 
arr :: [e] -> Array Int e
arr l = array (0,length l-1) $ zip [0..] l

s1test :: String
s1test = "UBBJXJGKLXGXTFBJ..." -- omitted

s2test :: String
s2test = "WZFPTGLCXK..." -- omitted

main :: IO ()
main = do
    print $ commonChild "SHINCHAN" "NOHARAAA"
    print $ commonChild s1test s2test

The above code ran using under 6MB of RAM, and completed in 3m10s printing 3 and 1417 as output.

By comparison, the original code was using 12GB+ of RAM when I terminated it.

There should be more room for improvement. The arrays in Data.Array can be slow, since each array update has to re-create a new array. When the imperative algorithm can not easily be translated into a nice functional one, perhaps it's better to embrace the imperative side for a moment and start using STUArray and its associated functions, writing some code which precisely mimics the imperative one you posted. Using runST, you can still achieve a pure functional interface and expose the a similar type

commonChild ::
   ( Eq a
   , forall s. MArray (STUArray s) a (ST s)    -- requires some extension
   ) => [a] -> [a] -> Int

(or simply give up polymorphism and use String -> String -> Int).

You can at least *delay* giving up polymorphism by writing a polymorphic worker and letting a monomorphic wrapper use `runST` with it. — dfeuer, Mar 18 '21 at 22:03
@chi Thanks a lot! Your improvements helped and now the function passes all the tests! — Leonid Bobylev, Mar 19 '21 at 14:47
It turns out to be quite easy to skip arrays altogether and just use lists. — dfeuer, Mar 19 '21 at 23:34

dfeuer · Answer 3 · 2021-03-19T18:13:58.510

Reading the Wikipedia description of the algorithm led me quite directly to an implementation using only lists; no arrays:

{-# LANGUAGE BangPatterns #-}

-- Calculate the next row from the character along the
-- left edge, the string along the top edge, and the previous
-- row.
makeRow :: Char -> String -> [Int] -> [Int]
makeRow match = go 0 0
  where
    -- The first arguments are the values in the arguments
    -- to the upper left and immediate left of the current
    -- cell.
    go :: Int -> Int -> Char -> String -> [Int] -> [Int]
    go !up_left !left (c:cs) (l:ls) =
      cur : go l cur cs ls
      where
        !cur
          | c == match = 1 + up_left
          | otherwise = max l left
    go _ _ _ _ = []

commonChild s1 = go (repeat (0 :: Int))
  where
    go ls [] = last ls
    go ls (c:cs) = go (makeRow c s1 ls) cs

This is fast enough to pass all the tests, and it's a lot simpler than mucking with arrays. Constant factors could be improved in various ways, but this is a good place to start. The first way I'd try to improve this is to replace [Int] everywhere with a type that looks like

data IntList = Cons !Int IntList | Nil

This saves two words of memory and one pointer indirection per element. Switching to unboxed arrays (at least for the Int lists) should give further improvements in many cases, but it'll be considerably more annoying.

Performance issue finding LCS with Haskell

3 Answers3