2

If I wanted to right-pad a Haskell list of integers, I could do something like the following:

rpad m xs = take m $ xs ++ repeat 0

There would be no need to get the length of the list. I think it would be quite performant.

Is there a similar way I could define lpad, padding the list on the left, without incurring the cost of counting the length?

  • You could reverse the list, right pad it, then reverse again. – bheklilr Mar 19 '15 at 19:16
  • *There would be no need to get the length of the list.* But that is what `take` does (under the hood) in your example. One way or another, you need to measure the length of the list to figure out how much padding is needed. – jub0bs Mar 19 '15 at 19:20
  • I figure that `take` would allow you to traverse the list as you consume it. However, if you use the function `length`, you'd be doing it at least twice. –  Mar 19 '15 at 19:22
  • @Ana Why twice? `replicate (m - length xs) 0 ++ xs` would do the job in roughly `m` operations. Same as in your example. – jub0bs Mar 19 '15 at 19:24
  • Because you would traverse the list twice. Once for `length`, and once when you consume the list. The `rpad` function would traverse the list once, when you consume it. Isn't that right? –  Mar 19 '15 at 19:35
  • `xs ++ ys` only traverses `xs` (the *first* list), not `ys` (the second list). Do we agree on that? – jub0bs Mar 19 '15 at 19:36
  • @Jubobs nooooo. That will **not** do the job in roughly `m` operations if `xs = [0..]`. It won't actually do the job *at all*. – CR Drost Mar 19 '15 at 19:46
  • @ChrisDrost I was assuming `xs` to be finite, as did bheklilr in his comment; I think the question implies it: *There would be no need to get the length of the list*. – jub0bs Mar 19 '15 at 19:56
  • [Joke] @Ana saw the leftpad apocalypse coming three days before it happened! – sid-kap Apr 18 '16 at 04:34

4 Answers

6

So the first thing worth saying is, don't worry too much about the nitty-gritty of performance. This code is unlikely to be sitting in the proverbial 20% of your code which takes up 80% of the running time.

With that said, where does performance really matter here? It matters if `m` is small while `length xs` is huge or infinite. It would also be nice to get good performance when `m` is large (as you do with `rpad`, where consuming only the first k items of the result costs only k work for some k << m), but the problem itself means you may have to do m work before you can see the first element of the result. (If you're given an infinite list, you need to peek at m items just to know whether the first element should be 0.)

In that case, you really want to zero-pad `take m xs` instead of `xs`. That's the entire trick:

lpad m xs = replicate (m - length ys) 0 ++ ys
    where ys = take m xs
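
For anyone who wants to try this in GHCi, here is a minimal self-contained sketch of the same definition. The type signature and the three sample bindings are my own additions for illustration (the names `shortInput`, `longInput` and `infiniteInput` are hypothetical), and the element type is fixed to `Int` only to keep the example simple.

```haskell
-- Same definition as above, with an explicit (assumed) type signature.
lpad :: Int -> [Int] -> [Int]
lpad m xs = replicate (m - length ys) 0 ++ ys
  where
    -- Only the first m elements of xs are ever inspected,
    -- so this also terminates on an infinite input list.
    ys = take m xs

-- Hypothetical sample values for illustration.
shortInput, longInput, infiniteInput :: [Int]
shortInput    = lpad 5 [1, 2, 3]   -- shorter than m: zeros are added on the left
longInput     = lpad 2 [1, 2, 3]   -- longer than m: truncated to the first m elements
infiniteInput = lpad 3 [1 ..]      -- infinite: only three elements are forced
```

Evaluating these gives `[0,0,1,2,3]`, `[1,2]` and `[1,2,3]` respectively. Note that, like `rpad` in the question, this version keeps the first `m` elements when the input is too long.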
CR Drost
  • All depends on your applications what routines get called frequently, but probably you shouldn't be using lists if this gets called that much, well maybe that's what you meant. Anyways yes `take m xs` is clearly the best option for lists. – Jeff Burdges Mar 19 '15 at 19:50
5

Right padding only needs to examine the first constructor in the input list (: or []) in order to produce the first element of the output list. It's a streaming operation (could be done with foldr).
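
To make the `foldr` remark concrete, here is one possible sketch, not the only way to write it. The name `rpadFoldr` is made up for this example; the fold builds a function that takes the number of output slots still to fill, so each output element is produced after looking at just one input constructor.

```haskell
-- Right padding written as a fold (hypothetical name, Int elements assumed).
-- The fold's result is a function from "slots left" to the output list.
rpadFoldr :: Int -> [Int] -> [Int]
rpadFoldr m xs = foldr step pad xs m
  where
    -- Input exhausted: fill whatever slots remain with zeros.
    pad n = replicate n 0
    -- Emit x only if a slot remains; the rest of the fold runs on demand.
    step x rest n
      | n <= 0    = []
      | otherwise = x : rest (n - 1)
```

For example, `rpadFoldr 5 [1,2,3]` gives `[1,2,3,0,0]`, and `take 1 (rpadFoldr 1000000 [7..])` gives `[7]` without walking the rest of the input.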

Left padding needs to examine the whole input list (or at least its first m elements) in order to produce the first element of the output list. That is, whether the first element is 0 or not depends on the tail of the list (assuming it does not start with 0). This cannot be done in a streaming way. O(min(m, length xs)) is the best you can get, and that is the cost of the first element alone.

Also, be careful since your padding function discards elements after the m-th, if your input list is longer than that. This might be unwanted -- sometimes padding is defined so that it can only add elements, and never remove.
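
If you want the add-only behaviour, one possible sketch is below. The name `lpadKeep` is hypothetical and the element type is fixed to `Int` for simplicity. It measures at most `m` elements to decide how much padding is needed, but returns the whole input after the padding.

```haskell
-- Left padding that never discards input elements (hypothetical name).
lpadKeep :: Int -> [Int] -> [Int]
lpadKeep m xs = replicate (m - measured) 0 ++ xs
  where
    -- Count at most m elements, so an infinite xs is still fine.
    measured = length (take m xs)
```

With this, `lpadKeep 5 [1,2,3]` gives `[0,0,1,2,3]`, while `lpadKeep 2 [1,2,3]` returns `[1,2,3]` unchanged.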

chi
0

Here is an (untested but compiling) set of padding/trimming functions (Unlicenced):

padL :: a -> Int -> [a] -> [a]
padL p s l
    | length l >= s = l
    | otherwise     = replicate (s - length l) p ++ l
{-# INLINABLE padL #-}

padR :: a -> Int -> [a] -> [a]
padR p s l = take s $ l ++ repeat p
{-# INLINABLE padR #-}

trimL :: Int -> [a] -> [a]
trimL s l
    | length l <= s = l
    | otherwise     = drop (length l - s) l
{-# INLINABLE trimL #-}

trimR :: Int -> [a] -> [a]
trimR = take
{-# INLINE trimR #-}

resizeL :: a -> Int -> [a] -> [a]
resizeL p s l
    | length l == s = l
    | length l < s  = padL p s l
    | otherwise     = trimL s l
{-# INLINABLE resizeL #-}

resizeR :: a -> Int -> [a] -> [a]
resizeR p s l
    | length l == s = l
    | length l < s  = padR p s l
    | otherwise     = trimR s l
{-# INLINABLE resizeR #-}
Solomon Ucko
0

@Solomon Ucko

Why call length twice in padL (ditto for others)? How about

padL :: a -> Int -> [a] -> [a]
padL p s l
    | length' >= s = l
    | otherwise    = replicate (s - length') p ++ l
      where length' = length l
Lo HaBuyshan
  • Especially in a functional language such as Haskell, it seems likely that the compiler will just inline `length'`, making the two versions equivalent. If I test it out, it looks like, at `-O0`, your version is slightly shorter, which *often* implies faster, but at `-O1` or higher (https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/using-optimisation.html), they appear to be equivalent: https://godbolt.org/z/aWb4oq – Solomon Ucko Jan 29 '21 at 17:03