I have a one-dimensional Repa array of 0s and 1s and I want to compute its run-length encoding, e.g. turn [0,0,1,1,1,0,0,0,1,0,1,1] into [2,3,3,1,1,2] or something similar. (I'm using list notation here for readability.)

Ideally, I would like only the run lengths of the 1s, ignoring the 0s, so that [0,0,1,1,1,0,0,0,1,0,1,1] becomes [3,1,2].

I would like the result to be a (Repa) array as well.

How can I do this using Repa? I can't use map or traverse since they only give me one element at a time. I could try to fold with some special kind of accumulator, but that doesn't seem ideal and I don't know if it's even possible (due to monad laws).
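To make the "fold with a special accumulator" idea concrete, here is a minimal sketch in plain Haskell (lists only, no Repa calls). As far as I know, Repa's parallel folds expect an associative operator with a neutral element, so the sketch summarises each segment in a way that can be combined associatively. The names `Runs`, `unit`, `finish` and `runLengths` are made up for illustration, and whether such an accumulator can be stored in an unboxed Repa array or pays off in practice is not something this sketch shows.

```haskell
-- Summary of one contiguous segment of the input (hypothetical type).
data Runs
  = Ones !Int              -- segment holds only True values (Ones 0 = empty segment)
  | Mixed !Int [Int] !Int  -- leading Trues, finished interior runs, trailing Trues
  deriving (Eq, Show)

-- Combining two adjacent segments: the trailing Trues of the left one and the
-- leading Trues of the right one merge into a single run.
instance Semigroup Runs where
  Ones a       <> Ones b          = Ones (a + b)
  Ones a       <> Mixed l ms t    = Mixed (a + l) ms t
  Mixed l ms t <> Ones b          = Mixed l ms (t + b)
  Mixed l ms t <> Mixed l' ms' t' = Mixed l (ms ++ [j | j > 0] ++ ms') t'
    where j = t + l'

instance Monoid Runs where
  mempty = Ones 0

-- Summary of a single element.
unit :: Bool -> Runs
unit True  = Ones 1
unit False = Mixed 0 [] 0

-- Turn a summary of the whole input into the list of True run lengths.
finish :: Runs -> [Int]
finish (Ones n)       = [n | n > 0]
finish (Mixed l ms t) = [l | l > 0] ++ ms ++ [t | t > 0]

-- Run lengths of the True values, in order of appearance.
runLengths :: [Bool] -> [Int]
runLengths = finish . foldMap unit
```

Because `<>` is associative, the segments can be combined in any grouping, which is the property a parallel reduction relies on. The interior lists are joined with `++`, so this is a sketch of the idea rather than a tuned implementation.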

Valerie94
  • Is your array one-dimensional? If not: do you want the encoding for each row, or for the one-dimensional representation of the n-dimensional array? – sdx23 Jan 22 '17 at 16:48
  • @sdx23 My array is one-dimensional. – Valerie94 Jan 22 '17 at 17:24
  • Repa is not made for this sort of thing. You would be better using just about anything else... Why do you need repa? – Alec Jan 22 '17 at 18:01
  • @Alec Because processing a large list of data is too slow. I've already implemented a large part of my program using Repa and it seems to be about 3 times faster than my list implementation. Why isn't Repa made for this sort of thing? Every tutorial you can find demonstrates Repa's power by an example in image processing. So if I have an image (as an array), can't I compress it (via run-length encoding) using Repa? – Valerie94 Jan 22 '17 at 18:53
  • @Valerie94 Repa is good at parallel maps/folds, while this is a necessarily sequential fold. That said, your setup makes sense. – Alec Jan 22 '17 at 18:56
  • Use [toFunction](https://hackage.haskell.org/package/repa-3.4.1.2/docs/Data-Array-Repa.html#t:D) (or e.g. `delay`) and work with the function - most repa functions won't help you here, as your operation isn't (at least as stated) parallelizable. – user2407038 Jan 23 '17 at 17:46
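
A minimal sketch of what "working with the function" could look like: the run-length pass is written against a length and an index lookup instead of against a concrete array. The name `runsFrom` and its argument order are assumptions made for this example, and the code itself uses no Repa calls.

```haskell
-- | Lengths of the runs of True among indices 0 .. n-1 of a lookup function.
-- Runs are produced lazily, in order of appearance.
runsFrom :: Int -> (Int -> Bool) -> [Int]
runsFrom n at = go 0 0
  where
    -- i: next index to inspect; c: length of the current run of True (0 if none)
    go :: Int -> Int -> [Int]
    go i c
      | i >= n    = emit []
      | at i      = go (i + 1) $! (c + 1)   -- keep the counter evaluated
      | otherwise = emit (go (i + 1) 0)
      where
        -- Output the current run, if there is one, in front of the rest.
        emit rest = if c > 0 then c : rest else rest
```

With Repa, the two arguments would presumably come from `toFunction`, along the lines of `let (Z :. n, f) = toFunction arr in runsFrom n (\i -> f (Z :. i))`; that wiring is untested here.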

1 Answer

I'm currently just iterating over the array and returning a list, without using any Repa functions. I'm working on Booleans instead of 1s and 0s, but the algorithm is the same. I convert this list to a Repa array afterwards.

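```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Array.Repa

type Length = Int

-- | Lengths of the runs of True, in order of appearance.
runLength :: Array U DIM1 Bool -> [Length]
runLength arr = reverse (go [] 0 0)  -- runs are collected newest-first, so reverse at the end
  where
    Z :. n = extent arr
    -- xs: finished runs (most recent first)
    -- c:  length of the current run of True (0 when not inside a run)
    -- i:  next index to inspect
    go :: [Length] -> Length -> Int -> [Length]
    go xs !c !i
      | i == n                   = flush
      | unsafeIndex arr (Z :. i) = go xs (c + 1) (i + 1)
      | otherwise                = go flush 0 (i + 1)
      where
        -- Record the current run, if there is one.
        flush = if c > 0 then c : xs else xs
```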
Valerie94
  • I wonder how that compares to the more general `map (length &&& head) . group . toList` – Cirdec Jan 23 '17 at 21:47
  • @Cirdec The setup is as follows: 1. I read all the samples from a file and convert them to Booleans. 2. I call runLength to calculate the run-length encoding. 3. I print the result to stdout. Using my function, this takes 0.013s on average. Using your function, this takes 0.249s on average. With your function, I still have to extract the run lengths for the True values from the result. – Valerie94 Jan 24 '17 at 13:54
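
For reference, a minimal sketch of the `group`-based one-liner from the comments together with the extra extraction step for the True runs. It works on a plain list (the kind of list Repa's `toList` would hand back); the names `rle` and `trueRuns` are made up for this example, and no timing claims are attached to it.

```haskell
import Control.Arrow ((&&&))
import Data.List (group)

-- | General run-length encoding: (run length, repeated value) pairs.
-- `head` is safe here because `group` never yields an empty sublist.
rle :: Eq a => [a] -> [(Int, a)]
rle = map (length &&& head) . group

-- | Keep only the lengths of the runs of True.
trueRuns :: [Bool] -> [Int]
trueRuns xs = [n | (n, True) <- rle xs]
```

The pattern `(n, True)` in the list comprehension silently skips the pairs that describe runs of False, which is the extraction step mentioned in the last comment.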