Writing fusible O(1) update for vector

Question

This is a continuation of this question. Since the vector library doesn't seem to have a fusible O(1) update function, I am wondering whether it is possible to write one that doesn't involve unsafeFreeze and unsafeThaw. I guess it would use the vector stream representation, but I am not familiar with how to write one using stream and unstream, hence this question. The motivation is a cache-friendly update function for the case where only a narrow region of the vector is modified, so that we don't have to walk through the entire vector just to process that narrow region. This operation can happen billions of times in each function call, so the overhead needs to be really low. Transformation functions like map process the entire vector, so they would be too slow.

I have a toy example of what I want to do, but the upd function below uses unsafeThaw and unsafeFreeze - it doesn't seem to be optimized away in the core, and also breaks the promise of not using the buffer further:

module Main where
import Data.Vector.Unboxed as U
import Data.Vector.Unboxed.Mutable as MU
import Control.Monad.ST

upd :: Vector Int -> Int -> Int -> Vector Int
upd v i x = runST $ do
          v' <- U.unsafeThaw v
          MU.write v' i x
          U.unsafeFreeze v'

sum :: Vector Int -> Int
sum = U.sum . (\x -> upd x 0 73) . (\x -> upd x 1 61)

main = print $ Main.sum $ U.fromList [1..3]

I know how to implement imperative algorithms using STVector. In case you are wondering why I want this alternative, I would like to use pure vectors to check how GHC's transformation of a particular algorithm differs when it is written using fusible pure vector streams (with monadic operations under the hood, of course).

When the algorithm is written using STVector, it doesn't seem to be as nicely iterative as I would like (I guess it is harder for the GHC optimizer to spot loops when there is a lot of mutability strewn around). So I am investigating this alternative approach to see if I can get a nicer loop in there.

`unsafeFreeze`/Thaw is obviously intended to be called to update vector anyway, otherwise vector's purity is broken. — leventov, Jun 08 '13 at 16:49

score 4 · Answer 1 · answered Jun 08 '13 at 21:36

The upd function you have written does not look correct, let alone fusible. Fusion is a library-level optimization and requires you to build your code out of certain primitives. In this case what you want is not just fusion but recycling, which can be easily achieved via the bulk update operations such as // and update. These operations will fuse, and even happen in place much of the time.
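As a minimal sketch of the bulk-update route, assuming the toy example from the question (the name `total` is hypothetical, not from the original code), the two single-element writes can be expressed as one call to `//`:

```haskell
module Main where

import qualified Data.Vector.Unboxed as U

-- Hypothetical rewrite of the question's toy example: both writes are
-- passed to the bulk update operator (//) as (index, value) pairs,
-- so no unsafeThaw/unsafeFreeze is needed.
total :: U.Vector Int -> Int
total v = U.sum (v U.// [(1, 61), (0, 73)])

main :: IO ()
main = print (total (U.fromList [1 .. 3]))  -- prints 137
```

Whether the update really happens in place depends on the optimizer and on the input being a fresh temporary, so it is worth checking the generated core rather than assuming it.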

If you really want to write your own code based on destructive updates, DO NOT use unsafeThaw; use modify instead.
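A sketch of what that could look like for the question's example, assuming the argument order is changed so the vector comes last (the names `upd` and `total` here are illustrative). `modify` takes an ST action over a mutable vector and is documented to run it in place when that is safe and on a copy otherwise:

```haskell
module Main where

import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Write x at index i. Unlike the unsafeThaw version, the caller's
-- vector is never mutated behind its back: modify falls back to a
-- copy when it cannot reuse the buffer safely.
upd :: Int -> Int -> U.Vector Int -> U.Vector Int
upd i x = U.modify (\mv -> MU.write mv i x)

-- Same pipeline as in the question: set index 1, then index 0, then sum.
total :: U.Vector Int -> Int
total = U.sum . upd 0 73 . upd 1 61

main :: IO ()
main = print (total (U.fromList [1 .. 3]))  -- prints 137
```

Putting the vector argument last lets the updates compose with `.`; it is a stylistic choice rather than something `modify` requires.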

Philip JF

http://stackoverflow.com/questions/16995850/no-stream-fusion-with-unsafeupdate-in-unboxed-vector - about `update` – leventov Jun 08 '13 at 22:09
+1 for link to recycling paper...I didn't know what the term was for this kind of update. Now, I know. – Sal Jun 08 '13 at 22:48

applicative · Answer 2 · 2013-06-08T17:53:40.960

Any function is a fusible update function; you seem to be trying to escape from the programming model the vector package is trying to get you to use:

module Main where
import Data.Vector.Unboxed as U

change :: Int -> Int -> Int
change 0 _ = 73
change 1 _ = 61
change _ n = n

myfun2 = U.sum . U.imap change . U.enumFromStepN 1 1
main = print $ myfun2 30000000

-- this doesn't create any vectors much less 'update' them, as you will see if you study the core.

edited Jun 08 '13 at 17:53

answered Jun 08 '13 at 16:41

applicative

`imap` processes every vector element which makes it O(n) for an update that affects say only one vector element. Or is it really not the case? Like `ghc` spotting only 1..k (out of 1..n) elements are affected, and so, convert it to a loop that processes only those k elements? – Sal Jun 08 '13 at 17:05
There is something wrong with the way you are thinking of it, no? Indeed `imap` will in the worst case study an actual vector, and go through it element by element and write a new vector. If it fuses to the left or right -- above it does both directions -- then this will not be a correct description. – applicative Jun 08 '13 at 17:48
Thus, fused with U.sum, there is just a fold over the original vector serially adding its (`change`d) elements; it looks at each element, but only once, not once for `change`, and once for `U.sum`. – applicative Jun 08 '13 at 17:58
If it fuses with something like `U.enumFromStepN` at most one vector will be written, the plain `U.enumFromStepN 1 1 10000` will never exist, then altered or copied by looking at each element and `change`ing it. So the question doesn't arise, 'how many of its elements we actually process'. The only elements that ever exist are the `change` -d ones, so the apparent cost of 'going through them one by one' is an illusion. Finally, as in the present case, there is fusion to the left and right, and examination of core shows that no vector is written at all. – applicative Jun 08 '13 at 18:02
I get what you are talking about. Yes, I am familiar with that way of processing vectors. But, the diff/longest common subsequence algorithm I am writing doesn't seem to translate to that model. Given two sequences of say length 4k, in one particular case, it will iterate through vector update function 16M times, one location in each iteration, depending on the output of previous iteration (i.e., vector from previous iteration). Perhaps it can be done the way you suggested without getting slow. I will do more experiments. – Sal Jun 08 '13 at 18:40
