
I'm implementing something similar to a Spreadsheet engine in Haskell.

There are ETables, which have rows of cells containing expressions in the form of ASTs (e.g. BinOp + 2 2), which can contain references to other cells of ETables.

The main function should convert these ETables into VTables, which contain a fully resolved value in the cell (e.g. the cell BinOp + 2 2 should be resolved to IntValue 4). This is pretty easy when cells have no external references, because you can just build the value bottom up from the expression AST of the cell (e.g. eval (BinOpExpr op l r) = IntValue $ (eval l) op (eval r), minus unboxing and typechecking) all the way to the table (evalTable = (map . map) eval rows).
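A minimal sketch of that bottom-up evaluation, with illustrative names (the actual Expr/Value constructors in my code differ slightly):

```haskell
-- Illustrative sketch: expressions with no external references can be
-- evaluated purely bottom-up.
data Expr
  = IntLit Int
  | BinOpExpr (Int -> Int -> Int) Expr Expr

data Value = IntValue Int
  deriving (Eq, Show)

eval :: Expr -> Value
eval (IntLit n) = IntValue n
eval (BinOpExpr op l r) =
  let IntValue a = eval l
      IntValue b = eval r
  in IntValue (a `op` b)

-- Evaluating a whole table is just mapping over the rows of cells.
evalTable :: [[Expr]] -> [[Value]]
evalTable = (map . map) eval
```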

However, I can't think of a "natural" way of handling this when external references are thrown into the mix. Am I correct to assume that I can't just call eval on the referenced cell and use its value, because Haskell is not smart enough to cache the result and re-use it when that cell is independently evaluated?

The best thing I came up with is using a State [VTable] which is progressively filled, so the caching is explicit (each eval call updates the state with the return value before returning). This should work, however it feels "procedural". Is there a more idiomatic approach available that I'm missing?
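To make the question concrete, here is roughly what I mean, sketched with a Map-based cache instead of [VTable] and with made-up CellRef/Expr/Value types (the real code is more involved):

```haskell
-- Sketch of the explicit-caching approach: each evalCell records its
-- result in the state before returning, so a cell is evaluated at most once.
import Control.Monad.State
import qualified Data.Map as M

type CellRef = (Int, Int)

data Expr
  = IntLit Int
  | Ref CellRef
  | BinOpExpr (Int -> Int -> Int) Expr Expr

data Value = IntValue Int
  deriving (Eq, Show)

type Cache = M.Map CellRef Value

evalCell :: M.Map CellRef Expr -> CellRef -> State Cache Value
evalCell table ref = do
  cache <- get
  case M.lookup ref cache of
    Just v  -> pure v                       -- already evaluated: reuse
    Nothing -> do
      v <- evalExpr table (table M.! ref)
      modify (M.insert ref v)               -- update state before returning
      pure v

evalExpr :: M.Map CellRef Expr -> Expr -> State Cache Value
evalExpr _     (IntLit n) = pure (IntValue n)
evalExpr table (Ref r)    = evalCell table r
evalExpr table (BinOpExpr op l r) = do
  IntValue a <- evalExpr table l
  IntValue b <- evalExpr table r
  pure (IntValue (a `op` b))
```

This works, but the get/modify plumbing is exactly the "procedural" feel I'd like to avoid.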

BruceBerry
  • What do you mean by "external references"? – bheklilr Feb 18 '15 at 18:54
  • references outside the cell (and outside of your bottom-up flow of data). Imagine the excel equivalent being =A1+2, where A1 is another cell with a number and you need to `eval` that cell first if you want to evaluate this one. – BruceBerry Feb 18 '15 at 19:04
  • It gets interesting when cells recursively reference one another ;) – Benjamin Hodgson Feb 18 '15 at 23:00
  • Yeah I already detect loops, but I don't think I will handle fixed points. I mark `VTable` cells as "Evaluating" so that I can detect loops dynamically (and I have a statically constructed dependency graph that I currently use to invalidate `VTable` cells when the table is changed, but that I could also use for this purpose) – BruceBerry Feb 18 '15 at 23:11

1 Answer

Haskell doesn't memoize by default because that would generally take up too much memory, so you can't just rely on eval doing the right thing. However, the nature of lazy evaluation means that data structures are, in a sense, memoized: each thunk in a large lazy structure is only evaluated once. This means that you can memoize a function by defining a large lazy data structure internally and replacing recursive calls with accesses into the structure—each part of the structure will be evaluated at most once.

I think the most elegant approach to model your spreadsheet would be a large, lazy directed graph with the cells as nodes and references as edges. Then you would need to define the VTable graph in a recursive way such that all recursion goes through the graph itself, which will memoize the result in the way I described above.
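As a rough sketch of what "all recursion goes through the graph itself" looks like (using a plain lazy Map and invented CellRef/Expr/Value types, since I don't know your exact representation):

```haskell
-- Tying the knot: the map of values is defined in terms of itself.
-- Each cell's value is a thunk in `values`; however many other cells
-- reference it, it is forced at most once.
import qualified Data.Map as M

type CellRef = (Int, Int)

data Expr
  = IntLit Int
  | Ref CellRef
  | BinOpExpr (Int -> Int -> Int) Expr Expr

data Value = IntValue Int
  deriving (Eq, Show)

evalSheet :: M.Map CellRef Expr -> M.Map CellRef Value
evalSheet exprs = values
  where
    values = M.map evalE exprs
    evalE (IntLit n) = IntValue n
    evalE (Ref r)    = values M.! r    -- recursion goes through the structure
    evalE (BinOpExpr op l r) =
      let IntValue a = evalE l
          IntValue b = evalE r
      in IntValue (a `op` b)
```

Note that this diverges on reference cycles, so you'd still want your existing loop detection on top of it.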

There are a couple of handy ways to model a graph. One option would be to use an explicit map with integers as node identifiers—an IntMap or even an array of some sort could work. Another option is to use an existing graph library; this will save you some work and give you a nice graph abstraction. I'm a big fan of fgl, the "functional graph library", though it does take a bit of up-front reading and thinking to understand. The performance isn't going to be very different either way, because fgl is also implemented in terms of IntMap.

Tooting my own horn a bit, I've written a couple of blog posts expanding on this answer: one about memoization with lazy structures (with pictures!) and one about the functional graph library. Putting the two ideas together should get you what you want, I believe.

Tikhon Jelvis
  • I've seen the fibonacci example where I define the data structure for memoization in terms of calls to a recursive function (but outside the recursive function, so it is lazily built by the recursive function), is that the strategy you are suggesting? I'm trying to understand the benefit of the approach: is it true that the only advantage I get is that I get implicit, correct memoization instead of having to do it myself? – BruceBerry Feb 19 '15 at 15:39
  • @BruceBerry: Yep, that's the strategy. And yeah, the main advantage is that you don't have to do memoization yourself. Expressing your whole computation as a lazy data structure also lets you extend the logic by just traversing the structure again, which can make your code a bit neater. – Tikhon Jelvis Feb 19 '15 at 18:09