What is an FFP machine?

Question

In R. Kent Dybvig's paper "Three Implementation Models for Scheme" he speaks of "FFP languages" and "FFP machines". Apparently there is some correlation between FFP machines, and string-reduction on multiple processors.

Googling doesn't really uncover much in terms of explanations or examples.

Can anyone shed some light on this topic? Thanks.

score 3 · Answer 1 · answered Mar 14 '17 at 15:39

3

Kent Dybvig's advisor, Gyula A. Mago, published a detailed description in "The FFP Machine: Technical Report 87-014" in 1987 by Mago and Stanat.

As of this writing, the PDF is freely available at: http://www.cs.unc.edu/techreports/87-014.pdf

answered Mar 14 '17 at 15:39

absurdhero

356
1
11

score 3 · Answer 2 · edited Jun 22 '20 at 06:14

The FFP Machine is a very fine-grained parallel computer architecture: each processor holds a single symbol / atom / value. It uses a string reduction model of computation in which innermost function applications are found and replaced by their equivalent result (eager evaluation). Where a result is used in several places, it tends to be re-evaluated instead of incurring the costs of accessing some global store (but see Mago's paper on "Copying Operands vs Copying Results", or better yet Mago's "Data Sharing in an FFP Machine" in the 1982 Functional Programming Languages and Computer Architecture conference).

The L cells holding the FFP expression being reduced communicate through a tree structured arrangement of T cells. Note that IC's are basically two dimensional and with wiring, circuits can move towards being three dimensional in physical space. Interconnection networks that occupy higher dimensions (such as the Hypercube, Omega, Banyan, Star, etc. networks) will eventually be unable to perform near their theoretical limit.

This communication network is circuit-switched rather than being packet-switched. Data packets contain no addresses and do not need routing. Packets from distinct reductions cannot meet, cannot conflict and cannot experience congestion with each other. The configuring activity (called "Partitioning") is performed in a single sweep upwards in the tree, using a handful of logic operations on 3-bit messages, leaving "area machines" in its wake, each created to advance at most a single reducible application. While it is technically logarithmic in time, the resulting area machines can begin communicating in a pipelined fashion behind the partitioning wave, practically costing a constant time penalty. (The dismantling of area machines remains a logarithmic cost in time).

Packets within a single reduction should, and must, meet and thus provide a often-useful synchronization. Sequences of packets are sorted and combined as they rise within an area, to be broadcast from the root of the area machine. Parallel Prefix and Parallel Suffix operations are provided to reduce area traffic, since there remains a potential bottleneck within an individual reducible application. This is accomplished without the need exhibited in the Ultracomputer (Jack (Jacob?) Schwartz at NYU) for a separate logarithmic-sized cache memory in each communication node. Each T cell (internal tree node) only needs a FIFO buffer (for efficiency) of size greater than the pipeline path to the top of the tree and back down. (This latter is a conjecture of mine, but it seems reasonable). Since the tree maintains the left-to-right order of data (unlike some other combining networks), the system enables cells to rotate their data in logarithmic rather than linear time, avoiding the plausible congestion at the root of the area machine. It's worth noting again, that the parallelism within an area machine is independent of the simultaneous parallelism in other area machines, and has available to it a number of processors proportional to the quantity of data in the operand.

a. thank you very much, b. is this "unable" or "able"? c. would "concurrency" better describe this, rather than "parallelism"? — Will Ness, May 22 '18 at 07:26
Theoretical networks that provide good communication will eventually cost at least cube root of n time, to position n processors in 3-space. Danny Hillis's Connection Machine occupied two cabinets, each having a 6 inch nest of wiring in the back. To double the number of processors there needs a 6" sheet of wires connecting the two pairs of cabinets. To me (for teaching OS), "concurrency" includes the interleaving of actions by a single agent (yielding no speed-up), whereas "parallelism" explicitly claims several actions occurring simultaneously. — David Middleton, May 22 '18 at 17:29

score 2 · Accepted Answer · answered Jan 27 '10 at 15:19

2

Have you come across this yet?: Compiling APL for parallel execution on an FFP machine

answered Jan 27 '10 at 15:19

Daniel Pratt

12,007
2
44
61

Um. Yes, but my ACM membership has expired and I wasn't able to download it. What magic did you use to create that public link? Anyway, thanks. This looks good. – Mark Bolusmjak Jan 27 '10 at 15:35
If you...uh...Bing "FFP machines", the fourth link down is from Microsoft Academic Research. I took the link from there. – Daniel Pratt Jan 27 '10 at 15:47
I should start Bing-ing more. – Mark Bolusmjak Jan 27 '10 at 16:03

score 1 · Answer 4 · answered Jan 27 '10 at 15:15

1

Formal FP. Similar to FP, but with regular sugarless syntax, for machine execution is all I can offer you.

See Wikis Fp page.

answered Jan 27 '10 at 15:15

gingerbreadboy

7,386
5
36
62

What is an FFP machine?

4 Answers4