3

Working on a compiler, I am facing a problem related to named argument lists and evaluation order. Basically, the language ensures that for the following function and function call:

int call(int first, int second) = first - second

int sideEffect(int i) { println(i); return i }

println(call(second: sideEffect(0), first: sideEffect(1)))

The output will be

0
1
1

although the arguments are given in reverse order. I am working on the JVM, so they must be sorted on the bytecode level to match the signature of call. In stack-based pseudo-bytecode:

push 0
call sideEffect(int)
push 1
call sideEffect(int)
swap
call call(int, int)

As you can see, a swap bytecode instruction is required here to make sure they are both evaluated and passed in the right order. This can be drawn as a graph:

Actual:   second   first
                \ /
                swap
                / \
Formal:    first   second

There is only one swap instruction that can swap the top two elements on the stack, so for more elements the compiler needs to generate local variables.

Actual: third   second   first
          |       |  /¯¯¯¯
        local0   swap    local0
            ____/ |        |
Formal: first   second   third

In bytecode:

third...
store local0
second...
first...
swap
load local0
call ...

Of course this can be expanded to arbitrary amounts of actual and formal parameters.

What I am now looking for is some kind of algorithm that determines if and where I should insert these swap or local variable instructions. A reference to a compiler implementation would also be helpful.


Note that it is not part of my problem which actual argument belongs to which formal parameter. That is already solved by my compiler. To simplify, the problem could be looked at like this:

Given two arrays of the same sizes containing the same elements in different order, which elements can be swapped or shifted from the first array to get the order of the second one.

Clashsoft
  • 11,553
  • 5
  • 40
  • 79
  • If you're working on the JVM it is already too late. It is up to the compiler to emit the bytecode in the correct sequence. Not the interpreter to fix it. – user207421 Jun 17 '16 at 06:05

1 Answers1

3

With registers only, there's basically only one solution. Iterate through the arguments in desired stack order. If the current argument is already computed, load it from a local. Otherwise, compute the uncomputed arguments before it in execution order, storing them in registers, and then compute the current argument.

The presence of a swap operation means that we can use the top of the stack as a de facto local variable. Python code below.

import heapq
import itertools


class Allocator:

  def __init__(self):
    self.free_heap = []
    self.next_free = 0

  def allocate(self):
    if self.free_heap:
      return heapq.heappop(self.free_heap)
    i = self.next_free
    self.next_free += 1
    return i

  def free(self, i):
    heapq.heappush(self.free_heap, i)

  def is_allocated(self, i):
    return i < self.next_free and i not in self.free_heap


def loads_and_stores(perm):
  perm = list(perm)
  n = len(perm)
  assert set(perm) == set(range(n))
  location = {}
  allocator = Allocator()
  i = iter(perm)
  # "location 0" is the top of the stack
  for j in range(n):
    if j in location:
      if location[j] != 0:
        # not top of stack; need an actual load
        print 'load', location[j]
        if allocator.is_allocated(0):
          # need to maintain the top of the stack
          print 'swap'
      allocator.free(location[j])
    else:
      while True:
        k = next(i)
        print 'compute', k
        if k == j:
          if allocator.is_allocated(0):
            # need to maintain the top of the stack
            print 'swap'
          break
        location[k] = allocator.allocate()
        if location[k] != 0:
          # not top of stack; need an actual store
          print 'store', location[k]
  return perm


def test(n):
  for perm in itertools.permutations(range(n)):
    print perm
    loads_and_stores(perm)


test(4)

There are some interesting optimizations to do here, because allocate can return any free local. An optimal algorithm would consider the execution cost of making each allocation.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • Thanks for the quick answer. I don't think *too* aggressive optimizations are needed for my use case - the JVM already does these. I could probably get away with using local variables only and end up with the same performance. – Clashsoft Jun 15 '16 at 20:38