1

I've been exploring writing a very basic / limited interpreter in javascript as an exercise. All has been going well until I introduced the concept of LOOPs.

Given the following script:

LOOP 2
  A
  LOOP 3
    B
  END
  LOOP 4
    C
    LOOP 5
      D
    END
    E
  END
  F
END

The algorithm should visit the inner tokens in the following sequence:

ABBBCDDDDDECDDDDDECDDDDDECDDDDDEFABBBCDDDDDECDDDDDECDDDDDECDDDDDEF

The following does the trick, but it requires lots of iterating over the tokens. It's an improvement over a previous slicing approach I used that manually expanded the loops, but is far from optimal.

/**
 * In practice, we'll grab each token as we read the script,
 * but to keep this simple and focus on the loop algorithm,
 * we can cheat and make an array of all the tokens.
 */

const getTokens = (s) => s.replace(/[\W_]+/g, " ").split(" ").filter(Boolean);

/* Temp vars - ideally, I'd like to solve this with arrays. */
const start = []; // Loop start indices
const end = []; // Loop end indices
const counts = []; // Times to loop
const completed = []; // Loops completed

for (let i = 0; i < tokens.length; i++) {
  const token = tokens[i];

  if (token === "LOOP") {
    if (start.length == 0 || i > start[start.length - 1]) {
      // Add new loop index if we haven't seen it before
      start.push(i); // Store the loop index
      counts.push(Number(tokens[i + 1])); // The loop count is always next LOOP token
      completed.push(0); // Initialize with 0 completed at index

      // Find the end index for the loop
      // Note: This is the slowest part.
      let skip = 0;
      for (let j = i + 2; j < tokens.length; j++) {
        if (tokens[j] == "LOOP") {
          skip++; // Increase nest depth
        } else if (tokens[j] == "END") {
          if (skip == 0) {
            end.push(j); // Found matching loop close
            break;
          }
          skip--;
        }
      }
    }

    i++; // Skip over the loop count
    continue;
  } else if (token === "END") {
    let j;
    for (j = 0; j < end.length; j++) {
      if (end[j] == i) break; // Found matching end index
    }
    const isCompleted = completed[j] == counts[j] - 1;
    if (!isCompleted) {
      i = start[j] + 1;
      completed[j]++;
      for (let k = j + 1; k < start.length; k++) {
        completed[k] = 0; // Reset nested loops in between
      }
    }
    continue;
  }

  console.log(tokens[i]);
}

https://jsfiddle.net/5wpa8t4n/

What's a better way to accomplish this array-based approach using a single pass through the script, or at worst 2 passes, but not N-LOOP passes?

noobcode
  • 13
  • 4
  • "*The parser should return*" - no it shouldn't. Separate the parser (which creates an AST or byte code) from the interpreter (which creates the output). – Bergi Sep 18 '21 at 00:59
  • I understand that creating an abstract syntax tree with byte code might be helpful, but it wouldn't necessarily improve the time complexity of my loop parsing algorithm, which is my primary objective. I've scoped this problem on here to highlight the exact thing I want to solve in an easy to digest snippet. – noobcode Sep 18 '21 at 01:05
  • Then I don't understand your objective. Only a parser that does not execute the code can be efficient and use a fixed number of passes. An interpreter that actually runs the loop to produce the output you want always has to repeatedly pass through the loop, its time complexity cannot be better than linear to the size of the output. – Bergi Sep 18 '21 at 01:16
  • Right now, my approach is to find each "Loop", and for each one, find its respective "end" by looking through all tokens. This is preliminary setup work before I actually loop through all tokens with the ability to jump. I've been trying to come up with a way to just read the tokens once and perform the jumps, but the algorithm eludes me. The objective is really to do what the above script does, but with fewer scans through all tokens. – noobcode Sep 18 '21 at 01:19
  • @noobcode - your objective is clear - Bergi just wants you to recognize and use the correct terminology. – Randy Casburn Sep 18 '21 at 01:34
  • My mistake. Updated to improve clarity. – noobcode Sep 18 '21 at 01:44

1 Answers1

0

You don't need to know the position of the matching end of the loop when starting to interpret it. All you need to record is the position to jump back to when encountering the next end, but until then just continue interpreting token by token.

These positions, together with the respective counters, can be stored in a stack structure.

const script = `
DO
  A
  DO
    B
  LOOP 3
  DO
    C
    DO
      D
    LOOP 5
    E
  LOOP 4
  F
LOOP 2
`

const parse = (script) =>
  script
    .replace(/[\W_]+/g, " ")
    .split(" ")
    .filter(Boolean);

const interpret = (code) => {
  let loops = []; // Active loops: iteration count and jump target
  let ip = 0; // instruction pointer

  let result = "";
  while (ip < code.length) {
    const instruction = code[ip];
    switch (instruction) {
      case "DO": {
        ++ip;
        loops.push({count: 0, start: ip});
      } break;
      case "LOOP": {
        const limit = Number(code[++ip]);
        const {count, start} = loops.pop();
        if (count < limit) {
          loops.push({count: count+1, start});
          ip = start; // jump back
        } else {
          ++ip;
        }
      } break;
      default: {
        ++ip;
        result += instruction; // a print statement
      } break;
    }
  }
  return result;
};

console.log(interpret(parse(script)));

I've simplified the structure a bit to use do-while loops, so I'd never have to skip the loop body. In a true byte code, emitted by a parser, the jump targets (both back and forth) would be part of the instructions themselves, and only the count "variables" would need to be stored in the stack. The jump targets never change so you'd need to generate them only once in the parse function.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • Oh, this is great! I went down the rabbithole of using pop() on a stack structure, but it wasn't quite right. I now see what you mean by the switch-based interpreter. – noobcode Sep 18 '21 at 02:33