
I'm looking for a way to run awk in a verifiably deterministic way; that is, the result should be determined by the input alone. In other words, given that a program produced some output, I want to know that running it again on the same input will produce the same output.

This would mean removing access to non-deterministic sources of input, such as the system time or files whose content changes, like /dev/random.

I have looked at ZeroVM and at gawk's sandbox flag (`--sandbox`), which I don't think will help.
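A quick check illustrates why `--sandbox` alone is not enough. This is a sketch that assumes GNU awk (`gawk`) is installed. Sandbox mode disables `system()`, I/O redirections, and loading extensions, but built-in time functions such as `systime()` still work, so the output can still vary between runs.

```shell
#!/bin/sh
# Assumes GNU awk (gawk) is installed and on PATH.

# system() is rejected in sandbox mode, so gawk exits with an error
# and "date" is never executed.
if gawk --sandbox 'BEGIN { system("date") }' 2>/dev/null; then
  echo "system() allowed"
else
  echo "system() blocked"
fi

# systime() is neither a command nor a redirection, so it still runs
# under --sandbox and prints the current epoch time, which changes
# from one run to the next.
gawk --sandbox 'BEGIN { print systime() }'
```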

Martin Geisler
ateles
  • What side-effects does awk generate in its normal operation? – zetavolt Apr 12 '16 at 18:28
  • I'm not sure if I understand. There are many ways to write an awk program so that it depends on the system clock, state etc. This destroys determinism. – ateles Apr 12 '16 at 18:36
  • What on earth are you talking about? Please edit your question to include a [mcve]. – Ed Morton Apr 12 '16 at 19:02
  • I'm not looking for a piece of code but for a way to sandbox awk. I want code in a sandbox to either act as a [pure function](https://en.wikipedia.org/wiki/Pure_function) or to fail. This makes the result cacheable, and the same forever. – ateles Apr 12 '16 at 19:19
  • Again, and for the last time, please edit your question to include a [mcve]. – Ed Morton Apr 12 '16 at 20:38

2 Answers


I don't think it is possible in general. For example, this script prints a different value each time it is run, even though it doesn't read any input file:

 awk 'BEGIN{print systime()}'

However, you can write your scripts in a functional, repeatable way: depend only on the input file, produce output in a defined order (the iteration order of `for (key in array)` is not predictable), and avoid system calls and random numbers.
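A minimal sketch of the ordering point, assuming a POSIX shell with `awk` and `sort` available: the script counts occurrences of the first field, then pipes the result through `sort` so the final order depends only on the data rather than on how the awk implementation happens to walk its array. Pinning the locale to `C` (plain byte order) is an extra assumption added here to keep the sort order the same across machines.

```shell
#!/bin/sh
# Fix the locale so sorting and character handling do not depend on
# the user's environment.
export LC_ALL=C

# "for (k in count)" visits keys in an unspecified order, so the raw
# awk output may be ordered differently by different implementations.
# Sorting afterwards makes the output a function of the input only.
printf 'b\na\nc\na\n' |
  awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' |
  sort
```

Run as-is, this prints `a 2`, `b 1`, and `c 1` on separate lines, regardless of which awk is used.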

karakfa
  • I had an idea to remove all system functions with some kind of sanitizer script. I'm unsure if that would actually help. I need something that works even if the person writing the code is malicious. – ateles Apr 12 '16 at 20:00
  • In that case use `Haskell` instead of `awk`. – karakfa Apr 12 '16 at 20:05
  • From what I understand, unlike Awk, Haskell is not an easy language and few people use it. I want to create a service where people can input their own code logic, as opposed to just data. Sort of like the Go playground, but for a different purpose. – ateles Apr 12 '16 at 22:02

ZeroVM would indeed be a way to do what you want: it sandboxes applications and removes all non-deterministic system calls. For example, there are no threads (since thread scheduling inevitably leads to non-determinism), and the clock starts at Jan 1 1970 on every execution (the time is then advanced by certain system calls).

I no longer have a system with ZeroVM installed, but it shouldn't be difficult to compile awk for it. In fact, I remember that BusyBox ran on ZeroVM, and BusyBox includes its own awk implementation.

Martin Geisler
  • Good to know. I've only looked at the docs so far. Will take a deep look at it tomorrow. Hopefully I can spin up a Docker image. The setup looked somewhat complicated, but it shouldn't be impossible. – ateles Apr 12 '16 at 22:23