2

I'm trying to read a massive sexp from file into memory, and it seems to be working out fine for smaller inputs, but on more deeply nested ones sbcl conks out with stack exhaustion. There seems to be a hard recursion limit (at 1000 functions deep) that sbcl simply cannot surpass (strangely, even when its stack size is increased). Example (code is here): make check-c works, but make check-cpp exhausts the stack as below:

INFO: Control stack guard page unprotected
Control stack guard page temporarily disabled: proceed with caution
Unhandled SB-KERNEL::CONTROL-STACK-EXHAUSTED in thread #<SB-THREAD:THREAD
                                                         "main thread" RUNNING
                                                          {10034E6DE3}>:
  Control stack exhausted (no more space for function call frames).
This is probably due to heavily nested or infinitely recursive function
calls, or a tail call that SBCL cannot or has not optimized away.

PROCEED WITH CAUTION.

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {10034E6DE3}>
0: ((LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX))
1: (SB-IMPL::CALL-WITH-SANE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {100FC9006B}>)
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {100FC9003B}>)
...

Why am I using recursion, then? Actually, I'm not, but unfortunately the builtin (read) uses recursion, and that's where the stack overflow is occurring. The other option (which I've started working on) is to write an iterative version of read which relies upon the more limited syntax that I'm feeding into it from a separate program to avoid the complexity of re-implementing read (my (currently broken) attempts at that are in the lisp branch of the above repository).

However, I'd prefer a more canonical solution. Are there alternatives to the builtin read that can parse deeply nested structures by avoiding recursion?

EDIT: This appears to be an insurmountable issue with sbcl itself, not the input data. For a quick example, try running:

(for i in $(seq 1 2000); do
echo -n "("
done; echo -n "2"; for i in $(seq 1 2000); do
echo -n ")"
done; echo) > file

And then in sbcl:

(with-open-file (file "file" :direction :input) (read file))

The same failure occurs.

EDIT: Asked around on #sbcl, and apparently the control stack size really applies only to new threads, and that the stack size for the main thread is affected by a lot of other factors as well. So I tried putting the read in a separate thread. Still didn't work. Checkout this repo and run make check if you're interested.

cosmicexplorer
  • 523
  • 6
  • 15
  • Recursive functions are the natural way to parse recursive data formats, such as S-expressions. And because of the way that Lisp read macros work, it's practically required. – Barmar May 07 '15 at 23:27
  • True, it'll be more difficult to do it iteratively, but since there doesn't appear to be a way to bypass the stack limit for recursive reads, I don't see a way to parse it correctly in a recursive fashion. The parser I'm looking for could possibly be tail-recursive, though. – cosmicexplorer May 07 '15 at 23:47
  • You certainly can write a custom parser like that, especially if your input is not as general as `read` is required to handle. But there's nothing built-in that works that way. – Barmar May 07 '15 at 23:49
  • Probably nothing builtin, yeah. I was looking for some possibly little-known external library, since parsing deeply nested sexps seems like an application that someone else would have tackled already (applications to logging structures, serialization, etc). Come to think of it, if I do write a reader like that I might as well put it into a separate library. – cosmicexplorer May 07 '15 at 23:54
  • Requests for third-party libraries are off-topic for SO. – Barmar May 07 '15 at 23:55
  • I admit I'm finding it hard to believe that SBCL has such a limited recursion depth, although maybe it's because of how many nested functions are used in each call to `read`. But are you saying that simply reading `((((((((((...x...))))))))))` (where there are 1,000 parentheses) will fail? – Barmar May 07 '15 at 23:56
  • Yes; please refer to the edit made to the question above. – cosmicexplorer May 08 '15 at 00:07
  • Yeah, I just installed SBCL here. The limit is somewhere between 1800 and 2000. – Barmar May 08 '15 at 00:09
  • 1
    So..... why SBCL instead of a Common Lisp that actually works for this purpose???? Your simple example is handled easily by both GNU CLISP and Embeddable Common Lisp, for example. https://common-lisp.net/~dlw/LispSurvey.html Also, you might try searching for other posts with similar problems that might have useful answers, such as this one: http://stackoverflow.com/a/19338030/816536 – Greg A. Woods May 08 '15 at 02:31
  • I had been using sbcl previously and saw no reason to switch. SBCL provides one of the easier ways to dump an executable image, which was nice. CLISP still chokes on the input file I'm trying to work with, unfortunately. I stated above in the question that I had already attempted increasing the stack size, to no effect. – cosmicexplorer May 08 '15 at 08:29

1 Answers1

1

I don't know what you did (because you didn't show it exactly), but when I start sbcl as follows your example works fine for me:

sbcl --control-stack-size 100

Of course I recommended GNU CLISP and Embedded Common Lisp as they also work A-OK for your example.

I'll add a reference to this answer for future readers: https://stackoverflow.com/a/9002973/816536

I'll also mention that compiling the code with appropriate optimization options may be necessary in many CL implementations to benefit from tail-call optimization.

Community
  • 1
  • 1
Greg A. Woods
  • 2,663
  • 29
  • 26
  • Right, the example at the bottom of the question was trivial. As stated at the top of the question, the testing I'm doing is based in the linked github repo. I've added some more explanation at the bottom in case it wasn't clear. I tried CLISP, and it didn't die like sbcl, but it also didn't manage to complete within a half hour (it wasn't hanging, I checked), which is not ideal. I've had to change my entire strategy because even if read doesn't blow up the stack, it's still recursive, and therefore extremely slow on large inputs. – cosmicexplorer May 15 '15 at 19:00
  • Well now, there you go! If you ask the interpreter to do effectively unbounded recursion with too much data then you effectively invoke "undefined behaviour", which as they say can mean anything, including making daemons fly out of your nose. If you can't do it with a predictable preset stack limit, or a tail-recursive algorithm, then you'll have to do it iteratively and still keep an eye on your heap use. Better yet, fix your data representation! – Greg A. Woods May 15 '15 at 19:08
  • Yeah, the only reason I asked the question was because I thought it was strange that `(read)` was (non tail-)recursive when it's such a basic function and recursion is less efficient. I ended up "fixing the data representation" of the AST I was parsing by avoiding bouncing it to text at all and working directly in C++ without involving lisp at all. – cosmicexplorer May 15 '15 at 22:32