In a fairly recent blog post, Scott Vokes describes a technical problem with Lua's implementation of coroutines using the C functions setjmp and longjmp:
The main limitation of Lua coroutines is that, since they are implemented with setjmp(3) and longjmp(3), you cannot use them to call from Lua into C code that calls back into Lua that calls back into C, because the nested longjmp will clobber the C function’s stack frames. (This is detected at runtime, rather than failing silently.)
I haven’t found this to be a problem in practice, and I’m not aware of any way to fix it without damaging Lua’s portability, one of my favorite things about Lua — it will run on literally anything with an ANSI C compiler and a modest amount of space. Using Lua means I can travel light. :)
I have used coroutines a fair amount, and I thought I broadly understood what was going on and what setjmp and longjmp do. However, when I read this I realized that I didn't really understand it. To figure it out, I tried to write a program that, based on the description, should cause a problem, but instead it seems to work fine.
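To pin down what I mean by longjmp discarding stack frames, here is a minimal sketch in plain C++ with no Lua involved. The names inner, middle, run and self_check are made up for illustration; the only real calls are setjmp and std::longjmp from <csetjmp>. It only shows that the frames between the setjmp and the longjmp never get to finish, and it says nothing about how Lua uses these functions internally.

```cpp
#include <cassert>
#include <csetjmp>
#include <string>

// Hypothetical names throughout; this is not Lua code.
static std::jmp_buf resume_point;  // where longjmp will land
static std::string trace;          // records which steps actually ran

void inner() {
    trace += "inner-enter;";
    std::longjmp(resume_point, 1); // discards the frames of inner() and middle()
}

void middle() {
    trace += "middle-enter;";
    inner();
    trace += "middle-exit;";       // never runs: this frame was thrown away
}

std::string run() {
    trace.clear();
    if (setjmp(resume_point) == 0) {
        trace += "outer-call;";
        middle();
        trace += "outer-normal-return;"; // never runs either
    } else {
        trace += "outer-after-longjmp;"; // we land here after the longjmp
    }
    return trace;
}

void self_check() {
    assert(run() == "outer-call;middle-enter;inner-enter;outer-after-longjmp;");
}
```

Calling self_check() from a main function should pass silently: neither middle-exit nor outer-normal-return ever appears in the trace, and there is no way to get back into middle() afterwards.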
However, I have seen a few other places where people allege that there are problems:
My questions are:
- Under what circumstances do Lua coroutines fail to work because C function stack frames get clobbered?
- What exactly is the result? Does "detected at runtime" mean a Lua panic, or something else?
- Does this still affect the most recent version of Lua (5.3), or is this actually a 5.1 issue?
Here is the code I produced. In my test, it is linked against Lua 5.3.1 (itself compiled as C), and the test program is compiled as C++11.
extern "C" {
#include <lauxlib.h>
#include <lua.h>
}
#include <cassert>
#include <iostream>

#define CODE(C) \
  case C: { \
    std::cout << "When returning to " << where << " got code '" #C "'" << std::endl; \
    break; \
  }

void handle_resume_code(int code, const char * where) {
  switch (code) {
    CODE(LUA_OK)
    CODE(LUA_YIELD)
    CODE(LUA_ERRRUN)
    CODE(LUA_ERRMEM)
    CODE(LUA_ERRERR)
    default:
      std::cout << "An unknown error code in " << where << std::endl;
  }
}

int trivial(lua_State *, int, lua_KContext) {
  std::cout << "Called continuation function" << std::endl;
  return 0;
}

int f(lua_State * L) {
  std::cout << "Called function 'f'" << std::endl;
  return 0;
}

int g(lua_State * L) {
  std::cout << "Called function 'g'" << std::endl;
  lua_State * T = lua_newthread(L);
  lua_getglobal(T, "f");
  handle_resume_code(lua_resume(T, L, 0), __func__);
  return lua_yieldk(L, 0, 0, trivial);
}

int h(lua_State * L) {
  std::cout << "Called function 'h'" << std::endl;
  lua_State * T = lua_newthread(L);
  lua_getglobal(T, "g");
  handle_resume_code(lua_resume(T, L, 0), __func__);
  return lua_yieldk(L, 0, 0, trivial);
}

int main () {
  std::cout << "Starting:" << std::endl;
  lua_State * L = luaL_newstate();

  // init
  {
    lua_pushcfunction(L, f);
    lua_setglobal(L, "f");
    lua_pushcfunction(L, g);
    lua_setglobal(L, "g");
    lua_pushcfunction(L, h);
    lua_setglobal(L, "h");
  }
  assert(lua_gettop(L) == 0);

  // Some action
  {
    lua_State * T = lua_newthread(L);
    lua_getglobal(T, "h");
    handle_resume_code(lua_resume(T, nullptr, 0), __func__);
  }

  lua_close(L);
  std::cout << "Bye! :-)" << std::endl;
}
The output I get is:
Starting:
Called function 'h'
Called function 'g'
Called function 'f'
When returning to g got code 'LUA_OK'
When returning to h got code 'LUA_YIELD'
When returning to main got code 'LUA_YIELD'
Bye! :-)
Many thanks to @Nicol Bolas for the very detailed answer!
After reading his answer, reading the official docs, reading some emails and playing around with it some more, I want to refine the question / ask a specific follow-up question, however you want to look at it.
I think the term 'clobbering' is a poor description of this issue, and this was part of what confused me. Nothing is being "clobbered" in the sense of being written to twice with the first value lost. The issue is solely, as @Nicol Bolas points out, that longjmp tosses part of the C stack, and if you were hoping to restore that stack later, too bad.
The issue is actually described very nicely in section 4.7 of the Lua 5.2 manual, in a link provided by @Nicol Bolas.
Curiously, there is no equivalent section in the Lua 5.1 documentation. However, the Lua 5.2 manual has this to say about lua_yieldk:
Yields a coroutine.
This function should only be called as the return expression of a C function, as follows:
return lua_yieldk (L, n, i, k);
The Lua 5.1 manual says something similar, about lua_yield instead:
Yields a coroutine.
This function should only be called as the return expression of a C function, as follows:
return lua_yield (L, nresults);
Some natural questions then:
- Why does it matter whether I use return here or not? If lua_yieldk will call longjmp, then lua_yieldk will never return anyway, so it shouldn't matter whether I return or not. So that cannot be what is happening, right?
- Suppose instead that lua_yieldk just makes a note within the Lua state that the current C API call wants to yield, and then, when the call finally does return, Lua figures out what happens next. Then this solves the problem of saving C stack frames, no? After we return to Lua normally, those stack frames have expired anyway, so the complications described in @Nicol Bolas's picture are skirted. And second, in 5.2 at least, the semantics never seem to be that we should restore C stack frames: lua_yieldk resumes to a continuation function, not to the lua_yieldk caller, and lua_yield apparently resumes to the caller of the current API call, not to the lua_yield caller itself.
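To make the "resumes to a continuation function, not to the caller" semantics concrete, here is a toy model in plain C++. Everything in it (ToyThread, toy_yield, toy_resume, sleeper, after_sleep, demo) is a hypothetical name of my own and none of it is Lua's API. It assumes the yield is a longjmp back into the resume call, which is exactly the part I am unsure about for Lua itself.

```cpp
#include <cassert>
#include <csetjmp>
#include <string>

// Toy model only: hypothetical names, not Lua's implementation or API.
struct ToyThread;
using ToyFn = int (*)(ToyThread&);

enum ToyStatus { TOY_OK = 0, TOY_YIELD = 1 };

struct ToyThread {
    std::jmp_buf resume_point;     // set by toy_resume, target of toy_yield
    ToyFn entry = nullptr;         // function to run on the first resume
    ToyFn continuation = nullptr;  // recorded by toy_yield for the next resume
    std::string log;
};

// Records the continuation and jumps back into toy_resume. It never returns,
// so "return toy_yield(...)" only documents that nothing after it will run.
int toy_yield(ToyThread& t, ToyFn k) {
    t.continuation = k;
    std::longjmp(t.resume_point, TOY_YIELD);
}

ToyStatus toy_resume(ToyThread& t) {
    // Run the continuation if the previous run yielded, otherwise the entry.
    ToyFn next = t.continuation ? t.continuation : t.entry;
    assert(next != nullptr);
    t.continuation = nullptr;
    if (setjmp(t.resume_point) != 0) {
        return TOY_YIELD;          // arrived here through toy_yield's longjmp
    }
    next(t);
    return TOY_OK;                 // the function returned normally
}

int after_sleep(ToyThread& t) {
    t.log += "continuation;";
    return 0;
}

int sleeper(ToyThread& t) {
    t.log += "before-yield;";
    return toy_yield(t, after_sleep);  // sleeper's frame is gone after this
}

std::string demo() {
    ToyThread t;
    t.entry = sleeper;
    ToyStatus first = toy_resume(t);   // runs sleeper until it yields
    ToyStatus second = toy_resume(t);  // runs after_sleep, not sleeper
    t.log += (first == TOY_YIELD ? "first=yield;" : "first=ok;");
    t.log += (second == TOY_YIELD ? "second=yield;" : "second=ok;");
    return t.log;
}
```

In this model nothing ever needs to restore the frame of sleeper(): the second resume starts a fresh call to the continuation. Whether that is also the whole story for lua_yieldk is what the questions above are about.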
And, the most important question:
If I consistently use lua_yieldk in the form return lua_yieldk(...) specified in the docs, returning from a lua_CFunction that was passed to Lua, is it still possible to trigger the attempt to yield across a C-call boundary error?
Finally (but this is less important), I would like to see a concrete example of what it looks like when a naive programmer "isn't careful" and triggers the attempt to yield across a C-call boundary error. I get the idea that there could be a problem with setjmp and longjmp tossing stack frames that we later need, but I want to see some real Lua / Lua C API code that I can point to and say "for instance, don't do that", and this is surprisingly elusive.
I found this email where someone reported this error with some Lua 5.1 code, and I attempted to reproduce it in Lua 5.3. However, what I found is that this looks like just poor error reporting from the Lua implementation: the actual bug is caused by the user not setting up their coroutine properly. The proper way to load the coroutine is to create the thread, push a function onto the thread stack, and then call lua_resume on the thread state. Instead, the user was using dofile on the thread stack, which executes the function there after loading it, rather than resuming it. So it is effectively a yield outside of a coroutine, if I understand correctly, and when I patch this, his code works fine, using both lua_yield and lua_yieldk in Lua 5.3.
Here is the listing I produced:
#include <cstdio>

extern "C" {
#include "lua.h"
#include "lauxlib.h"
}

//#define USE_YIELDK

bool running = true;

int lua_print(lua_State * L) {
  if (lua_gettop(L)) {
    printf("lua: %s\n", lua_tostring(L, -1));
  }
  return 0;
}

int lua_finish(lua_State *L) {
  running = false;
  printf("%s called\n", __func__);
  return 0;
}

int trivial(lua_State *, int, lua_KContext) {
  printf("%s called\n", __func__);
  return 0;
}

int lua_sleep(lua_State *L) {
  printf("%s called\n", __func__);
#ifdef USE_YIELDK
  printf("Calling lua_yieldk\n");
  return lua_yieldk(L, 0, 0, trivial);
#else
  printf("Calling lua_yield\n");
  return lua_yield(L, 0);
#endif
}

const char * loop_lua =
  "print(\"loop.lua\")\n"
  "\n"
  "local i = 0\n"
  "while true do\n"
  " print(\"lua_loop iteration\")\n"
  " sleep()\n"
  "\n"
  " i = i + 1\n"
  " if i == 4 then\n"
  " break\n"
  " end\n"
  "end\n"
  "\n"
  "finish()\n";

int main() {
  lua_State * L = luaL_newstate();

  lua_pushcfunction(L, lua_print);
  lua_setglobal(L, "print");
  lua_pushcfunction(L, lua_sleep);
  lua_setglobal(L, "sleep");
  lua_pushcfunction(L, lua_finish);
  lua_setglobal(L, "finish");

  lua_State* cL = lua_newthread(L);

  // Check the load result explicitly rather than inside assert(): with NDEBUG
  // defined, assert() would drop the luaL_loadstring call altogether.
  if (luaL_loadstring(cL, loop_lua) != LUA_OK) {
    printf("Load error: %s\n", lua_tostring(cL, -1));
    lua_close(L);
    return 1;
  }

  /*{
    int result = lua_pcall(cL, 0, 0, 0);
    if (result != LUA_OK) {
      printf("%s error: %s\n", result == LUA_ERRRUN ? "Runtime" : "Unknown", lua_tostring(cL, -1));
      return 1;
    }
  }*/
  // ^ This pcall (predictably) causes an error -- if we try to execute the
  // script, it is going to call things that attempt to yield, but we did not
  // start the script with lua_resume, we started it with pcall, so it's not
  // okay to yield.
  // The reported error is "attempt to yield across a C-call boundary", but what
  // is really happening is just "yield from outside a coroutine" I suppose...

  while (running) {
    int status;
    printf("Waking up coroutine\n");
    status = lua_resume(cL, L, 0);
    if (status == LUA_YIELD) {
      printf("coroutine yielding\n");
    } else {
      running = false; // you can't try to resume if it didn't yield
      if (status == LUA_ERRRUN) {
        printf("Runtime error: %s\n", lua_isstring(cL, -1) ? lua_tostring(cL, -1) : "(unknown)" );
        lua_pop(cL, 1); // pop the error message (the count must be positive)
        break;
      } else if (status == LUA_OK) {
        printf("coroutine finished\n");
      } else {
        printf("Unknown error\n");
      }
    }
  }

  lua_close(L);
  printf("Bye! :-)\n");
  return 0;
}
Here is the output when USE_YIELDK is commented out:
Waking up coroutine
lua: loop.lua
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua_finish called
coroutine finished
Bye! :-)
Here is the output when USE_YIELDK is defined:
Waking up coroutine
lua: loop.lua
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua_finish called
coroutine finished
Bye! :-)