Interrupt during network I/O == crash?

Question

It seems that when an I/O pin interrupt occurs while network I/O is being performed, the system resets -- even if the interrupt function only declares a local variable and assigns it (essentially a do-nothing routine.) So I'm fairly certain it isn't to do with spending too much time in the interrupt function. (My actual working interrupt functions are pretty spartan, strictly increment and assign, not even any conditional logic.)

Is this a known constraint? My workaround is to disconnect the interrupt while using the network, but of course this introduces potential for data loss.

function fnCbUp(level)
    lastTrig = rtctime.get()
    gpio.trig(pin, "down", fnCbDown)
end

function fnCbDown(level)
    local spin = rtcmem.read32(20)
    spin = spin + 1
    rtcmem.write32(20, spin)
    lastTrig = rtctime.get()
    gpio.trig(pin, "up", fnCbUp)
end

gpio.trig(pin, "down", fnCbDown)
gpio.mode(pin, gpio.INT, gpio.FLOAT)

branch: master

build built on: 2016-03-15 10:39

powered by Lua 5.1.4 on SDK 1.4.0

modules: adc,bit,file,gpio,i2c,net,node,pwm,rtcfifo,rtcmem,rtctime,sntp,tmr,uart,wifi

Again, show us code and firmware branch/revision. This isn't generally a known constraint but we keep uncovering, and fixing, bugs in the original net module (up for a re-write). — Marcel Stör, Apr 18 '16 at 11:44

score 1 · Answer 1 · answered Apr 19 '16 at 19:03

Not sure if this should be an answer or a comment. May be a bit long for a comment though.

So, the question is "Is this a known constraint?" and the short but unsatisfactory answer is "no". Can't leave it like that...

Is the code excerpt enough for you to conclude the reset must occur due to something within those few lines? I doubt it. What you seem to be doing is a simple "global" increment on each GPIO 'down' with some debounce logic. However, I don't see any debounce; what am I missing? You store the time in the global lastTrig but you don't do anything with it. Just for debouncing you won't need rtctime IMO, but I doubt it's got anything to do with the problem.

I have a gist of a tmr.delay-based debounce as well as one with tmr.now that is more like a throttle. You could use the first like so:

GPIO14 = 5
spin = 0                             -- pulse counter; must be initialized before first use

function down()
    spin = spin + 1
    tmr.delay(50)                    -- 50 us delay for switch debounce
    gpio.trig(GPIO14, "up", up)      -- change trigger to rising edge
end

function up()
    tmr.delay(50)
    gpio.trig(GPIO14, "down", down)  -- change trigger back to falling edge
end

gpio.mode(GPIO14, gpio.INT)          -- gpio.FLOAT by default
gpio.trig(GPIO14, "down", down)
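
The throttle variant mentioned above can be sketched independently of the firmware. The following is only an illustration in C++ (the language of the Arduino example further down this thread), not the contents of the gist: `EdgeThrottle` and its members are hypothetical names, and the timestamp is assumed to come from a free-running 32-bit microsecond counter such as Arduino's `micros()`. NodeMCU's `tmr.now()` wraps at 31 bits, so a Lua port would need to handle that wrap differently.

```cpp
#include <cstdint>

// Hypothetical helper: accept an edge only if at least min_gap_us
// microseconds have passed since the last accepted edge.
struct EdgeThrottle {
    uint32_t min_gap_us;     // minimum spacing between accepted edges
    uint32_t last_us = 0;    // timestamp of the last accepted edge
    bool primed = false;     // false until the first edge has been accepted
    uint32_t count = 0;      // number of accepted edges

    explicit EdgeThrottle(uint32_t gap_us) : min_gap_us(gap_us) {}

    // now_us is assumed to be a free-running 32-bit microsecond counter.
    bool onEdge(uint32_t now_us) {
        // Unsigned subtraction stays correct across one wrap of the counter.
        if (primed && (now_us - last_us) < min_gap_us) {
            return false;    // too soon after the previous edge: ignore it
        }
        primed = true;
        last_us = now_us;
        ++count;
        return true;
    }
};
```

Unlike the `tmr.delay` approach, this never busy-waits inside the callback; it only compares two timestamps and returns.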

I also suggest running this against the dev branch because you said it may be related to network I/O during interrupts.

No I don't think the problem is in the code I posted, that was in response to your request. It's not actually a switch, it's a hall-effect sensor, so I'm not sure I need to debounce. lastTrig is a global that's utilized elsewhere in the code (testing for a certain length of inactivity.) I added print statements before and after network calls, like sntp.sync, net.dns.resolve, connection:send, etc, and also to their callbacks. The system doesn't give any indication what or where the problem is, but the print statements seem to paint a consistent picture. — Mark McGinty, Apr 19 '16 at 19:48
Ok, sorry but I didn't have anything else to work with. The high number of modules in your firmware seems to suggest that your Lua application is rather complicated (or overengineered). Can you strip it down to create a Minimal, Complete, and Verifiable Example (MCVE) that still fails? If you have an MCVE which fails consistently on `master` AND `dev` then this may indicate a bug in the firmware which you'd then report on GitHub. — Marcel Stör, Apr 19 '16 at 20:34

score 1 · Answer 2 · edited Apr 27 '16 at 11:00

I have nearly the same problem. Running ESP8266Webserver with an interrupt on GPIO14, the system stops recording interrupts when the input pulses arrive too fast. Please see here for more details:

http://www.esp8266.com/viewtopic.php?f=28&t=9702

I'm using Arduino IDE 1.69, but the problem seems to be the same. I used an ESP8266-07 as a generator and counter (without a web server) to generate the pulses, wired to my ESP8266 water system.

The generator works very well at well over 240 pulses/second, generating and counting on the same ESP.

But the ESP water system stops recording interrupts at more than 50 pulses/second:

/*************************************************/
/*  ISR Water pulse counter                      */
/*************************************************/
/**
 * Invoked by interrupt14 once per rotation of the hall-effect sensor. Interrupt
 * handlers should be kept as small as possible so they return quickly.
 */
void ICACHE_RAM_ATTR pulseCounter()
{
    // Increment the pulse Counter
    cli();
    G_pulseCount++;
    Serial.println ( "!" );
    sei();
}

The serial output is there only to show what is happening. It shows the correctly counted pulses until the web server interacts with the network. Then it seems the interrupt is blocked (no serial output from that point on). When I stress the system by refreshing the web page several times in quick succession, interrupt counting resumes briefly but soon stops again.
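
One thing that may be worth ruling out is the `Serial.println` call inside the handler, since serial output from an ISR is generally discouraged. The sketch below is not a confirmed fix for the behavior described; it only shows the usual pattern of incrementing a `volatile` counter in the ISR and doing the reporting from `loop()`. `g_pulseCount` and `takePulses` are hypothetical names, and the `#else` branch contains host-side stand-ins so the logic can be compiled off-device.

```cpp
#include <cstdint>

#ifdef ARDUINO
#include <Arduino.h>
#else
// Host-side stand-ins (assumptions) so this compiles without the Arduino core.
#define ICACHE_RAM_ATTR
inline void noInterrupts() {}
inline void interrupts() {}
#endif

// Written by the ISR, read and reset by the main loop.
volatile uint32_t g_pulseCount = 0;

// ISR: do the minimum and return. No Serial, no delays, no allocation.
void ICACHE_RAM_ATTR pulseCounter() {
    g_pulseCount = g_pulseCount + 1;
}

// Called from loop(): take the pending count and reset it, with interrupts
// briefly disabled so no pulse is lost between the read and the reset.
uint32_t takePulses() {
    noInterrupts();
    uint32_t n = g_pulseCount;
    g_pulseCount = 0;
    interrupts();
    return n;
}
```

On the device, `setup()` would register the handler with something like `attachInterrupt(digitalPinToInterrupt(14), pulseCounter, RISING)`, and `loop()` would call `takePulses()` and print or accumulate the result there.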

The problem lies somewhere between interrupt handling and the web services. I hope this helps to track down the issue.

I'm interested in any solutions. Who can help?

Thanks from Mickbaer Berlin Germany Email: michael.lorenz@web.de

We are on nearly parallel paths! Mine does an outbound WebAPI call to a hosted server to log its data; yours accepts an inbound web connection. Mine does serve a config page and WebAPI over HTTP when the ESP is placed into config mode, but that page uses AJAX to read/write config values via WebAPI, rather than posting a form to the ESP. — Mark McGinty, Apr 28 '16 at 15:30
As for crashing I think I've isolated it to hardware, specifically the power supply. My circuit draws 80 mA in server mode with the sensor idle. The sensor I'm using draws max 15 mA. Logically the faster it is pulsing, the more current it draws and it may be inductive. So make sure you aren't seeing a voltage drop below 3.1v, that will crash it when doing network i/o. — Mark McGinty, Apr 28 '16 at 15:50
Note that I know nothing of your dev environment (other than knowing C++) so this might be meaningless, but I was under the impression that the underlying SDK from Espressif is inherently event driven. So I was a bit surprised to see in your code (on the ESP forum) that you were polling HTTP status in a loop. In nodemcu I would drive it with a timer, so my main processing function returns to let async tasks complete. ("wdt reset" means your code was reset by the watchdog timer.) — Mark McGinty, Apr 28 '16 at 20:49
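
The timer-driven approach described in the comment above can be approximated in an Arduino-style sketch by checking elapsed time on each pass through `loop()` and returning immediately when nothing is due. This is a minimal sketch under assumptions: `PeriodicTask`, `reportUsage` and `g_reports` are hypothetical names, and `now_ms` is assumed to come from a 32-bit millisecond counter such as `millis()`.

```cpp
#include <cstdint>

// Hypothetical periodic task: runs fn at most once per interval_ms.
struct PeriodicTask {
    uint32_t interval_ms;   // how often the task should run
    uint32_t last_run_ms;   // timestamp of the previous run
    void (*fn)();           // work to perform when the task is due

    // Call on every pass through loop(). Returns immediately unless the
    // task is due, so the main loop never blocks waiting for it.
    bool poll(uint32_t now_ms) {
        // Unsigned subtraction stays correct across one counter wrap.
        if (now_ms - last_run_ms < interval_ms) {
            return false;
        }
        last_run_ms = now_ms;
        fn();
        return true;
    }
};

// Hypothetical callback standing in for "log the data to the server".
uint32_t g_reports = 0;
void reportUsage() { ++g_reports; }
```

In `loop()` this would be used as `task.poll(millis())`, which leaves the rest of each pass free for the network stack and the watchdog.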
