
I have a small Python server which I can POST commands to in order to control my LIFX lights. From Postman I can spam this as much as I like and never see an error, but what I'm trying to do is build a couple of wall switches that trigger the lights using NodeMCU boards, and from there, I'm getting ECONNABORTED errors on about 1 in 5 requests.

Everywhere I've looked for solutions the issue has actually been either a misconfigured server or a misconfigured client, but I'm wondering if I've got something else going on here. My server code is simple, and, as I say, it seems to work when triggered from everywhere but my NodeMCU boards.

main.py:

from machine import Pin, reset
from time import sleep
import urequests

# set these two pins as required to up/down
buttonUp = Pin(4, Pin.IN, Pin.PULL_UP)
buttonDown = Pin(5, Pin.IN, Pin.PULL_UP)
light = "LightName"

# button can be pressed, held or double pressed

# press = 1x press, 1x release in .5 seconds
# hold = 1x press, 0x release
# double = 2x press, 2x release in .5 seconds

def detectPress():
    pressed = False
    press = 0
    direction = 'up'
    release = 0
    if not buttonUp.value() or not buttonDown.value():
        pressed = True
    if not buttonDown.value():
        direction = 'down'

    while buttonUp.value() and buttonDown.value():
        sleep(.01)  # wait for a button push
    for x in range(8):
        if pressed == False:
            if not buttonUp.value():
                direction = 'up'
                pressed = True
                press += 1
            if not buttonDown.value():
                direction = 'down'
                pressed = True
                press += 1
        else:
            if direction == 'up':
                if buttonUp.value():
                    pressed = False
                    release += 1
            else:
                if buttonDown.value():
                    pressed = False
                    release += 1
        sleep(.1)
    return press, release, direction

error_count = 0
while True:
    if error_count >= 5:
        print ("Too many errors. Resetting...")
        reset()
    pressed, released, direction = detectPress()
    sleep_time = .1
    if pressed >= released:
        packet = {"light": light}
        if pressed == released:
            if pressed == 0:
                held = True
            else:
                held = False
        else:
            held = True
        if pressed > 1:
            double = True
        else:
            double = False
        if double is True:
            packet["level"] = "full"
        if held is True and double is False:
            packet["dim"] = direction
            sleep_time = 0.8  # don't spam the server/crash the board
        if held is False and double is False:
            if direction == 'up':
                packet["level"] = 'on'
            else:
                packet["level"] = 'off'
        print (pressed, released, direction, held, double, packet)
        try:
            response = urequests.post("http://192.168.1.10:7990/lights", headers={'Connection': 'Close'}, json = packet)
            if error_count > 0:
                error_count -= 1
                urequests.usocket.reset()
        except Exception as e:
            error_count += 1
            print ("Error sending packet {}: {} - error count is at {} retrying...".format(packet, repr(e), error_count))
            urequests.usocket.reset()
            sleep(1)
            try:
                response = urequests.post("http://192.168.1.10:7990/lights", headers={'connection': 'Close'}, json = packet)
            except Exception as e:
                error_count += 1
                print ("retry failed")
                pass
            pass
    print ("waiting {}".format(sleep_time))
    sleep(sleep_time)

I've a suspicion that it's a socket issue, but have no idea what else to do to debug that.

On a fresh reset, I can pretty much guarantee the first 4 or 5 transmissions will work. I can also pretty much guarantee that holding a button (to trigger a command every second or so) will fail after 3 or 4 transmissions.

Sometimes retries work, more often they don't.

Most of the time after a failure, waiting 5 seconds and then trying again will work, but sometimes it won't.

Most of the time an initial press after a long delay (>1 minute) will work, but sometimes it won't.

jymbob

2 Answers


After trying everything I could think of to fix this, I've come to the conclusion that the issue is the ESP8266 failing to process the initial handshake from the server. I believe it may simply be unable to cope with the volume of traffic on the network, so it seizes up. Running Wireshark on the server, I see several TCP Spurious Retransmissions whenever I get a failed response.

I've got an ESP32 board on order, which should let me test my theory.

Edit: I finally worked out that the issue was that urequests was leaving sockets open all over the place. Therefore we need to close the response properly:

response = urequests.post(url, ...)

... 

response.close()

This solved my problem, although I can't explain why the sockets weren't being closed at the end of the post, which is how the library appears to be meant to work.
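As a minimal sketch of that cleanup, the call can be wrapped so the response is closed even if reading it raises. The helper name `post_and_close` is made up for illustration; it takes the post function as an argument so that `urequests.post` can be passed in on the board. It assumes the response exposes `status_code` and `close()`, as `urequests` responses do.

```python
def post_and_close(post, url, **kwargs):
    """Call post(url, **kwargs), return the status code, and always close
    the response so its socket is released.

    `post` is assumed to behave like urequests.post: it returns an object
    with a `status_code` attribute and a `close()` method.
    """
    response = None
    try:
        response = post(url, **kwargs)
        return response.status_code
    finally:
        # Runs on success and on error; skipped only if post() itself
        # raised before a response object existed.
        if response is not None:
            response.close()

# On the board (hypothetical usage, not run here):
#   import urequests
#   status = post_and_close(urequests.post,
#                           "http://192.168.1.10:7990/lights",
#                           json=packet)
```

If `post` itself raises (for example with ECONNABORTED), the exception still propagates, so the existing retry logic in the main loop can stay as it is.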

jymbob
  • So I just started trying to do the same thing. I agree, postman works 100% no matter how fast I spam it. But I get this same failure on nodemcu. I see others that have the same thing like this one on the upython site: https://forum.micropython.org/viewtopic.php?t=5094 But that one devolved into silly discussion about sharing keys instead of solving the issue. :D So did you ever come up with any solution on this jymbob? – Eradicatore Mar 09 '19 at 14:09
  • Actually yes. I believe the underlying issue is that the request is never properly closed, so eventually the board has too many open connections (I'm no expert on the inner workings of websockets, so my terminology may be way off). If you write `response = urequests.post(...` you can then call `response.close()` as part of your cleanup, which is much more stable. – jymbob Mar 12 '19 at 12:03

Ok, garbage collection solved it. Now I can press the button and it works every time! I may tweak the delays to get as tight as possible. Would be nice to query the urequests if busy instead of just a crude delay, but hey...

Here's my loop that sends out the requests:

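import gc
import json
import time

import urequests

# btn1, url and data_play are defined earlier in the script (not shown)
gc.enable()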

while True:
    time.sleep_ms(250)

    if (btn1.value() == 0):
        urequests.post(url, data=json.dumps(data_play))
        time.sleep_ms(650)
        while (btn1.value() == 0):
            time.sleep_ms(200)
            pass
        gc.collect()
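For comparison, here is a rough sketch of one pass of the same loop that closes the response explicitly instead of relying on `gc.collect()`. The function name `send_once_per_press` and its parameters are hypothetical; the button reader, post function and wait function are passed in so the logic can be tried off the board.

```python
def send_once_per_press(button_is_down, post, url, body, wait=lambda: None):
    """Send one request if the button is pressed, close the response,
    then block until the button is released.

    Returns True if a request was sent, False if the button was not pressed.
    `post` is assumed to behave like urequests.post.
    """
    if not button_is_down():
        return False
    response = post(url, data=body)
    response.close()         # release the socket straight away
    while button_is_down():  # wait for release so a hold sends only once
        wait()
    return True

# On the board (hypothetical usage, not run here):
#   while True:
#       send_once_per_press(lambda: btn1.value() == 0,
#                           urequests.post, url, json.dumps(data_play),
#                           wait=lambda: time.sleep_ms(200))
#       time.sleep_ms(250)
```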
Eradicatore
  • Thanks for this. I ended up going a different route, but I'll give this a try. The solution I found was to realise `urequests.post()` returns a response, so `response = urequests.post(url)` followed by `response.close()` appears to handle the cleanup. It may work for you. – jymbob Mar 12 '19 at 12:07
  • Thanks for getting back to me on this solution! I believe that's the real answer here, and my workaround was just a hacky way of doing the same thing, since the response went out of scope. But my way is far less efficient. Cool! – Eradicatore Mar 13 '19 at 18:27
  • NP. I've added my solution to my answer. If it works for you, I'd appreciate the upvote! – jymbob Mar 13 '19 at 21:35