I have a crossbar router running for pubsub. I have used some sample code for a publish client and added some additional code to do a constant publish message every few seconds. I use the twisted function reactor.callLater to force a delay before the loop is started again.

I stop the router to test auto-reconnection of the publish client.

Whether the function is successful or fails it will always call the function to run again. So, once the router is stopped, my publish routine that runs every few seconds starts to fail on connection transport lost.

The auto-reconnection routine starts trying to reconnect and obviously fails on each attempt. I restart the router, the reconnection is made, and the join function is called with a new session id. However, the previous disconnected session is somehow still active: the loop keeps running with the old session id, and the failure routine is called for each failure with that old session id. The new session happily works in parallel. So I end up with 2 sessions trying to publish, one succeeding and one failing.

Here is the code somewhat modified from the example on autobahn examples

from autobahn.twisted.component import Component, run
from autobahn.twisted.util import sleep
from twisted.internet.defer import inlineCallbacks
from twisted.internet import reactor
from twisted.internet import defer
from autobahn.wamp.types import PublishOptions
import txaio
import os
import argparse
import six
import treq

txaio.use_twisted()
log = txaio.make_logger()
txaio.start_logging()
#txaio.start_logging(out="jahlog.log", level='info')
url = os.environ.get('CBURL', u'ws://localhost:8080/ws')
realmv = os.environ.get('CBREALM', u'realm1')
topic = os.environ.get('CBTOPIC', 'com.myapp.hello')
topic =  'com.myapp.hello'
print(url, realmv)
component = Component(transports=url, realm=realmv)


    #def on_join(self, fn):
    #    """
    #    A decorator as a shortcut for listening for 'join' events.#
    #
    #    For example::
    #
    #       @component.on_join
    #       def joined(session, details):
    #           print("Session {} joined: {}".format(session, details))
    #    """
    #    self.on('join', fn)
    #
    #def on_leave(self, fn):
    #    """
    #    A decorator as a shortcut for listening for 'leave' events.
    #    """
    #    self.on('leave', fn)
    #
    #def on_connect(self, fn):
    #    """
    #    A decorator as a shortcut for listening for 'connect' events.
    #    """
    #    self.on('connect', fn)
    #
    #def on_disconnect(self, fn):
    #    """
    #    A decorator as a shortcut for listening for 'disconnect' events.
    #    """
    #    self.on('disconnect', fn)
    #
    #def on_ready(self, fn):
    #    """
    #    A decorator as a shortcut for listening for 'ready' events.
    #    """
    #    self.on('ready', fn)
    #
    #def on_connectfailure(self, fn):
    #    """
    #    A decorator as a shortcut for listening for 'connectfailure' events.
    #    """
    #    self.on('connectfailure', fn)




@component.on_leave
@inlineCallbacks
def left(session):
    print('session left', session)
    yield session.leave()

@component.on_disconnect
@inlineCallbacks
def gone(session):
    print('session disconnect', session)
    yield session.leave()

@component.on_join
def joined(session, details):
    print("session ready", session)
    sessf = dir(session)
    #print('SESSFFF', sessf)
    startup_system(session)


def finished_pub(res, session):
    print('FINSIED PUB', res)
    #session.leave()
    #session.disconnect()
    reactor.callLater(5, startup_system, session)
     
def failed_pub(res, session):
    print('Failed Pub', res)
    #session.leave()
    reactor.callLater(5, startup_system, session)
 
def startup_system(session):
    """ doc """
    print('START UP SYSTEM', session) 
    d = run_system(session)
    d.addCallback(finished_pub, session)
    d.addErrback(failed_pub, session)
         
@inlineCallbacks
def run_system(session):
    print('START RUN', session)
    sessdir = dir(session)
    #print('SESSDIR', sessdir)
    driver_data = yield prepare_driver_data(session)
    print('driver_data after prepare return') #, driver_data[0][0]) #, driver_data)
    drive_resp = yield publish_driver_data(session, driver_data)
    print('DRIVER PUBLISH SYSTEM GOT DATA', drive_resp)
    return 'OK'

@inlineCallbacks
def prepare_driver_data(session):
    """ doc """
    print('Prepare Drivers data')
    reverse_locations = []
    job_no_saved = None
    driver_url =  b'http://192.168.1.196:8084/loaddrivers'
    headers = {'Content-type': 'application/json'}
    url = driver_url
    driver_data = yield treq.post(url, headers=headers, timeout=30)
    content = yield treq.json_content(driver_data)
    print('DRIVEDR RES: ', content[0][0]) #, content)
    return content

@inlineCallbacks
def publish_driver_data(session, driver_data):
    print('GOT PUB SESS', session)
    print('Do logging', driver_data[0][0])
    options = PublishOptions(acknowledge=True)
    response = yield session.publish(topic, driver_data, options=options)
    print('PUB RESP', response)
    return 'OK'

if __name__ == "__main__":
    #run([component], log_level='info',stop_at_close = True)
    run([component])

It seems to me the problem lies with the twisted callLater I use when the publish call succeeds or fails.

In the function called for success or failure I use a twisted reactor.callLater function to delay the next run by a number of seconds. This does not cause a problem on success since it always keeps the same session. But on failure when the auto-reconnect is made I end up with 2 sessions - the old and the new one. The old one fails constantly. The new one is quite happy.

I can make an assumption that because the crossbar router has been stopped, any calls to session.leave() have no effect and when the reconnect is made that old session is partially still alive even though it will fail to publish anything. The response back is TransportLost.

The problem I have is that I cannot see how the old session variable is still functioning when in theory it has been overridden by the new one.

I can get round the problem by dropping callLater and using the autobahn sleep command.

But I would like to know what is going on with the sessionid because it may be hiding something fundamental that I need to know.

If anyone can throw some light on this I would be very grateful.

1 Answer

You have this session interaction:

@component.on_join
def joined(session, details):
    print("session ready", session)
    sessf = dir(session)
    #print('SESSFFF', sessf)
    startup_system(session)

According to the Autobahn docs, on_join handlers "are run whenever a fresh WAMP session is started".

startup_system effectively starts your publish loop:

def startup_system(session):
    """ doc """
    print('START UP SYSTEM', session) 
    d = run_system(session)
    d.addCallback(finished_pub, session)
    d.addErrback(failed_pub, session)

run_system calls publish_driver_data and then startup_system arranges for itself to be called again after a delay no matter what the result of run_system is.

This means that even after the session is invalid and it is impossible to publish anything to it, startup_system, run_system, and publish_driver_data will keep being called for that session.

When a new session is established, joined is called again and repeats all of this setup work - creating a new loop doing this publishing work that is associated with the new session. The previous loop continues because nothing tells it to stop.

The problem I have is that I cannot see how the old session variable is still functioning when in theory it has been overridden by the new one.

A key thing to notice is that your loops are not related to "the old session variable". reactor.callLater creates state inside the reactor to make a delayed call. Nothing that happens to Autobahn or to any local variables in your application will make any difference to this. The reactor keeps its own delayed call state and makes sure those calls happen.
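To illustrate why rebinding a variable does not affect a call that is already scheduled, here is a minimal stand-in written in plain Python. The names `pending` and `call_later` are hypothetical and only mimic the idea of the reactor keeping its own list of delayed calls; this is not how Twisted is implemented internally.

```python
# Stand-in for the reactor's internal record of delayed calls (hypothetical).
pending = []

def call_later(delay, func, *args):
    """Remember func and its arguments, as a scheduler would."""
    pending.append((delay, func, args))

seen = []

session = "session-1"
call_later(5, seen.append, session)   # the stored args refer to "session-1"

session = "session-2"                 # rebinding the name changes nothing stored
call_later(5, seen.append, session)

# When the scheduler eventually fires, both calls run with their own argument.
for _delay, func, args in pending:
    func(*args)

print(seen)
```

Each scheduled call carries its own reference to the session it was created with, which is why the old loop keeps receiving the old session.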

A reasonable solution here would be to inspect res in failed_pub. If it indicates the session has ended for any reason, failed_pub should not call reactor.callLater again - and the loop will end.
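A minimal sketch of that check follows. It assumes `res` is the Twisted `Failure` passed to the errback, and it takes the scheduler (`clock`), the restart function and the exception classes as parameters so that it can be exercised without a running reactor. Those extra parameters are my own additions, not part of the original code. In the real application `ended_errors` would presumably be `(TransportLost,)` from `autobahn.wamp.exception`, since that is the error reported in the question.

```python
def failed_pub(res, session, clock, restart, ended_errors, delay=5):
    """Errback: reschedule the loop unless the failure means the session ended.

    res          -- the Failure passed to the errback
    clock        -- object providing callLater (for example the Twisted reactor)
    restart      -- function that starts the next iteration (startup_system)
    ended_errors -- exception classes meaning the session is no longer usable
    """
    # Failure.check returns the matching class, or None if nothing matches.
    if res.check(*ended_errors):
        print('Session ended, stopping loop for', session)
        return None  # no callLater, so this loop ends here
    print('Failed Pub', res)
    # Any other failure is treated as temporary: try again after the delay.
    return clock.callLater(delay, restart, session)
```

It could be wired up in `startup_system` with `d.addErrback(failed_pub, session, reactor, startup_system, (TransportLost,))`. The new session's loop is unaffected, because it is started separately by `joined`.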

Alternatively (or additionally), since Autobahn tells you when the session has ended (by calling your gone function, registered with on_disconnect), if you can find the state associated with the loop for the session that has disconnected, you can cancel that loop immediately rather than waiting for the next publish to be attempted and fail. This would involve keeping a mapping from sessions to IDelayedCall values (the thing reactor.callLater returns) and then calling cancel on the IDelayedCall for the session which has just disconnected. This removes the internal reactor state associated with that call and causes it not to happen.
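One way to sketch that mapping is shown below. The class `PublishLoops` and its method names are hypothetical. `clock` is assumed to be anything providing `callLater` (normally the reactor), and the stored values are assumed to behave like `IDelayedCall`, which offers `active()` and `cancel()`.

```python
class PublishLoops:
    """Track one pending delayed call per session so it can be cancelled."""

    def __init__(self, clock, delay=5):
        self.clock = clock
        self.delay = delay
        self.pending = {}  # session -> delayed call returned by callLater

    def schedule(self, session, func):
        """Arrange for func(session) to run after the delay."""
        self.pending[session] = self.clock.callLater(
            self.delay, self._fire, session, func)

    def _fire(self, session, func):
        # The call is running now, so it is no longer pending.
        self.pending.pop(session, None)
        func(session)

    def cancel(self, session):
        """Cancel the pending call for a session, if there is one."""
        call = self.pending.pop(session, None)
        if call is not None and call.active():
            call.cancel()
```

In the question's code, `finished_pub` and `failed_pub` would call `loops.schedule(session, startup_system)` instead of `reactor.callLater`, and `gone` would call `loops.cancel(session)`. Note that this only cancels a call that is waiting. If the disconnect arrives while a publish is still in flight, the errback will still run, so it is worth combining this with the failure check described earlier.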

Jean-Paul Calderone
  • Thanks. Yes. I know that. What is puzzling is why the first session is somehow still active. I'm quite happy that the 2nd session has started after the reconnection. But why is the 1st session still going. The callback and errback still get calls with the old session as well as with the new session – John Aherne Aug 17 '23 at 06:36
  • Sorry what I should have said is that only the Errback is called since the old session publish always fails. The new one happily works – John Aherne Aug 17 '23 at 07:09
  • 2nd apology. What I really want to know is how do I stop the old session from carrying on. I cannot send anything to the router since it is not running. So I will have to wait for the router to start up again. And the errback will continue to have the old session to respond to – John Aherne Aug 17 '23 at 08:57
  • I am a dimwit. What I need to do is work out how to stop the old session once it starts producing errors and decide at what point I should cancel the session in some way. Looks like I will have to wait for the router to start up again and for a new session to start. Then work things from there – John Aherne Aug 17 '23 at 10:34
  • I don't think you need to wait for the router to start up again before you can address the situation. Make sure you understood the last 2 paragraphs of my answer. – Jean-Paul Calderone Aug 17 '23 at 14:38
  • Thanks. I've only just got back onto this. What you say is what I reckoned I would need to do. But looked a bit tricky to do. – John Aherne Aug 24 '23 at 15:19
  • Hit the button too quickly. I looked at the autobahn sleep command and that does a callLater to achieve what I was thinking of. But I haven't had any time to see what would suit me better. Hopefully next week I can take a closer look at things and see where I get. Your last paragraph is the nub of what I needed to do and I should play around with that for a bit and see what happens. Thanks again – John Aherne Aug 24 '23 at 15:25