So I have a project idea that requires me to process incoming real-time data and continuously track some metrics about it. Every now and then I want to request the metrics I am calculating and do something with them.

Currently I have a simple Python script that uses the socket library to get the realtime data. It is basically just...

metric1 = 0
metric2 = ''

# sock is an already-connected socket.socket instance
while True:
    response = sock.recv(512).decode('utf-8')

    if response.startswith('PING'):
        # Reply to keep-alive pings
        sock.send("PONG\n".encode('utf-8'))
    else:
        process(response)

In the above, process(response) updates metric1 and metric2 with data from each response. (For example, they might be the mean len(response) and the most common response, respectively.)
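To make the example metrics concrete, here is a minimal sketch of what process could look like for the two metrics mentioned (mean response length and most common response). The `Metrics` class and its attribute names are hypothetical and only for illustration; the real script may track things differently.

```python
from collections import Counter


class Metrics:
    """Hypothetical holder for the two example metrics."""

    def __init__(self):
        self.count = 0          # number of responses seen so far
        self.total_len = 0      # sum of len(response) over all responses
        self.seen = Counter()   # how often each distinct response occurred

    def process(self, response):
        # Update the running totals with one new response.
        self.count += 1
        self.total_len += len(response)
        self.seen[response] += 1

    @property
    def metric1(self):
        # Mean response length; 0.0 before any data has arrived.
        return self.total_len / self.count if self.count else 0.0

    @property
    def metric2(self):
        # Most common response; empty string before any data has arrived.
        return self.seen.most_common(1)[0][0] if self.seen else ''
```

Keeping running totals rather than storing every response means memory use depends only on the number of distinct responses, which matters for a script that runs indefinitely.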

What I want to do is run the above script constantly after starting up the project and occasionally query for metric1 and metric2 in a script I have running locally. I am guessing that I will have to look into running code on a server which I have very little experience with.

What are the most accessible tools to do what I want? I am pretty comfortable with a variety of languages, so if there is a library or tool in another language that is better suited for all of this, please tell me about it.

Thanks!

bkarthik
1 Answer

I worked on a similar project. I am not sure it applies specifically to your case, but it may give you a starting point.

I am well aware that using pandas DataFrames for real-time purposes is not best practice, but in my case it is fast enough (I am open to suggestions on how to improve my workflow!). Here is my code:

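import pandas as pd
from io import StringIO

all_prices = pd.DataFrame()

def readprice():
    global all_prices

    # mysock is an already-connected socket created elsewhere
    msg = mysock.recv(16384)
    msg_stringa = str(msg, 'utf-8')

    # Parse the semicolon-separated message into 33 columns, skipping malformed lines.
    # on_bad_lines="skip" (pandas >= 1.3) replaces the removed error_bad_lines=False.
    new_price = pd.read_csv(StringIO(msg_stringa), sep=";", on_bad_lines="skip",
                            index_col=None, header=None, engine='c', names=range(33),
                            decimal='.')

    ...
    ...
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_prices = pd.concat([all_prices, new_price], ignore_index=True)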

So the 'all_prices' DataFrame is global, and new prices get appended to it. Other functions can then use this global DataFrame to read its content, etc. Be very careful about sharing variables between two or more threads, as it can lead to errors. More info here: http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/

In my case, I don't share the DataFrame with a parallel thread; other threads are launched after the append, not while it is happening.
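If threads do need to touch the shared data at the same time, a lock is one common way to guard it. The following is only a minimal sketch under that assumption, not the code from my project: `SharedPrices`, `writer`, and the plain list standing in for the DataFrame are all hypothetical names chosen for illustration.

```python
import threading


class SharedPrices:
    """Hypothetical lock-protected container standing in for the global DataFrame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def append(self, row):
        # Only one thread at a time may modify the underlying list.
        with self._lock:
            self._rows.append(row)

    def snapshot(self):
        # Return a copy so readers never see the list while it is being changed.
        with self._lock:
            return list(self._rows)


def writer(shared, n):
    # Hypothetical writer: appends n rows to the shared container.
    for i in range(n):
        shared.append(i)


shared = SharedPrices()
threads = [threading.Thread(target=writer, args=(shared, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared.snapshot()))  # 4 writers x 1000 rows each = 4000
```

Readers call snapshot() and work on the returned copy, so a long-running read never blocks the writer for more than the time it takes to copy the data.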

Dharman
Lorenzo Bassetti
  • I forgot to mention that you can, of course, read the global DataFrame created by the 'readprice()' function from a different function. As long as only one function/thread writes to the global at a time, and the other functions only read it, you will not have problems. – Lorenzo Bassetti Apr 03 '21 at 15:25