
I am trying to read sensor measurements (published from another device) over MQTT and store a week's worth of readings in a pandas DataFrame. Once such a DataFrame is full, I would like to save it to a .csv file and start filling a new empty one. An example of such a DataFrame is as follows:

                           sensor1  ...  sensorxx
timestamp                           ...                   
2018-11-21 15:15:00-06      0.276   ...   0
2018-11-21 15:30:00-06      0.167   ...   0
2018-11-21 15:45:00-06      0.179   ...   0.1
2018-11-21 16:00:00-06      0.076   ...   0.2
2018-11-21 16:15:00-06      0.064   ...   0
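For reference, a buffer with this shape can be built with `pd.date_range` (the sensor names and start time below are just placeholders matching the example):

```python
import pandas as pd

# A week of 15-minute slots: 7 days * 24 hours * 4 slots per hour = 672 rows
start = pd.Timestamp("2018-11-21 15:15:00")
daterange = pd.date_range(start=start, periods=7 * 24 * 4, freq="15min")

# One column per sensor, all values initially missing
buffer_df = pd.DataFrame(index=daterange, columns=["sensor1", "sensorxx"])

# A single read is stored by timestamp and sensor name
buffer_df.loc[start, "sensor1"] = 0.276
```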

My code works exactly as I intend it to, only to fail after a while (hundreds of messages). It doesn't really fail — it keeps running without any error message — but it behaves as if messages were no longer flowing in (which they are).

All of this happens within a class, here is a simplified version of my code

import os
import json

import pandas as pd
import paho.mqtt.client as mqtt

bufferDF = None
counter = 1

class DataSaver():
    def __init__(self,filesfolderpath,sensorslist):
        self.filesfolderpath = filesfolderpath
        self.sensorslist = sensorslist
        self.client = None

    def SaveSensorRead(self, client, userdata, message):
        global bufferDF
        global counter

        message_dict = json.loads(message.payload)  # payload carries the JSON body
        timestamp = pd.to_datetime(message_dict["timestamp"]) #timestamp message payload
        sensorname = message_dict["sensorname"]
        read = message_dict["read"]

        # creates an empty dataframe over a weekly date range containing the current
        # timestamp (only on the first call, when bufferDF has never been initialized)
        if (bufferDF is None):
            daterange = InitDateRange(timestamp) 
            bufferDF = pd.DataFrame(index=daterange, columns=self.sensorslist)
        
        # checks whether bufferDF is full; if so, saves it to disk and initializes a new one
        if (timestamp > max(bufferDF.index)):
            filename = "week"+str(counter)+".csv"
            bufferDF.to_csv(os.path.join(self.filesfolderpath,filename))
            daterange = InitDateRange(timestamp) 
            bufferDF = pd.DataFrame(index=daterange, columns=self.sensorslist)
            counter += 1
            
        bufferDF.loc[timestamp,sensorname] = read

    def InitComm(self, brokerip, channelname):
        self.client = mqtt.Client("client")
        self.client.on_message = self.SaveSensorRead
        self.client.connect(brokerip, 1883)
        self.client.loop_start()
        self.client.subscribe(channelname)


saver = DataSaver(filesfolderpath,sensorslist)
saver.InitComm(brokerip, channelname)
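`InitDateRange` is not shown above; roughly, it builds a week of 15-minute slots around the given timestamp. A simplified sketch of what it does (the real function may align to week boundaries differently):

```python
import pandas as pd

def InitDateRange(timestamp):
    # Start from Monday 00:00 of the week containing the timestamp,
    # then generate 7 days of 15-minute slots (672 entries)
    start = timestamp.normalize() - pd.Timedelta(days=timestamp.dayofweek)
    return pd.date_range(start=start, periods=7 * 24 * 4, freq="15min")
```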

I have tried several things. Saving the DataFrame at every iteration, I could see it gets initialized with the proper structure and filled correctly. I also tried reducing the publisher's frequency to several seconds so the subscriber could keep up, as suggested here, and increasing the quality-of-service parameter, but neither helped.

It's as if some memory fills up and my client can't process any more messages after a while. One of the weekly files I am trying to save is about 1.5 MB, so it's not really a RAM problem. I have looked through the Paho documentation for a "cache" parameter to tune but can't seem to find one.
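One thing I am considering, in case the pandas work inside `on_message` is blocking Paho's network thread, is handing each message off to a worker thread via `queue.Queue` so the callback only enqueues. A minimal sketch of that idea (not in my current code; the payload dict here is a made-up example):

```python
import queue
import threading

msg_queue = queue.Queue()
processed = []

def worker():
    # Drain messages so the MQTT network thread never waits on pandas work
    while True:
        item = msg_queue.get()
        if item is None:          # shutdown sentinel
            msg_queue.task_done()
            break
        processed.append(item["read"])  # real code would write into bufferDF here
        msg_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# In on_message I would only do: msg_queue.put(json.loads(message.payload))
msg_queue.put({"sensorname": "sensor1", "read": 0.276})
msg_queue.put(None)
t.join()
```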

I could of course reduce the size of the DataFrame so it is filled by fewer messages, but that doesn't work for me going forward.

Any help is much appreciated!

mccoy89