I'm trying to create a small home monitoring system. I have a series of wireless transmitters that transmit measurement data to a base station. I can query that base station using Modbus RTU to find out the latest measurement values from each transmitter.

To store and visualize the measurements, I'm using InfluxDB and Grafana. I have everything running on a Raspberry Pi Model 3B+, including the RS-485 communication to the base station.

I have chosen to use Python to read the data over Modbus RTU and then forward it to InfluxDB for storage, because Python has ready-made libraries for both. However, I'm struggling to get the Python script stable. Inevitably I get CRC errors in the Modbus transmission every now and then, and the script seems to get stuck when the minimalmodbus library raises one of these exceptions.

I'm not sure how I should tackle this problem.

At the moment I'm using a try-except-else structure, but because I'm a complete newbie in Python I can't get it to work the way I want. It's okay if I lose a single measurement point: if I get a CRC error, I can just forget about that measurement and carry on as if nothing ever happened.

The (minimized) code that I'm using at the moment looks like this:

#!/usr/bin/env python
import minimalmodbus
import time
from influxdb import InfluxDBClient

# influxdb stuff
influx = InfluxDBClient(host='localhost', port=8086)
influx.switch_database('dbname')

# minimalmodbus stuff
minimalmodbus.BAUDRATE = 9600
instrument = minimalmodbus.Instrument('/dev/ttyUSB0', 1)

errorcounter = 0
cyclecounter = 0

while True:

        try:

                sid1te = instrument.read_register(247, 1, 4)
                print "SID 1 TE:", sid1te

                influxquery = [
                        {"measurement": "sid1", "fields": { "te": sid1te}},
                        {"measurement": "system", "fields": { "errorcounter": errorcounter}},
                        {"measurement": "system", "fields": { "cyclecounter": cyclecounter}}
                ]

                print "InfluxDB query result:", influx.write_points(influxquery)

        except Exception as error:
                print "[!] Exception occurred: ", error
                errorcounter = errorcounter + 1

        else:
                print "[i] One cycle completed."
                cyclecounter = cyclecounter + 1

        time.sleep(30)

What ends up happening is that the script can run for hours like a dream, and then, when a single CRC error occurs in the transmission, it enters a never-ending loop of exceptions like this:

[!] Exception occurred:  Checksum error in rtu mode: '\xeb\xf9' instead of 'p\x97' . The response is: '\x7f\x01\x04\x02\x00\xeb\xf9' (plain response: '\x7f\x01\x04\x02\x00\xeb\xf9')
[!] Exception occurred:  Checksum error in rtu mode: '\xeb\xf9' instead of 'p\x97' . The response is: '\x7f\x01\x04\x02\x00\xeb\xf9' (plain response: '\x7f\x01\x04\x02\x00\xeb\xf9')
[!] Exception occurred:  Checksum error in rtu mode: '\xeb\xf9' instead of 'p\x97' . The response is: '\x7f\x01\x04\x02\x00\xeb\xf9' (plain response: '\x7f\x01\x04\x02\x00\xeb\xf9')
[!] Exception occurred:  Checksum error in rtu mode: '\xeb\xf9' instead of 'p\x97' . The response is: '\x7f\x01\x04\x02\x00\xeb\xf9' (plain response: '\x7f\x01\x04\x02\x00\xeb\xf9')
[!] Exception occurred:  Checksum error in rtu mode: '\xeb\xf9' instead of 'p\x97' . The response is: '\x7f\x01\x04\x02\x00\xeb\xf9' (plain response: '\x7f\x01\x04\x02\x00\xeb\xf9')
[!] Exception occurred:  Checksum error in rtu mode: '\xeb\xf9' instead of 'p\x97' . The response is: '\x7f\x01\x04\x02\x00\xeb\xf9' (plain response: '\x7f\x01\x04\x02\x00\xeb\xf9')

When I back out of this using CTRL-C the script actually looks to be in the sleep command:

^CTraceback (most recent call last):
  File "temp.py", line 92, in <module>
    time.sleep(30)
KeyboardInterrupt

So I'm puzzled as to why it's not outputting normal print commands to the console if it's actually in the program loop.

In the actual script I have three dozen instrument.read_register calls, so I'm not sure whether I should write a separate function that handles exceptions for each read_register call individually, or do something else. I've tried half a dozen variations of this code over the past week, but the data I get in Grafana is just abysmal because the script keeps getting stuck in exception loops.

Any suggestions?

entropiae
  • Is the Modbus link wired? If so, how are you connecting the cables and what are you using on your Pi for the RS485 link (USB adaptor, brand, model,...)? Are you using a reliable power supply for your Pi and base station? Do those share the same GND through the mains or do you have an additional GND cable? Any other sources of noise that you know of? – Marcos G. Aug 06 '19 at 09:56
  • Yes, it's wired. I'm using a complete system from the Finnish building automation company Produal. It's their FLTA base station and their MPCC Modbus configuration tool. It plugs directly into the Pi's USB and powers the FLTA base station and also provides the RS-485 link. This exact same setup worked like a charm for over six months using Home Assistant, so I'm confident this is a software issue with my Python script. – entropiae Aug 06 '19 at 13:01
  • I'm not sure if CRC errors were occurring with Home Assistant, but in any case, I should be able to recover safely and sanely from any and all errors in the bus communication. – entropiae Aug 06 '19 at 13:06
  • Sorry, I did not read your question carefully. Of course, it is expected that you get CRC errors from time to time. I don't see why an error should trigger that reaction though. It looks like a bug. Does it happen if you remove the try/except in your code? I'm not completely sure, but I think minimalmodbus is tolerant of CRC errors on its own – Marcos G. Aug 06 '19 at 14:26
  • By the way, you can force CRC errors with minimalmodbus, look at the tests. You can also do that with ModbusPoll but that's not free (there used to be a demo) – Marcos G. Aug 06 '19 at 14:30
  • Without the try/except the script execution will stop with an IOError exception if I get a CRC error. I rewrote the script some, basically moved the individual read_register-calls to a function that does try/except for all calls individually and that has been running for an hour now, no problems so far. I will leave it running and observe. – entropiae Aug 06 '19 at 14:31
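The per-call handling described in the last comment could look roughly like the sketch below. It is not the asker's actual script: the names `safe_read` and `poll` and the layout of the `registers` dictionary are made up for illustration, and it uses Python 3 syntax rather than the Python 2 print statements in the question. It catches IOError because that is what the comment reports; catching ValueError as well is an assumption.

```python
import time


def safe_read(instrument, register, decimals=0, functioncode=3):
    """Read one register; return None instead of raising on a bus error.

    `instrument` is expected to offer read_register(address, decimals,
    functioncode), as minimalmodbus.Instrument does.
    """
    try:
        return instrument.read_register(register, decimals, functioncode)
    except (IOError, ValueError) as error:
        # ValueError is included as an assumption; the thread only mentions IOError.
        print("[!] Read of register", register, "failed:", error)
        return None


def poll(instrument, registers, delay=0.0):
    """Read every register in `registers`, skipping the ones that fail.

    `registers` maps a field name to (address, decimals, functioncode).
    Returns the successfully read fields and the number of failed reads.
    """
    fields = {}
    errors = 0
    for name, (register, decimals, functioncode) in registers.items():
        value = safe_read(instrument, register, decimals, functioncode)
        if value is None:
            errors += 1  # drop this single measurement and carry on
        else:
            fields[name] = value
        time.sleep(delay)  # optional pause between queries
    return fields, errors
```

With this shape, one failed read costs only that one field: a call such as `poll(instrument, {"te": (247, 1, 4)})` returns whatever was read successfully, and the caller can write those fields to InfluxDB and add the error count to its own counter.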

2 Answers

I have now released a new version of MinimalModbus which by default flushes the input and output serial buffers before doing any communication on the serial bus, in order to clean up any old errors. Please give it a try.
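For readers on an older version, the same idea can be approximated by hand. This is only a sketch and not the library's internal implementation: `read_with_flush` is a hypothetical helper name, and it assumes that `instrument.serial` is the underlying pyserial port and that pyserial 3.0 or newer is installed, which provides `reset_input_buffer()` and `reset_output_buffer()`.

```python
def read_with_flush(instrument, register, decimals=0, functioncode=3):
    """Discard stale bytes on the serial port, then read one register.

    Assumes instrument.serial is a pyserial port object (pyserial >= 3.0).
    """
    port = instrument.serial
    port.reset_input_buffer()   # drop leftovers from an earlier, broken response
    port.reset_output_buffer()  # drop anything still queued for sending
    return instrument.read_register(register, decimals, functioncode)
```

Clearing the input buffer first means a late or partial response from a previous query cannot be mistaken for the start of the next one.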

Disclaimer: I am the maintainer of MinimalModbus.

jonasberg
  • Thanks! I will give it a shot and report back. – entropiae Aug 11 '19 at 15:39
  • I'm still getting around six hours of CRC errors in one go (querying every 20 seconds, with 0.5 second delay between queries). I'm pretty confident this problem is not with Python at this stage. Must do some lower level debugging. Time to bust out the oscilloscope... – entropiae Aug 12 '19 at 17:06
  • I suggest that you use debug mode to study what is sent to/from your instrument. I have written about it on https://minimalmodbus.readthedocs.io/en/stable/debugmode.html – jonasberg Aug 14 '19 at 20:08

I tried this with a different USB/RS-485 converter as well as different Modbus RTU gear. The same problem persists. I'm 99% confident the problem is with Linux serial port handling. I'm not sure how or why, but it sometimes gives pyserial/minimalmodbus incomplete byte sequences, causing minimalmodbus to evaluate erroneous sequences as responses. For example, minimalmodbus complains that 0x11 is too short a response, and then immediately afterwards complains that 0x03 0x06 0xAE 0x41 0x56 0x52 0x43 0x40 0x49 0xAD has a CRC error. In reality, if these two messages were received as one, it would be a perfectly valid response.
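The claim about the split response can be checked with a small CRC calculation. The sketch below uses the standard Modbus RTU CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, low byte transmitted first); the function names are made up for illustration and are not part of minimalmodbus.

```python
def crc16_modbus(data: bytes) -> int:
    """Compute the Modbus RTU CRC-16 of `data`."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def frame_is_valid(frame: bytes) -> bool:
    """Check that the last two bytes of `frame` are the CRC of the rest."""
    if len(frame) < 4:
        return False  # too short to hold address, function code and CRC
    payload, received = frame[:-2], frame[-2:]
    return crc16_modbus(payload).to_bytes(2, "little") == received


# The two fragments reported above.
head = bytes([0x11])
tail = bytes([0x03, 0x06, 0xAE, 0x41, 0x56, 0x52, 0x43, 0x40, 0x49, 0xAD])

print("tail alone valid: ", frame_is_valid(tail))
print("head + tail valid:", frame_is_valid(head + tail))
```

The second fragment fails the check on its own, while the two fragments joined together pass it, which is consistent with one response having been delivered in two pieces.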

I do not know how to navigate this problem, but I'm pretty sure the problem is deeper than Python level.

EDIT: The problem is not with the hardware or Linux after all, but with Python/pyserial/minimalmodbus. I hacked together a Python script which executes an external C language Modbus RTU query program and parses the output. It has worked like a charm 100% of the time, for more than a week now.
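The script itself is not shown here, so the following is only a sketch of how such a wrapper might look. The external program, its arguments and its output format are all assumptions: the sketch expects the command to print a single number on standard output, and `query_external` is a hypothetical name.

```python
import subprocess
import sys


def query_external(command, timeout=5.0):
    """Run an external query program and parse its output as a float.

    Returns None if the program fails, times out, or prints something
    that is not a number, so that one bad reading can simply be skipped.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (subprocess.SubprocessError, OSError) as error:
        print("[!] External query failed:", error)
        return None
    try:
        return float(result.stdout.strip())
    except ValueError:
        print("[!] Unexpected output:", repr(result.stdout))
        return None


# Stand-in for the real C program: a child Python process that prints a number.
print(query_external([sys.executable, "-c", "print(21.5)"]))
```

In a real setup the command list would name the C program and its arguments instead of the stand-in child process used here.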

entropiae