I'm having a problem decoding received bytes with Python 3. I'm controlling an Arduino via a serial connection and reading from it with the following code:

import serial
arduino = serial.Serial('/dev/ttyACM0', baudrate=9600, timeout=20)
print(arduino.isOpen())
myData = arduino.readline()
print(myData)

The output looks like b'\xe1\x02\xc1\x032\x82\x83\x10\x83\xb2\x80\xb0\x92\x0b\xa0' or b'\xe1\x02"\xe1\x00\x83\x92\x810\x82\xb2\x82\x91\xb2\n'. I tried to decode it the usual way via myData.decode('utf-8'), but I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 1: invalid start byte. I tried other encodings (ASCII, cp437, hex, utf-16), but always face the same error.

Do you have any suggestions for how I can decode the received bytes, or which encoding the Arduino requires? I already tried to decode the data piece by piece using a for loop, but I always get the same error message.

Is there a general way to avoid decoding problems, or to find out which encoding I have to use?

Thanks in advance.

kire
  • What kind of data is the arduino sending? – Klaus D. Feb 02 '16 at 23:47
  • You need to know what the encoding is to properly decode it. Look in the documentation of whatever is sending the data, and hope they bothered to specify what those bytes are supposed to be. Then, decode the data in a `try`-`except` block, and write an error-handling function to deal with the inevitable exceptions... because you'll get garbage input sooner or later, no matter what the spec says. – Kevin J. Chase Feb 03 '16 at 00:00
  • Why do you think this should be textual data at all? As for encodings, please read http://www.joelonsoftware.com/articles/Unicode.html before writing any other line of code, for humanity's sake. But encodings do not look like your problem here - this is likely binary data. – jsbueno Feb 03 '16 at 02:01
  • Probably what you want is `sys.stdout.buffer.write(myData)` – Andrea Corbellini Feb 03 '16 at 10:17
  • Well, I think the arduino is sending ASCII, because someone wrote that somewhere in a tutorial, but this does not work anyway. If I knew the correct encoding (I thought I could read anything with utf-8, but I learned ;9 ), I think my problem would be solved. I also don't think that it sends textual data, I just want it to be decoded into something readable. – kire Feb 03 '16 at 17:42
  • No _way_ is that ASCII. Look at all the bytes that start with `8` through `f`... _none_ of those bytes are legal ASCII. The `\x02` is pretty suspicious, too. (When's the last time _you_ used the Start-Of-Text control character?) Also, there's a Data Link Escape control character hiding in there (`\x10`). – Kevin J. Chase Feb 04 '16 at 04:03
  • Ok, thanks for that hint. Well, I have never used, read, or written any byte characters, because I'm not a programmer but a physicist and fairly new to this. All I need to do is read out measurement devices and control them. And most of the time, the documentation tells me what encoding to use. – kire Feb 04 '16 at 10:14
  • Most of your frustration is caused by conflating **bytes** and **characters**. Decades ago, they were similar enough that teachers could pretend they were synonyms, but characters are unmistakably different creatures these days. The best demonstration of the differences, and how to handle them in Python, is [Ned Batchelder](http://pyvideo.org/speaker/140/ned-batchelder)'s 36-minute lecture from PyCon 2012, "[Pragmatic Unicode, or, How Do I Stop the Pain?](http://pyvideo.org/video/948/pragmatic-unicode-or-how-do-i-stop-the-pain)" ([on YouTube](https://www.youtube.com/watch?v=sgHbC6udIqc)). – Kevin J. Chase Feb 04 '16 at 18:04
  • Relevant highlights from Ned Batchelder's "5 facts of life" in his "Pragmatic Unicode" talk: "Fact of Life #4: **Encoding is out-of-band.** ... You cannot infer the encoding of bytes. You must be told." and "Fact of Life #5: **Data is dirty.** Sometimes you are told wrong. ... That part just sucks.". It's no coincidence that my first comment said almost exactly the same thing. – Kevin J. Chase Feb 04 '16 at 18:12
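
The advice in the comments above (inspect the bytes first, then decode inside a `try`/`except`) could be sketched roughly as follows. This is only an illustration: the helper name `describe_payload` is hypothetical, and the sample bytes are copied from the question.

```python
def describe_payload(data: bytes, encoding: str = "ascii") -> str:
    """Return decoded text if possible, otherwise a hex dump of the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Not valid text in this encoding: show the raw bytes as hex instead.
        return data.hex()


sample = b'\xe1\x02\xc1\x032\x82\x83\x10\x83\xb2\x80\xb0\x92\x0b\xa0'

# Bytes >= 0x80 can never be ASCII, so their presence rules ASCII out.
non_ascii = [b for b in sample if b >= 0x80]
print(len(non_ascii), "of", len(sample), "bytes are outside the ASCII range")

print(describe_payload(sample))        # falls back to the hex dump
print(describe_payload(b"23.5\r\n"))   # genuine ASCII text decodes normally
```

A hex dump does not reveal what the bytes mean, but it is readable and makes it easier to compare the data against the sender's documentation.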

1 Answer

As @jsbueno said in the comments, this is not a decoding problem: the bytes being received are probably binary data rather than encoded text. I had a very similar problem when reading binary data (bytes) from a file.

There are two options here. The first is the struct module:

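import struct

# Open in binary mode so that read() returns bytes
with open("somedata.img", "rb") as a:
    b = a.read(2)  # read the first 2 bytes
    # "i" is a native 4-byte signed integer; the trailing comma unpacks the 1-tuple
    file_size, = struct.unpack("i", a.read(4))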

struct.unpack always returns a tuple, even for a single value. The trailing comma in `file_size, = ...` unpacks that tuple into an integer; equivalently, use struct.unpack('i', a.read(4))[0].
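
`struct.unpack` also works directly on a `bytes` object, such as the value returned by `arduino.readline()`, so no file is needed. The sketch below assumes, purely for illustration, that the payload starts with two little-endian unsigned 16-bit integers; the real layout depends on what the Arduino program actually sends.

```python
import struct

# First four bytes of the sample shown in the question.
payload = b'\xe1\x02\xc1\x03'

# "<HH" = little-endian, two unsigned 16-bit integers (an assumed layout).
first, second = struct.unpack("<HH", payload)
print(first, second)

# struct.calcsize reports how many bytes a format string consumes,
# which must match the length of the data passed to unpack.
print(struct.calcsize("<HH") == len(payload))
```

If the format string and the data length disagree, `struct.unpack` raises `struct.error`, so slicing the payload to `struct.calcsize(fmt)` bytes first is a common precaution.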

Another way, which I used, is to load the data into a NumPy array:

import numpy as np

# Open in binary mode and interpret the contents as unsigned 32-bit integers
with open("somefile.img", "rb") as f:
    a = np.fromfile(f, dtype=np.uint32)
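
For bytes that are already in memory (for example, read from a serial port) rather than stored in a file, `np.frombuffer` is the in-memory counterpart of `np.fromfile`. A minimal sketch, again assuming little-endian unsigned 16-bit values, which is a guess and not something the question confirms:

```python
import numpy as np

payload = b'\xe1\x02\xc1\x03'  # first four bytes of the question's sample

# "<u2" = little-endian unsigned 16-bit; the buffer length must be a multiple of 2.
values = np.frombuffer(payload, dtype="<u2")
print(values)

# Viewing the same bytes as uint8 makes no assumption about their layout.
raw = np.frombuffer(payload, dtype=np.uint8)
print(raw)
```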
DavidG
  • Thanks for the explanation about binary data, but the first code is not working for me. When I put the received data into a file, the `read()` call tells me that 'bytes' has no attribute read, and it does not do anything. Did you face that problem before? The second approach seems to convert it properly. – kire Feb 03 '16 at 17:51