I want to build a variable containing a byte sequence made up of several fields (they will later be transmitted via a socket).

The byte sequence will include the following three fields:

  • Character SOH (ASCII code 0x01)
  • 32-bit integer
  • Unicode string 'Straße'

I have tried:

# -*- coding: UTF-8 -*-

message = b''

soh = u'\0001'
a = 1143
c = u'Straße'

message = message + soh + a + c

print(type(message))

But I get:

TypeError: can't concat str to bytes

I am also not sure that soh = u'\0001' is the right way to define the SOH character.

I am using Python 3.7

M.E.

3 Answers

Binary data for transfer over a socket connection is best combined using the struct module.

The struct module provides a pack function to create the data structure. You need to provide a format string that describes the data being packed. It's worth studying the format string documentation to ensure that the data is unpacked as expected on the receiving side.

>>> soh = b'\x01'
>>> a = 1143
>>> c = u'Straße'

>>> import struct
>>> pattern = 'ci7s' # 1 byte, 1 int, 1 bytestring of length 7
>>> packed = struct.pack(pattern, soh, a, c.encode('utf-8'))
>>> packed
b'\x01\x00\x00\x00w\x04\x00\x00Stra\xc3\x9fe'

The module provides an unpack function to reverse the packing:

>>> soh_, a_, c_ = struct.unpack(pattern, packed)
>>> soh_
b'\x01'
>>> a
1143
>>> a_
1143
>>> c_.decode('utf-8')
'Straße'
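
Note that the pattern above uses native size and alignment, which is why three padding bytes follow the SOH byte and the integer appears in the machine's own byte order. For data sent over a socket, a byte-order prefix in the format string is usually preferable. A minimal sketch, assuming the receiver expects network (big-endian) order:

```python
import struct

soh = b'\x01'
a = 1143
c = 'Straße'.encode('utf-8')  # 7 bytes, because ß takes 2 bytes in UTF-8

# '!' selects network byte order with standard sizes and no padding:
# 'c' is 1 byte, 'i' is always 4 bytes, '7s' is a 7-byte bytestring.
pattern = '!ci7s'
packed = struct.pack(pattern, soh, a, c)

print(packed)                    # b'\x01\x00\x00\x04wStra\xc3\x9fe'
print(struct.calcsize(pattern))  # 12
```

With an explicit prefix the packed size is the same on every platform, so both ends of the connection can rely on it.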
snakecharmerb
  • This is useful and interesting. I think it is relevant that the data to be transferred will vary in the number of fields, the length of the strings, etc. (in my question I just narrowed it down to a concrete example). So I was thinking of preceding each Unicode string with its length so the receiver can decode the message. Because of this, there will be no specific pattern to be shared between client and server. I wonder whether struct.pack is as useful in that scenario as it is when the field structure is fixed. Another question I have is whether the integer is 32 or 64 bits when using struct.pack. – M.E. Aug 07 '19 at 11:35
  • The sizes of each type are given in the format string docs - an int is 4 bytes in size, unless native (platform-dependent) sizing is specified. If your messages are always of the form byte:int:bytes then it would be possible to compute the pattern based on the size of the message. If your messages are more complicated then you'll have to come up with a scheme of your own (or consider third-party solutions like protocol buffers, if the overhead is worth it). – snakecharmerb Aug 07 '19 at 13:24
  • Thanks for clarifying. Due to the custom and non-deterministic nature of the messages I will not use struct.pack/struct.unpack. I did not know about this module though, and I think it is useful for this kind of task. In the end I will give .to_bytes() and .encode() a try. – M.E. Aug 07 '19 at 13:58

That is because a is an int, so it cannot be concatenated with a str. What you should do is call .encode() on each of soh, a and c and then concatenate the results to message (.encode() converts a str to bytes).

(In Python 3.x the unicode type no longer exists (it is the same as str), so you have to use either str or bytes.)
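
A small sketch of converting each field to bytes before concatenating. It assumes UTF-8 for the text and a big-endian 4-byte integer, and it uses int.to_bytes for the integer because int has no .encode() method. It also shows why '\0001' is not the SOH character: Python reads it as the octal escape '\000' followed by the digit '1'.

```python
# '\0001' is the octal escape '\000' (NUL) followed by the character '1'.
wrong = '\0001'
print(len(wrong), wrong.encode('utf-8'))  # 2 b'\x001'

# '\x01' is a single character, SOH.
soh = '\x01'
print(len(soh), soh.encode('utf-8'))      # 1 b'\x01'

a = 1143
a_bytes = a.to_bytes(4, 'big')            # int -> 4 bytes, big-endian (assumed order)
c_bytes = 'Straße'.encode('utf-8')        # str -> bytes

message = soh.encode('utf-8') + a_bytes + c_bytes
print(type(message), message)
```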

PMM
  • Could you please elaborate on which arguments shall be used for `.encode()`? Thanks. – M.E. Aug 07 '19 at 11:17
  • integer does not have .encode method, but .to_bytes can be used in Python 3.7 – M.E. Aug 07 '19 at 12:07
  • You have to do something like this. a=a.encode() (Remember to convert a to string before using encode) – PMM Aug 07 '19 at 12:15
  • As mentioned, a is an integer and cannot be "encoded", and converting it into a string is not what was intended. The question clearly specifies that the integer needs to be included in the byte stream as a 32-bit integer representation. See the attached answer for a complete solution including .encode('utf-8') and .to_bytes for the integer. – M.E. Aug 07 '19 at 13:55

Just in case it is helpful for anyone else, I finally did this:

message = soh.encode('utf-8') + a.to_bytes(4, 'big') + c.encode('utf-8')

struct.pack is a really interesting solution, but I did not manage to force the integer to be 32 bits, and in my particular format the field structure is not known in advance (hence a mechanism to share it between client and server would be needed anyway).

I therefore mixed .to_bytes with .encode for unicode strings.
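
For anyone with the same variable-length requirement, here is a hedged sketch of one possible framing scheme: each string is preceded by its byte length so the receiver knows how much to read. The helper names pack_str and unpack_str, the 4-byte unsigned length prefix and the big-endian order are all assumptions made for illustration, not part of any agreed protocol. It also shows that a '>' prefix in a struct format fixes the integer at 32 bits regardless of platform.

```python
import struct

def pack_str(text):
    """Hypothetical helper: 4-byte big-endian length prefix + UTF-8 bytes."""
    data = text.encode('utf-8')
    return struct.pack('>I', len(data)) + data

def unpack_str(buf, offset=0):
    """Hypothetical helper: read one length-prefixed string, return (text, next_offset)."""
    (length,) = struct.unpack_from('>I', buf, offset)
    start = offset + 4
    return buf[start:start + length].decode('utf-8'), start + length

# Sender: SOH byte, 32-bit signed integer ('>i' is always 4 bytes), prefixed string.
a = 1143
message = b'\x01' + struct.pack('>i', a) + pack_str('Straße')

# Receiver: walk through the buffer field by field.
(a_,) = struct.unpack_from('>i', message, 1)
c_, end = unpack_str(message, 5)
print(a_, c_, end == len(message))  # 1143 Straße True
```

struct.pack('>i', a) and a.to_bytes(4, 'big') produce the same four bytes for a non-negative value such as this one, so either can be used for the integer field.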

M.E.