I am receiving the following string from a Python server. I do not have access to that server.

\xa1\x823\xc2\xd5\x823\xc2\xff\x823\xc2\x12\x833\xc2\x1b\x833\xc2\x16\x833\xc2\x1e\x833\xc2 \x833\xc2\x0e\x833\xc2\x03\x833\xc2\x01\x833\xc2\x10\x833\xc2\'\x833\xc2\x17\x833\xc2\x00\x833\xc2\x11\x833\xc2$\x833\xc2$\x833\xc2\x1f\x833\xc2\x02\x833\xc2\xc0\x823\xc2\x94\x823\xc2\x91\x823\xc2\x7f\x823\xc2a\x823\xc2R\x823\xc2N\x823\xc2e\x823\xc2+\x823\xc2\xd3\x813\xc2\xee\x813\xc2\xe9\x813\xc2\xdf\x813\xc2\xfb\x813\xc2(\x823\xc25\x823\xc2\x17\x823\xc2\x1c\x823\xc2;\x823\xc2\xa2\x823\xc2\xe5\x823\xc2\xc2\x823\xc2\xbc\x823\xc2\x9b\x823\xc2\x13\x823\xc2\xbd\x813\xc2\xc0\x813\xc2\xc5\x813\xc2\xf2\x813\xc2(\x823\xc27\x823\xc2;\x823\xc2.\x823\xc2,\x823\xc20\x823\xc2\x11\x823\xc2\x0b\x823\xc2\xdf\x813\xc2\xb0\x813\xc2\xa2\x813\xc2\x7f\x813\xc2v\x813\xc2y\x813\xc2l\x813\xc2m\x813\xc2z\x813\xc2\x8c\x813\xc2\x89\x813\xc2w\x813\xc2Y\x813\xc2Y\x813\xc2c\x813\xc2e\x813\xc2Z\x813\xc2\x10\x813\xc2\xd2\x803\xc2\x8c\x803\xc2G\x803\xc2)\x803\xc2-\x803\xc2\x19\x803\xc2\xef\x7f3\xc2\xc9\x7f3\xc2\xc9\x7f3\xc2\xc8\x7f^C}3\xc2\xe7}3\xc2\xdd}3\xc2\xbc}3\xc2\xa9}3\xc2\xb7}3\xc2\xc1}3\xc2\xb0}3\xc2\x95}3\xc2\x9f}3\xc2\xd8}3\xc2\x05~3\xc2\x12~3\xc2\x15~3\xc2\r~3\xc2\x15~3\xc23~3\xc2/~3\xc2\x1d~3\xc2\x17~3\xc2\x15~3\xc2\x1d~3\xc2\x1e~3\xc2\x1a~3\xc2\x1f~3\xc2E~3\xc2W~3\xc2C~3\xc2o~3\xc2g~3\xc2p~3\xc2\xa3~3\xc2\x9b~3\xc2\x9e~3\xc2\x9e~3\xc2\xce~3\xc2\xe5~3\xc2\xe0~3\xc2\xd2~3\xc2\xc6~3\xc2\xc6~3\xc2\xc1~3\xc2\xca~3\xc2\xd6~3\xc2\xce~3\xc2\xa4~3\xc2\xad~3\xc2\xe1~3\xc2\xf8~3\xc2\xf8~3\xc2\x11\x7f3\xc2;\x7f3\xc2)\x7f3\xc2\xe6~3\xc2\xc4~3\xc2\xcc~3\xc2\xcd~3\xc2\xca~3\xc2\xc4~3\xc2\xbf~3\xc2\xcc~3\xc2\xc8~3\xc2\xc8~3\xc2\xd3~3\xc2\xd5~3\xc2\xa2~3\xc2L~3\xc2\x1c~3\xc2\x11~3\xc2\x14~3\xc2\x0e~3\xc2\x01~3\xc2\xf2}3\xc2\xf8}3\xc2\x05~3\xc2\xe3}3\xc2\xb0}3\xc2\x9c}3\xc2\x9e}3\xc2\x90}3\xc2\xcc}3\xc2\x1b~3\xc2\x05~3\xc2\xfa}3\xc2\x06~3\xc2\xf7}3\xc2\xf6}3\xc2\x15~3\xc2\x1f~3\xc2\x1b~3\xc2#~3\xc23~3\xc2H~3\xc2o~3\xc2\x89~3\xc2\x89~3\xc2\x94~3\xc2\x97~3\xc2\x84~3\xc2m~3\xc2\x8d~3\xc2\xdf~3\xc2\x0e\x7f3\xc2\x10\
x7f3\xc27\x7f3\xc2]\x7f3\xc2i\x7f3\xc2e\x7f3\xc2[\x7f3\xc2k\x7f3\xc2x\x7f3\xc2\x89\x7f3\xc2\x9b\x7f3\xc2\xae\x7f3\xc2\xbd\x7f3\xc2\xb2\x7f3\xc2\xa4\x7f3\xc2\xba\x7f3\xc2\xce\x7f3\xc2\xd1\x7f3\xc2\xd0\x7f3\xc2\xc7\x7f3\xc2\xaa\x7f3\xc2m\x7f3\xc25\x7f3\xc2\x1e\x7f3\xc2\x1f\x7f3\xc2\x1b\x7f3\xc2\x1e\x7f3\xc2\r\x7f3\xc2\xed~3\xc2\xe3~3\xc2\xdd~3\xc2\xe6~3\xc2\x15\x7f3\xc2:\x7f3\xc29\x7f3\xc2B\x7f3\xc2N\x7f3\xc21\x7f3\xc2\x11\x7f3\xc2\x13\x7f3\xc2:\x7f3\xc2k\x7f3\xc2v\x7f3\xc2u\x7f3\xc2\x89\x7f3\xc2\x9f\x7f3\xc2\xa7\x7f3\xc2\xbe\x7f3\xc2\xd1\x7f3\xc2\xec\x7f3\xc2\n\x803\xc2\t\x803\xc2\x1f\x803\xc2Y\x803\xc2{\x803\xc2t\x803\xc2p\x803\xc2i\x803\xc2

After decoding, this should be a series of floating-point numbers.

How can I decode it, and how can I find out the encoding of the string? Preferably using Python.

I tried chardet, decode('utf8'), and so on. Any help is appreciated.

After trying this:

c = a.decode('utf-16-be', errors='ignore').encode('ascii')

I got:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-199: ordinal not in range(128)

After trying this:

c = a.decode('utf-16-le').encode('ascii')

I got:

File "/usr/lib/python2.7/encodings/utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x33 in position 470: truncated data

Rajib
  • maybe this helps: http://stackoverflow.com/questions/40624129/python-2-7-convert-utf8-string-to-ascii – derfium May 07 '17 at 07:48
  • Possible duplicate of [Python 2.7, convert utf8 string to ascii](http://stackoverflow.com/questions/40624129/python-2-7-convert-utf8-string-to-ascii) – derfium May 07 '17 at 07:49
  • None of them worked. After trying this > `c=a.decode('utf-16-be', errors='ignore').encode('ascii')` Got this > `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-199: ordinal not in range(128)` after trying this >>> c=a.decode('utf-16-le').encode('ascii') Got this >>>> File "/usr/lib/python2.7/encodings/utf_16_le.py", line 16, in decode return codecs.utf_16_le_decode(input, errors, True) UnicodeDecodeError: 'utf16' codec can't decode byte 0x33 in position 470: truncated data – Rajib May 07 '17 at 08:09

1 Answer

It looks like this data has been packed as binary values, in the way Python's struct module does. I'm not sure what the first two characters in the string represent; they aren't floats, but they could be chars or a short int. The remainder of the string consists of floats.

Ignoring the first two characters for now, we get:

>>> import struct
>>> struct.unpack('!143f', s2[2:])  # s2 is the example string from the question
(9.037722037419371e-08, 9.038631532121144e-08, 9.036266845896535e-08, 9.071190731901879e-08, 9.06009489654025e-08, 9.058094008196349e-08, 9.058094008196349e-08, 9.058457806077058e-08, 9.063005279585923e-08, 9.067370854154433e-08, 9.071736428722943e-08, 9.071190731901879e-08, 9.064278572168405e-08, 9.063005279585923e-08, 9.062277683824504e-08, 9.059003502898122e-08, 9.058821603957767e-08, 9.057912109255994e-08, 9.057366412434931e-08, 9.054456029389257e-08, 9.051727545283939e-08, 9.05099994952252e-08, 9.04863526329791e-08, 9.046270577073301e-08, 9.04463348661011e-08, 9.04663437495401e-08, 9.050090454820747e-08, 9.054274130448903e-08, 9.050090454820747e-08, 9.052273242105002e-08, 9.062277683824504e-08, 9.061368189122732e-08, 9.067916550975497e-08, 9.07519250858968e-08, 9.073009721305425e-08, 9.071918327663298e-08, 9.06882604567727e-08, 9.066097561571951e-08, 9.071372630842234e-08, 9.071190731901879e-08, 9.069735540379043e-08, 9.071918327663298e-08, 9.071918327663298e-08, 9.071918327663298e-08, 9.075738205410744e-08, 9.076283902231808e-08, 9.072645923424716e-08, 9.074101114947553e-08, 9.070281237200106e-08, 9.069189843557979e-08, 9.069371742498333e-08, 9.066825157333369e-08, 9.062277683824504e-08, 9.060276795480604e-08, 9.063732875347341e-08, 9.06409667322805e-08, 9.068644146736915e-08, 9.074646811768616e-08, 9.072645923424716e-08, 9.073009721305425e-08, 9.074464912828262e-08, 9.06882604567727e-08, 9.068644146736915e-08, 9.074283013887907e-08, 9.075010609649325e-08, 9.070463136140461e-08, 9.069189843557979e-08, 9.075920104351098e-08, 9.079012386337126e-08, 9.078830487396772e-08, 9.034265957552634e-08, 9.036630643777244e-08, 9.038813431061499e-08, 9.04245140986859e-08, 9.051910154767029e-08, 9.056457628275894e-08, 9.056639527216248e-08, 9.061550798605822e-08, 9.071737139265679e-08, 9.07337422972887e-08, 9.077557905357025e-08, 9.042452120411326e-08, 9.040087434186717e-08, 9.037540849021752e-08, 9.03626755643927e-08, 9.039541737365653e-08, 9.046453897099127e-08, 9.043907311934163e-08, 9.042634019351681e-08, 9.046453897099127e-08, 9.050455673786928e-08, 9.052456562130828e-08, 9.050637572727283e-08, 9.046090099218418e-08, 9.041542625709553e-08, 9.036449455379625e-08, 9.080286389462344e-08, 9.034448567035724e-08, 9.077376006416671e-08, 9.070827644563906e-08, 9.066825867876105e-08, 9.068826756220005e-08, 9.070100048802487e-08, 9.065916373174332e-08, 9.065552575293623e-08, 9.073919926549934e-08, 9.038268444783171e-08, 9.039359838425298e-08, 9.044453008755227e-08, 9.059004923983593e-08, 9.06518948795565e-08, 9.06682657841884e-08, 9.079195706362952e-08, 9.044089921417253e-08, 9.044089921417253e-08, 9.042089033073353e-08, 9.041361437311934e-08, 9.041725235192644e-08, 9.039360548968034e-08, 9.042816628834771e-08, 9.048091698105054e-08, 9.053146499127251e-08, -75.2074203491211, -71.7074203491211, -61.10371017456055, -55.85371017456055, -58.35371017456055, -87.2074203491211, -102.7074203491211, -103.7074203491211, -107.2074203491211, -118.2074203491211, -114.7074203491211, -111.2074203491211, -105.7074203491211, -74.7074203491211, -58.10371017456055, -55.35371017456055, -58.55054473876953, 1.054752845871448e+18, 1005890699264.0, 6.59220528669655e+16, -7.216911831845169e-31)

Treating the first two characters as chars:

>>> struct.unpack('!2c', s2[:2])
('5', 'g')

As a short int:

>>> struct.unpack('!h', s2[:2])
(13671,)

You can unpack the whole string at once by combining the formats:

>>> struct.unpack('!h143f', s2)

The format string consists of three parts:

  • ! indicates that we are using network (big-endian) byte order.
  • h indicates that the first 2 bytes are a short int (size 2); if the first two bytes were chars (size 1 each), we would use 2c instead of h.
  • 143f indicates that 143 floats follow (the size of a float is 4).
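
The sizes in the list above can be checked with struct.calcsize, which reports how many bytes a format string consumes. A minimal sketch, using only the format strings from this answer:

```python
import struct

# With '!' (network order), struct uses standard sizes and inserts no padding.
short_size = struct.calcsize('!h')      # one short int
float_size = struct.calcsize('!f')      # one float
total_size = struct.calcsize('!h143f')  # the combined format

print(short_size, float_size, total_size)
```

If the total does not match the length of the received data, struct.unpack raises struct.error, so comparing the two up front is a cheap sanity check.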

Added together, the sizes equal the length of the input string:

>>> 2 + (143 * 4) == len(s2) == 574
True
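
The ! prefix is an assumption about how the sender packed the data. struct also accepts < for little-endian, and the two byte orders give very different floats for the same bytes. As a quick check, this sketch takes the first eight bytes of the string as printed in the question (hand-copied here, so treat the literal as an assumption) and unpacks them both ways:

```python
import struct

# First eight bytes of the string as shown in the question (hand-copied,
# so this literal is an assumption about the real payload).
sample = b'\xa1\x823\xc2\xd5\x823\xc2'

big = struct.unpack('!2f', sample)     # network (big-endian) order
little = struct.unpack('<2f', sample)  # little-endian order

print(big)
print(little)
```

For these bytes, the little-endian reading gives two neighbouring values close to -44.9, while the big-endian reading gives values that differ by many orders of magnitude. When one byte order produces a smooth series and the other does not, the smooth one is the likelier layout.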

snakecharmerb
  • Thanks a lot for this, it is very helpful. But the number should be `-44.something`. Also, the strings in the stream I am getting have different sizes, and all of them should contain values of `-44.something`. The scenario is this: I am using one piece of software to send values via a UDP server, and when I receive them with that same software I can clearly see the value. But I am trying to see it without that software. – Rajib May 07 '17 at 22:57