I am running make to compile C libraries in a Python project and using pexpect (Python 3.3) for the automation. pexpect reads the output of make in chunks, and on one such chunk it throws the following error when it tries to convert the Python 3 bytes to a Python 3 str. The main problem is that the error is intermittent and does not occur often.
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 1998-1999: unexpected end of data
The sample code below shows that when the data contains a multibyte character (i.e. a special character or any non-ASCII Unicode data), pexpect fails to decode it if a chunk ends partway through that character.
#!/usr/bin/python
# -*- coding: utf-8 -*-
from base import pexpect  # local copy of pexpect (see the path in the traceback)

MAX_READ_CHUNK = 8

def run(cmd):
    child = pexpect.spawn(cmd, maxread=MAX_READ_CHUNK)
    while True:
        i = child.expect([pexpect.EOF, pexpect.TIMEOUT])
        if child.before:
            print(child.before)
        if i == 0:  # EOF
            break
        elif i == 1:  # TIMEOUT
            continue
    child.close()
    return child.exitstatus

############## Main ################
data = '“HELLO WORLD”'
# i.e. data.encode('utf-8') == b'\xe2\x80\x9cHELLO WORLD\xe2\x80\x9d'
print("Data in readable form = %s " % data)
print("Data in bytes = %s \n\n" % data.encode('utf-8'))
run("echo %s" % data)
Running it gives the following output and traceback:
Data in readable form = “HELLO WORLD”
Data in bytes = b'\xe2\x80\x9cHELLO WORLD\xe2\x80\x9d'
_cast_unicode() enc=[utf-8] s=[b'\xe2\x80\x9cHELLO']
_cast_unicode() enc=[utf-8] s=[b' WORLD\xe2\x80']
Traceback (most recent call last):
File "test.py", line 33, in <module>
run("echo %s"%data)
File "test.py", line 11, in run
i = child.expect([pexpect.EOF,pexpect.TIMEOUT])
File "/home/test/Downloads/base/pexpect.py", line 1358, in expect
return self.expect_list(compiled_pattern_list, timeout, searchwindowsize)
File "/home/test/Downloads/base/pexpect.py", line 1372, in expect_list
return self.expect_loop(searcher_re(pattern_list), timeout, searchwindowsize)
File "/home/test/Downloads/base/pexpect.py", line 1425, in expect_loop
c = self.read_nonblocking (self.maxread, timeout)
File "/home/test/Downloads/base/pexpect.py", line 1631, in read_nonblocking
return super(spawn, self).read_nonblocking(size=size, timeout=timeout)\
File "/home/test/Downloads/base/pexpect.py", line 868, in read_nonblocking
s2 = self._cast_buffer_type(s)
File "/home/test/Downloads/base/pexpect.py", line 1614, in _cast_buffer_type
return _cast_unicode(s, self.encoding)
File "/home/test/Downloads/base/pexpect.py", line 156, in _cast_unicode
return s.decode(enc)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 6-7: unexpected end of data
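When MAX_READ_CHUNK is changed to 9 in the code above, it works fine. With 8, the second chunk ends after only the first two bytes (b'\xe2\x80') of the three-byte closing quote; with 9, both quote characters happen to fall entirely inside a single chunk.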
# Output When "MAX_READ_CHUNK = 9"
Data in readable form = “HELLO WORLD”
Data in bytes = b'\xe2\x80\x9cHELLO WORLD\xe2\x80\x9d'
_cast_unicode() enc=[utf-8] s=[b'\xe2\x80\x9cHELLO ']
_cast_unicode() enc=[utf-8] s=[b'WORLD\xe2\x80\x9d\r']
_cast_unicode() enc=[utf-8] s=[b'\n']
“HELLO WORLD”
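The two runs can be reproduced without pexpect at all. The sketch below splits the UTF-8 bytes of the echoed text into fixed-size chunks and decodes each chunk on its own, which is what the `_cast_unicode()` lines in the output suggest the local pexpect copy does. It assumes every read returns exactly `maxread` bytes, which matches the debug output above but is not guaranteed in general (a real read can return fewer bytes). That is likely why the error with make's output is intermittent: it only fires when a read boundary happens to land inside a multibyte character. The helper name `decode_chunks` is hypothetical.

```python
# Simulate pexpect's chunked reads: decode each fixed-size chunk independently.
data = "\u201cHELLO WORLD\u201d"          # the same text as in the question
raw = (data + "\r\n").encode("utf-8")     # echo output through a pty ends in \r\n


def decode_chunks(raw, size):
    """Hypothetical helper: split raw into size-byte chunks and decode each one
    separately; return the chunks and either the decoded text or an error note."""
    chunks = [raw[i:i + size] for i in range(0, len(raw), size)]
    out = []
    for chunk in chunks:
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError as exc:
            # A chunk ending mid-character fails with "unexpected end of data".
            return chunks, "failed on %r: %s" % (chunk, exc.reason)
    return chunks, "".join(out)


for size in (8, 9):
    chunks, result = decode_chunks(raw, size)
    print(size, chunks, result)
```

With a size of 8 the second chunk is `b' WORLD\xe2\x80'`, the same bytes as in the traceback; with 9 every chunk contains only complete characters.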
How can I handle this "UnicodeDecodeError: 'utf-8' codec can't decode bytes in position: unexpected end of data" error in pexpect when reading the output of make?
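One possible approach, sketched here as an assumption rather than as what this pexpect copy does, is to decode the stream with an incremental decoder from Python's `codecs` module instead of calling `bytes.decode()` on each chunk. The incremental decoder holds back an incomplete multibyte sequence at the end of one chunk and completes it with the next chunk. The sketch assumes you can get at the raw, undecoded bytes of each read; the helper `decode_stream` is hypothetical.

```python
import codecs


def decode_stream(chunks, encoding="utf-8", errors="strict"):
    """Hypothetical helper: decode an iterable of byte chunks as one stream.

    A multibyte sequence split across chunk boundaries is buffered by the
    incremental decoder instead of raising 'unexpected end of data'.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    for chunk in chunks:
        text = decoder.decode(chunk)  # may return "" while a sequence is pending
        if text:
            yield text
    # final=True flushes the buffer; a sequence that is really truncated at EOF
    # still raises here, or becomes U+FFFD if errors="replace".
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


raw = "\u201cHELLO WORLD\u201d\r\n".encode("utf-8")
chunks = [raw[i:i + 8] for i in range(0, len(raw), 8)]  # the failing 8-byte split
print(chunks)
print(repr("".join(decode_stream(chunks))))
```

If the make output is only logged and exact characters do not matter, decoding each chunk with `errors="replace"` also avoids the exception, but it inserts U+FFFD replacement characters wherever a character was split, whereas the incremental decoder reconstructs the original text.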