I am trying to call a C interface from Python using the ctypes module. Below is the prototype of the C function:
void UTF_to_Wide_char( const char* source, unsigned short* buffer, int bufferSize)
UTF_to_Wide_char : converts a UTF-8 string into a UCS-2 string
source (input) : contains a NULL terminated UTF-8 string
buffer (output) : pointer to a buffer that will hold the converted text
bufferSize : indicates the size of the buffer; the function copies up to this size, including the terminating NULL.
The following is my Python function:
def to_ucs2(py_unicode_string):
    len_str = len(py_unicode_string)
    local_str = py_unicode_string.encode('UTF-8')
    src = c_wchar_p(local_str)
    buff = create_unicode_buffer(len_str * 2)
    # shared_lib is my ctypes-loaded instance of the shared library.
    shared_lib.UTF8_to_Widechar(src, buff, sizeof(buff))
    return buff.value
Problem: The code snippet above works fine with Python compiled with UCS-4 (the --enable-unicode=ucs4 option) but behaves unexpectedly with Python compiled with UCS-2 (--enable-unicode=ucs2). (I verified the Python Unicode build option by referring to "How to find out if Python is compiled with UCS-2 or UCS-4?")
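For reference, a minimal sketch of how the build can be inspected from within Python. It assumes the common `sys.maxunicode` check (0xFFFF on a narrow build); `describe_unicode_build` is a hypothetical helper name, not something from the library. It also reports the sizes ctypes gives for `wchar_t` and `unsigned short`, since the C function writes 16-bit units:

```python
import sys
import ctypes


def describe_unicode_build():
    """Return a few facts relevant to passing wide strings through ctypes."""
    return {
        # 0xFFFF on a narrow (UCS-2) build, 0x10FFFF on a wide (UCS-4) build
        "narrow_build": sys.maxunicode == 0xFFFF,
        # size of the platform's wchar_t, which is what c_wchar maps to
        "wchar_bytes": ctypes.sizeof(ctypes.c_wchar),
        # size of unsigned short, the element type in the C prototype
        "ushort_bytes": ctypes.sizeof(ctypes.c_ushort),
    }


print(describe_unicode_build())
```

Note that Python 3.3 and later always report 0x10FFFF for `sys.maxunicode`, so the narrow/wide distinction only shows up on older interpreters.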
Unfortunately, in the production environment I am using Python compiled with UCS-2. Please comment on the following points.
- Although I am fairly sure the issue comes from the Unicode build option, I have yet to nail down what is happening under the hood. I need help coming up with the justification.
- Is it possible to overcome this issue without compiling Python with the --enable-unicode=ucs4 option?
(I am quite new to Unicode encodings, but I have basic know-how.)
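A sketch of one possible way to avoid depending on the interpreter's Unicode build, offered under stated assumptions rather than as a confirmed fix: pass the UTF-8 bytes as `c_char_p` (matching `const char*`), receive the output in a `c_ushort` array (matching `unsigned short*`), and decode the 16-bit units as UTF-16 in native byte order. The names `to_ucs2_portable` and `convert` are hypothetical. The sketch assumes `bufferSize` is measured in bytes and that the text stays within the Basic Multilingual Plane, so one 16-bit unit per character is enough.

```python
import sys
import ctypes


def to_ucs2_portable(text, convert):
    """Convert `text` via a C-style callable (const char*, unsigned short*, int).

    `convert` stands in for the library function; it is a parameter here so
    the sketch can be exercised without the shared library.
    """
    utf8 = text.encode("utf-8")
    # Assumption: BMP-only text, so one 16-bit unit per character plus the NUL.
    n_units = len(text) + 1
    buff = (ctypes.c_ushort * n_units)()
    # Assumption: bufferSize is a byte count, as sizeof() was used originally.
    convert(ctypes.c_char_p(utf8), buff, ctypes.sizeof(buff))

    # Count 16-bit units up to (not including) the terminating NUL.
    n = 0
    while n < n_units and buff[n] != 0:
        n += 1

    # Read the raw bytes and decode them as UTF-16 in native byte order.
    raw = ctypes.string_at(buff, n * ctypes.sizeof(ctypes.c_ushort))
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return raw.decode(codec)
```

With the real library, `convert` would presumably be `shared_lib.UTF8_to_Widechar`, ideally with its `argtypes` declared as `[c_char_p, POINTER(c_ushort), c_int]` so ctypes checks the arguments.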