
We receive data in several different encodings. We currently decode it using the Java encodings listed here: https://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html

We are moving to Python, so we are porting this decoding logic to Python. Since Python does not ship a codec equivalent to the Java encoding Cp935 (used for Chinese characters), we are using javabridge, as below:

import javabridge
import numpy

class String:
    # Binds the Java constructor String(byte[] bytes, String charsetName)
    new_fn = javabridge.make_new("java/lang/String", "([BLjava/lang/String;)V")

    def __init__(self, i, s):
        self.new_fn(i, s)

    toString = javabridge.make_method("toString", "()Ljava/lang/String;", "Retrieve the string value")

# fielddata and encoding come from the surrounding code (not shown)
array = numpy.array(list(fielddata), numpy.uint16)
strobject = String(array, encoding)
convertedstr = strobject.toString()

However, we get the following error:


'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte


We are looking for help with this, or for an alternative way of doing it in Python.
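For reference, the message above is the one Python's own UTF-8 decoder raises when the first byte is 0xC0. The sketch below reproduces it without javabridge. It assumes nothing about where in the javabridge call the decode happens, and `try_decode` is a hypothetical helper name. The stdlib EBCDIC codec `cp037` is used only as a stand-in, because Python has no built-in Cp935 codec.

```python
def try_decode(raw: bytes, codec: str):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode(codec), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# 0xC0 can never start a valid UTF-8 sequence, so the UTF-8 decoder rejects it.
text, error = try_decode(b"\xc0", "utf-8")
print(error)

# The same byte is accepted by a single-byte EBCDIC codec (stand-in for Cp935).
text, error = try_decode(b"\xc0", "cp037")
print(repr(text), error)
```

This suggests that somewhere the EBCDIC bytes are being interpreted as UTF-8 rather than with the intended charset.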

rsarda

1 Answer

import javabridge

class JavaEncoder:
    # Binds the Java constructor String(byte[] bytes, String charsetName)
    new_fn = javabridge.make_new("java/lang/String", "([BLjava/lang/String;)V")

    def __init__(self, i, s):
        # Replace every 0 byte with 64 (0x40, the EBCDIC space) before decoding
        i[i == 0] = 64
        self.new_fn(i, s)

    # Binds Java's String.toString()
    toString = javabridge.make_method("toString", "()Ljava/lang/String;", "Retrieve the string value")

When converting data through javabridge, if a field has size 1 and its data is 00, numpy.uint8 converts it to the integer 0, and the conversion then fails with the encoding error. To avoid this, the code above replaces every 0 in the array with 64 before calling the Java constructor. As a uint8, 64 is 0x40, the EBCDIC space (the ASCII space is 0x20).
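The substitution itself can be tried without javabridge or numpy. The following is a minimal sketch of the same idea on plain `bytes`. `replace_nulls` is a hypothetical helper name, and the stdlib codec `cp500` is only a single-byte EBCDIC stand-in used to show that 0x40 decodes to a space. It is not a replacement for Cp935.

```python
EBCDIC_SPACE = 0x40  # 64 decimal; the ASCII space would be 0x20


def replace_nulls(raw: bytes) -> bytes:
    """Return a copy of raw with every 0x00 byte replaced by the EBCDIC space."""
    return raw.replace(b"\x00", bytes([EBCDIC_SPACE]))


# A one-byte field containing 00, as in the failing case described above.
field = b"\x00"
cleaned = replace_nulls(field)

# Assumption: cp500 stands in for the real charset purely for illustration.
print(repr(cleaned.decode("cp500")))
```

Note that this replaces every zero byte in the field, not only the ones in single-byte fields, which matches what `i[i == 0] = 64` does in the class above.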

rsarda
  • It's great that you found the answer and decided to share it with us. As it is, it is a little hard to follow, please consider editing it to give it some structure to make it even better. – waldrumpus Jun 15 '18 at 07:32
  • @waldrumpus I hope it's clearer and more structured now. – rsarda Jun 18 '18 at 12:01