I am trying to integrate Kinesis with Spark Streaming, using Python and the KCL. Most of the time, when reading from Kinesis, I get this exception:

'utf8' codec can't decode byte 0xf1 in position 940: invalid continuation byte

How can I solve this problem? This is how I create the stream:

from pyspark.streaming.kinesis import KinesisUtils
kinesisStream = KinesisUtils.createStream(ssc, APPLICATION_NAME, STREAM_NAME, ENDPOINT, REGION_NAME, INITIAL_POS, CHECKPOINT_INTERVAL, awsAccessKeyId=AWSACCESSID, awsSecretKey=AWSSECRETKEY)
Nipun
  • Are you sure the data you're sending into Kinesis is UTF-8? Seems like you might be getting some latin-1 (ISO-8859-1) input which could be one cause of this type of error. – devonlazarus Mar 05 '16 at 03:00
  • Yes, you are right, I was not sending valid UTF-8. Please post your comment as an answer and I will accept it. – Nipun Mar 05 '16 at 03:02

1 Answer

Check that the data you put into the stream is actually UTF-8 encoded.
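A minimal Python 3 sketch of that check. The function names `is_valid_utf8` and `to_utf8` are hypothetical, and the Latin-1 fallback in `to_utf8` assumes Latin-1 really is what the producer sends; you could run something like this on the producer side before putting records into Kinesis.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Return `data` as UTF-8 bytes.

    If `data` is not already valid UTF-8, it is assumed (hypothetically)
    to be in `source_encoding` and is re-encoded as UTF-8.
    """
    if is_valid_utf8(data):
        return data
    return data.decode(source_encoding).encode("utf-8")


# 'ñ' encoded as Latin-1 is the single byte 0xf1.
record = "España".encode("latin-1")
print(is_valid_utf8(record))          # False
fixed = to_utf8(record)
print(is_valid_utf8(fixed))           # True
print(fixed.decode("utf-8"))          # España
```

Note that passing the check only means the bytes decode as UTF-8; it cannot prove which encoding the producer intended, so fixing the encoding at the source is the more reliable option.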

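Decoding Latin-1 (ISO-8859-1) bytes as UTF-8 is one common cause of this error. In Latin-1 a character such as 'ñ' is the single byte 0xf1, which UTF-8 treats as the start of a multi-byte sequence, so the decoder fails as soon as the following byte is not a valid continuation byte.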
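The sketch below (Python 3) reproduces the error and shows one possible consumer-side workaround. As far as I know, `KinesisUtils.createStream` in PySpark accepts a `decoder` argument that defaults to a UTF-8 decoder; `lenient_decoder` is a hypothetical replacement that falls back to Latin-1, which again assumes that is the producer's actual encoding.

```python
# Reproduce the error: Latin-1 'ñ' is the byte 0xf1, which is not valid here in UTF-8.
raw = "Año".encode("latin-1")  # b'A\xf1o'
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xf1 in position 1: invalid continuation byte


def lenient_decoder(s):
    """Hypothetical decoder: try UTF-8 first, fall back to Latin-1."""
    if s is None:
        return None
    try:
        return s.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 records were written as Latin-1.
        return s.decode("latin-1")


print(lenient_decoder(raw))  # Año

# Usage with Spark (not run here):
# kinesisStream = KinesisUtils.createStream(
#     ssc, APPLICATION_NAME, STREAM_NAME, ENDPOINT, REGION_NAME, INITIAL_POS,
#     CHECKPOINT_INTERVAL, awsAccessKeyId=AWSACCESSID,
#     awsSecretKey=AWSSECRETKEY, decoder=lenient_decoder)
```

If you would rather not guess the encoding, `s.decode("utf-8", errors="replace")` keeps the stream running and substitutes U+FFFD for the bad bytes, at the cost of losing those characters.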

devonlazarus