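I am using the pykafka library to post messages to Kafka. My data set is a JSON array:

[
{"user": "jpoole", "created_at_unixtime": 1448221456.6646008, "id": 3731785240073317438, "text": "Glad I bought my electronics from @fudgemart", "created_at": "Sun Nov 22 14:44:16 +0000 2015"},
{"user": "jpoole", "created_at_unixtime": 1440407147.033846, "id": 3600730356622213650, "text": "Techical support for my new computer as A+, thank you @fudgemart", "created_at": "Mon Aug 24 05:05:47 +0000 2015"}
]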

My requirement is to generate two Kafka messages, one for each JSON object above, using PyKafka. This is what I have tried so far:

import json

from pykafka import KafkaClient

client = KafkaClient(hosts="127.0.0.1:9092")
topic = client.topics['test']

# Parse the file contents into Python objects
with open('./tweets.json') as f:
    dataItems = json.load(f)

# Serialize back to JSON and encode, since produce() requires bytes
s = json.dumps(dataItems).encode('utf-8')

with topic.get_sync_producer() as producer:
    for data in s:
        producer.produce(data)

The JSON is stored in a file (that is part of my original requirement). The code above runs, but instead of sending each JSON object as a whole, it sends every character of the serialized string as a separate message.

My requirement is to publish each JSON string as a separate Kafka message.

Message 1
{"user": "jpoole", "created_at_unixtime": 1448221456.6646008, "id": 3731785240073317438, "text": "Glad I bought my electronics from @fudgemart", "created_at": "Sun Nov 22 14:44:16 +0000 2015"}

Message 2
{"user": "jpoole", "created_at_unixtime": 1440407147.033846, "id": 3600730356622213650, "text": "Techical support for my new computer as A+, thank you @fudgemart", "created_at": "Mon Aug 24 05:05:47 +0000 2015"}

Thanks

Nick

1 Answer

Without any logs it's hard to be sure, but I think the problem is the line for data in s:.

json.dumps() returns a string (which you then encode to bytes), so s is one serialized string and data is a single character of it. In other words, you're producing each character as a separate message.
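
To illustrate the difference without involving Kafka at all, here is a minimal sketch. The list `data_items` is a made-up stand-in for whatever json.load() returns from the file; it is not the asker's real data.

```python
import json

# Hypothetical stand-in for the list that json.load() returns from tweets.json.
data_items = [{"user": "a", "id": 1}, {"user": "b", "id": 2}]

# Serializing the whole list gives ONE string; looping over it walks characters.
s = json.dumps(data_items)
chars = [c for c in s]

# Serializing inside the loop gives one bytes payload per JSON object.
messages = [json.dumps(item).encode("utf-8") for item in data_items]
```

Here `chars` holds one entry per character of the serialized list, while `messages` holds exactly one entry per JSON object, which is the shape the question asks for.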

yaken
  • json.dumps() was added to convert the dict to bytes, because the producer.produce() method requires a bytes object. Without the conversion, this is the error I get: "TypeError: ("Producer.produce accepts a bytes object as message, but it got '%s'", )" – Nick May 21 '21 at 18:45
  • From what I understand, json.dumps returns a string: https://docs.python.org/3/library/json.html#json.dump. So you've tried sending the object directly (which does not work, as the error message shows), and you've tried sending individual characters. But as far as I can tell, you haven't tried sending the whole string for one object: i.e., call `json.dumps` on it and send the result directly (without going char by char). Correct me if my understanding is wrong. – yaken May 21 '21 at 20:33
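
Putting the suggestion from the comments together, one possible sketch looks like the following. It assumes the file holds a JSON array of objects, and it reuses only the pykafka calls already shown in the question (KafkaClient, client.topics, get_sync_producer, produce). The names `publish_items` and `main` are hypothetical helpers introduced here for illustration, and this has not been run against a real broker.

```python
import json


def publish_items(producer, items):
    """Send each item as its own message and return how many were sent."""
    count = 0
    for item in items:
        # Encode one JSON object at a time, not the whole list.
        producer.produce(json.dumps(item).encode("utf-8"))
        count += 1
    return count


def main(path="./tweets.json", hosts="127.0.0.1:9092", topic_name="test"):
    # Imported here so publish_items() can be tried without pykafka installed.
    from pykafka import KafkaClient

    # Assumption: the file contains a JSON array of objects.
    with open(path) as f:
        data_items = json.load(f)

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name]
    with topic.get_sync_producer() as producer:
        publish_items(producer, data_items)
```

Calling `main()` with a broker running should then produce one message per object in the array. Keeping the loop in `publish_items` also makes it easy to check the behaviour with any stand-in object that has a `produce` method.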