
I'm trying to generate a key for every message I send to Kafka. To do that, I want a key generator that joins the first two characters of the topic name with the tweet id.

Here is an example of the messages that get sent to Kafka:

{"data":{"created_at":"2022-03-18T09:51:12.000Z","id":"1504757303811231755","text":"@Danielog111 @POTUS @NATO @UNPeacekeeping @UN Yes! Not to minimize Ukraine at all, but to bring attention to a horrific crisis and Tigrayan genocide that targets 7M people, longer time frame, and is largely unacknowledged by western news agencies. And people are being eaten-literally! @maddow @JoyAnnReid help Ethiopians!"},"matching_rules":[{"id":"1502932028618072070","tag":"NATO"},{"id":"1502932021731115013","tag":"Biden"}]}

And here is my code modified to try generating partition keys (I'm using PyKafka):

from dotenv import load_dotenv
import os
import json
import tweepy
from pykafka import KafkaClient


# Getting credentials:

load_dotenv()  # load variables from a local .env file into the environment
BEARER_TOKEN=os.getenv("BEARER_TOKEN")

# Setting up pykafka:

def get_kafka_client():
    return KafkaClient(hosts='localhost:9092,localhost:9093,localhost:9094')

def send_message(data, name_topic, id):    
    client = get_kafka_client()
    topic = client.topics[name_topic]
    producer = topic.get_sync_producer()
    producer.produce(data, partition_key=f"{name_topic[:2]}{id}")

# Creating a Twitter stream listener:

class Listener(tweepy.StreamingClient):

    def on_data(self, data):
        print(data)
        message = json.loads(data)
        for rule in message['matching_rules']:
            send_message(data, rule['tag'], message['data']['id'].encode())
        return True
    
    def on_error(self, status):
        print(status)

# Start streaming:

Listener(BEARER_TOKEN).filter(tweet_fields=['created_at'])

And this is the error I'm getting:

File "/Users/mac/.local/share/virtualenvs/tweepy_step-Ck3DvAWI/lib/python3.9/site-packages/pykafka/producer.py", line 372, in produce
raise TypeError("Producer.produce accepts a bytes object as partition_key, "
TypeError: ("Producer.produce accepts a bytes object as partition_key, but it got '%s'", <class 'str'>)

I've also tried leaving the id unencoded, and extracting the id directly from the raw data (which arrives as bytes), but neither option works.

Doraemon

1 Answer


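I found the error: I should have been encoding the whole partition key, not the tweet id taken from the JSON. An f-string always produces a str, even when one of the interpolated values is bytes, so the key has to be encoded after it is built: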

def send_message(data, name_topic, id):    
    client = get_kafka_client()
    topic = client.topics[name_topic]
    producer = topic.get_sync_producer()
    producer.produce(data, partition_key=f"{name_topic[:2]}{id}".encode())

# Creating a Twitter stream listener:

class Listener(tweepy.StreamingClient):

    def on_data(self, data):
        print(data)
        message = json.loads(data)
        for rule in message['matching_rules']:
            send_message(data, rule['tag'], message['data']['id'])
        return True
    
    def on_error(self, status):
        print(status)
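
To make the difference concrete, here is a minimal sketch that builds the key outside of PyKafka. The helper name make_partition_key is hypothetical (it is not part of PyKafka or of the code above), and the payload is a trimmed copy of the example message from the question. It shows that encoding only the id still leaves a str, while encoding the finished f-string gives the bytes object that produce() asks for.

```python
import json

# Trimmed copy of the example message from the question (tweet text omitted).
raw = (b'{"data":{"id":"1504757303811231755"},'
       b'"matching_rules":[{"id":"1502932028618072070","tag":"NATO"},'
       b'{"id":"1502932021731115013","tag":"Biden"}]}')


def make_partition_key(topic_name: str, tweet_id: str) -> bytes:
    """Hypothetical helper: first two characters of the topic plus the tweet id."""
    # Build the str first, then encode the whole key once.
    return f"{topic_name[:2]}{tweet_id}".encode("utf-8")


message = json.loads(raw)
tweet_id = message["data"]["id"]

# One key per matching rule, since each rule tag is used as a topic name.
keys = [make_partition_key(rule["tag"], tweet_id)
        for rule in message["matching_rules"]]
print(keys)

# What the question's code effectively built: the id was encoded first, then
# interpolated into an f-string, so the result is a str again and contains
# the repr of the bytes object.
broken = f"{'NATO'[:2]}{tweet_id.encode()}"
print(type(broken).__name__, broken)
```

The second print shows the key as a str with a literal b'...' embedded in it, which is why PyKafka rejected it with the TypeError above.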