Questions tagged [kafka-python]

kafka-python provides low-level protocol support for Apache Kafka as well as high-level consumer and producer classes. Request batching is supported by the protocol as well as broker-aware request routing. Gzip and Snappy compression is also supported for message sets.

For more details about the Python Kafka client API, please refer to https://kafka-python.readthedocs.io/en/latest/
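As a quick orientation for the tag, here is a minimal kafka-python sketch of the high-level producer and consumer classes described above; the broker address and topic name are placeholders.

# Minimal kafka-python sketch; broker address and topic name are placeholders.
from kafka import KafkaProducer, KafkaConsumer

# Producer with gzip compression for message sets (snappy is also supported).
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    compression_type='gzip',
)
producer.send('example-topic', b'hello kafka')
producer.flush()

# Consumer that starts from the earliest available offset.
consumer = KafkaConsumer(
    'example-topic',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    consumer_timeout_ms=5000,  # stop iterating if no message arrives for 5 seconds
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)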

443 questions
0
votes
1 answer

No Brokers available when trying to connect to Kafka through Cloudera Data Science Workbench

I am trying to implement the GitHub project (https://github.com/tomatoTomahto/CDH-Sensor-Analytics) on our internal Hadoop cluster via Cloudera Data Science Workbench. On running the project on Cloudera Data Science Workbench, I get the error "No…
Sameer
  • 101
  • 2
  • 11
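For "No Brokers available" errors like the one above, the first things to check are usually the bootstrap address and the protocol/version negotiation. A minimal diagnostic sketch, assuming a placeholder broker host and an older cluster that needs an explicit api_version:

from kafka import KafkaConsumer
from kafka.errors import NoBrokersAvailable

try:
    consumer = KafkaConsumer(
        bootstrap_servers='broker-host:9092',  # must be reachable from the CDSW session
        api_version=(0, 10),                   # skip version probing on older clusters
        security_protocol='PLAINTEXT',         # adjust if the cluster uses SASL/SSL
    )
    print(consumer.topics())                   # succeeds only if metadata can be fetched
except NoBrokersAvailable:
    print("Broker address unreachable or protocol mismatch")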
0
votes
2 answers

How to configure Kafka so that we have the option to read from the earliest, the latest, and also from any given offset?

I know about configuring Kafka to read from the earliest or latest message. How do we include an additional option in case I need to read from a previous offset? The reason I need to do this is that the earlier messages which were read need to be…
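A sketch of all three modes with kafka-python, assuming a placeholder topic, partition 0 and offset 42: earliest/latest come from auto_offset_reset, and an arbitrary starting point comes from assign() plus seek().

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',  # or 'latest'; applies when no valid position exists
    enable_auto_commit=False,
    group_id='replay-group',
)

tp = TopicPartition('example-topic', 0)
consumer.assign([tp])   # manual assignment so we control the position ourselves
consumer.seek(tp, 42)   # jump to a specific offset

for message in consumer:
    print(message.offset, message.value)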
0
votes
2 answers

Getting messages sent from python KafkaProducer

My goal is to get data from non-file sources (i.e. generated within a program or sent through an API) and have it sent to a Spark stream. To accomplish this, I'm sending the data through a python-based KafkaProducer: $ bin/zookeeper-server-start.sh…
user2361174
  • 1,872
  • 4
  • 33
  • 51
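A minimal sketch of the producer side, assuming a placeholder topic and JSON-encoded records generated inside the program:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda d: json.dumps(d).encode('utf-8'),
)

for i in range(10):
    producer.send('sensor-events', {'id': i, 'reading': i * 0.5})

producer.flush()  # make sure everything reaches the broker before the process exits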
0
votes
1 answer

kafka-python ssl support for python < v2.7.9 (no attribute 'SSLContext')

When trying to connect to Kafka with SSL using kafka-python, I'm getting the following error: Traceback (most recent call last): File "server.py", line 23, in kafka_producer = SimpleKafkaProducer() File…
Urban48
  • 1,398
  • 1
  • 13
  • 26
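For reference, kafka-python's SSL support relies on ssl.SSLContext, which only exists in Python 2.7.9+ and 3.4+; on a supported interpreter the configuration looks roughly like this (certificate paths are placeholders):

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='broker-host:9093',
    security_protocol='SSL',
    ssl_cafile='/path/to/ca.pem',
    ssl_certfile='/path/to/client-cert.pem',
    ssl_keyfile='/path/to/client-key.pem',
    ssl_check_hostname=True,
)
producer.send('secure-topic', b'encrypted in transit')
producer.flush()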
0
votes
1 answer

Kafka Producer stops my code

I'm calling a function which sends some data from a Kafka producer, but after it sends, I'm returning a response which never comes back. The code gets stuck at the return. Does anyone have any idea what's happening? My code is as follows, def postEvent(eventData): …
Gaurav Ram
  • 1,085
  • 3
  • 16
  • 32
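One common reason for this pattern hanging is that send() returns a future and the producer blocks when the broker is unreachable. A hedged rework of a postEvent-style function that bounds the wait instead of hanging (topic name and return shape are placeholders):

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers='localhost:9092')

def post_event(event_data):
    future = producer.send('events', event_data.encode('utf-8'))
    try:
        metadata = future.get(timeout=10)  # raises instead of waiting forever
    except KafkaError as exc:
        return {'status': 'error', 'reason': str(exc)}
    return {'status': 'ok', 'partition': metadata.partition, 'offset': metadata.offset}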
0
votes
1 answer

kafka python - Bluemix MessageHub - ConnectionError: socket disconnected

I'm using the kafka python client to push messages to Message Hub, but noticed that after a while of running my app that it would stop sending messages to Message Hub. I then noticed the following in my log files: ConnectionError: socket…
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
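Transient socket disconnects can often be absorbed by letting the client retry and reconnect rather than failing on the first send. A sketch with illustrative (not tuned) settings and a placeholder broker address:

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='kafka-broker:9093',
    retries=5,                   # retry failed sends
    retry_backoff_ms=500,        # wait between retries
    reconnect_backoff_ms=1000,   # wait before re-opening a dropped connection
    request_timeout_ms=30000,
)
producer.send('example-topic', b'payload')
producer.flush()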
0
votes
1 answer

Can kafka producers supply data at quota rate in presence of replicas?

I have a Kafka producer belonging to a client with client id "p1" and a quota of 50 MBps. Now I tested the performance of my producer using "bin/kafka-producer-perf-test.sh" and I was able to get throughput close to 50 MBps when writing to a partition…
brokendreams
  • 827
  • 2
  • 10
  • 29
0
votes
1 answer

Could not make an Avro Schema object from date

I have an Avro schema with this property: {"name": "whenDate", "type": ["date", "null"]} I am using a Python client and the producer confluent_kafka.avro.AvroProducer When I load the Avro schema with aforementioned property, I trigger this…
Kode Charlie
  • 1,297
  • 16
  • 32
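The likely cause is that Avro has no primitive "date" type; it is a logical type carried by an int. A sketch of a working schema with the confluent_kafka AvroProducer, with placeholder broker and schema-registry URLs:

from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

value_schema = avro.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "whenDate",
     "type": ["null", {"type": "int", "logicalType": "date"}],
     "default": null}
  ]
}
""")

producer = AvroProducer(
    {'bootstrap.servers': 'localhost:9092',
     'schema.registry.url': 'http://localhost:8081'},
    default_value_schema=value_schema,
)
producer.produce(topic='events', value={'whenDate': 17897})  # days since the Unix epoch
producer.flush()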
0
votes
2 answers

Connecting Kafka running on EC2 machine from my local machine

I am new to Kafka and searched different posts in the forums but couldn't find the solution. I have installed Kafka on an EC2 instance and am trying to connect to it from my local Ubuntu machine. My objective is to have Python Kafka clients (both…
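From a local machine the client has to use the broker's public address, and the broker's advertised.listeners (in server.properties) must return that same address, otherwise the metadata response points back to a private EC2 IP. A sketch with a placeholder public hostname:

from kafka import KafkaProducer, KafkaConsumer

bootstrap = 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9092'

producer = KafkaProducer(bootstrap_servers=bootstrap)
producer.send('test-topic', b'hello from my laptop')
producer.flush()

consumer = KafkaConsumer('test-topic',
                         bootstrap_servers=bootstrap,
                         auto_offset_reset='earliest',
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)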
0
votes
2 answers

Getting OverflowError: timeout value is too large while using kafka-python producer-consumer

Well, I am trying to use the kafka-python package (1.3.2) to have a simple data transfer from my producer to my consumer. Producer: from kafka import KafkaProducer producer = KafkaProducer(bootstrap_servers='localhost:9092') # produce…
Ranganath Iyengar
  • 55
  • 1
  • 2
  • 12
0
votes
1 answer

Pyspark Kafka offset range units

I am using Spark in batch mode to process logs that come from Kafka. In each cycle my code should get whatever reaches the Kafka consumer. However, I want to put a restriction on the amount of data to get from Kafka in each cycle. Let's say 5 GB or…
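Spark's direct Kafka stream can cap how much each batch reads, though the limit is expressed in records per second per partition rather than bytes. A sketch assuming the older pyspark.streaming.kafka connector is on the classpath; the values are illustrative:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = (SparkConf()
        .setAppName("kafka-batch-logs")
        .set("spark.streaming.kafka.maxRatePerPartition", "10000"))  # max records/sec/partition

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 60)  # 60-second batches -> at most 600k records per partition per batch

stream = KafkaUtils.createDirectStream(
    ssc, topics=['logs'], kafkaParams={'metadata.broker.list': 'localhost:9092'})
stream.count().pprint()

ssc.start()
ssc.awaitTermination()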
0
votes
1 answer

How does a python kafka producer work when some of brokers are not available?

I have set up a 3-node Kafka cluster and used Python as the producer like this: kafka_addr = "n0.xxx.com:9092,n1.xxx.com:9092,n2.xxx.com:9092" producer = KafkaProducer(bootstrap_servers=kafka_addr) When "n0" and "n1" are available but "n2" is not…
HenryLulu
  • 11
  • 3
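The bootstrap list is only used for the initial metadata fetch, so one reachable bootstrap broker is enough; after that, sends go to whichever broker leads each partition. A sketch with placeholder hosts:

from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable, KafkaError

try:
    producer = KafkaProducer(
        bootstrap_servers=['n0.xxx.com:9092', 'n1.xxx.com:9092', 'n2.xxx.com:9092'],
        acks='all',   # fail the send if in-sync replicas cannot acknowledge it
        retries=3,
    )
except NoBrokersAvailable:
    raise SystemExit("none of the bootstrap brokers could be reached")

future = producer.send('example-topic', b'payload')
try:
    print(future.get(timeout=10))
except KafkaError as exc:
    print("send failed:", exc)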
0
votes
0 answers

How to get the group id when I want to store and retrieve offsets outside of Kafka in the consumer group rebalance callback function

consumer = Consumer({'bootstrap.servers': bootstrap_server_host, 'group.id': group_id, 'enable.auto.commit': auto_commit}) consumer.subscribe([topic], on_assign=on_assign_callback,…
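With confluent-kafka the rebalance callback only receives the consumer and the assigned partitions, so the group id has to come from the configuration (or a closure) rather than from the callback arguments. A sketch where load_offset_from_store is a hypothetical lookup into the external offset store:

from confluent_kafka import Consumer

group_id = 'my-group'
conf = {'bootstrap.servers': 'localhost:9092',
        'group.id': group_id,
        'enable.auto.commit': False}

def on_assign_callback(consumer, partitions):
    # group_id is available here via the enclosing scope, not via the callback arguments
    for p in partitions:
        p.offset = load_offset_from_store(group_id, p.topic, p.partition)  # hypothetical lookup
    consumer.assign(partitions)

consumer = Consumer(conf)
consumer.subscribe(['my-topic'], on_assign=on_assign_callback)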
0
votes
1 answer

How to delete a specific number of lines from a Kafka topic using Python or any built-in method?

I am facing a problem while using the consumer.poll() method. After fetching data with poll(), the consumer won't have any data to commit, so please help me remove a specific number of lines from the Kafka topic.
surya
  • 21
  • 6
0
votes
1 answer

How efficient are Kafka EARLIEST and Kafka LATEST offset resets?

Problem: I am thinking about implementing a binary search to find a starting offset for time-based event replaying. In order to do so I was thinking about using EARLIEST to find the beginning offset and LATEST to find the latest offset. After that I…
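Both resets map to cheap broker-side lookups, and kafka-python also exposes a timestamp lookup that can replace a client-side binary search. A sketch with placeholder topic, partition and timestamp:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition('events', 0)

earliest = consumer.beginning_offsets([tp])[tp]   # same lookup an EARLIEST reset uses
latest = consumer.end_offsets([tp])[tp]           # same lookup a LATEST reset uses

# Offset of the first message at or after a given timestamp (milliseconds since epoch).
target = consumer.offsets_for_times({tp: 1609459200000})[tp]
print(earliest, latest, target.offset if target else None)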