I am trying to continuously consume events from Kafka. The same application also uses the consumed data to perform some analysis and update a database at n-second intervals (assume n = 60 seconds).
Within the application, let process1 = the Kafka consumer and process2 = the data analysis and database update logic.
process1 is to run continuously.
process2 is to be executed once every n = 60 seconds.
process2 is concerned with computation and the database update, and hence will take 5-10 seconds to execute. I do not want process1 to stall while process2 is executing. Hence, I am using the multiprocessing module to achieve concurrency in this case. (process1 and process2 would be thread1 and thread2 if I were using the threading module in Python, but because of what I have read about the GIL and the threading module not being able to leverage multi-core architectures, I decided to go with the multiprocessing module. If my understanding of the GIL or of the threading module's limitations is incorrect, my apologies, and please feel free to correct me.)
The application has a fairly simple interaction between the two processes: process1 just fills the queue with all the messages it receives in 60 seconds and, at the end of the 60 seconds, transfers all of them to process2.
I am having trouble with this transfer logic. How do I transfer the contents of the Queue from process1 to process2 at the end of 60 seconds, and subsequently clear the queue so that it starts afresh on the next iteration? (I guess process2 would be the main process, or should it be another process? That is another question I have: should I instantiate 2 processes in addition to the main process?)
So far I have the following:
    import sys
    from kafka.client import KafkaClient
    from kafka import SimpleConsumer
    import time
    from multiprocessing import Process, Queue
    from Queue import Empty  # Python 2 module; raised by q.get() on timeout


    def kafka_init():
        client = KafkaClient('kafka1.wpit.nile.works')
        consumer = SimpleConsumer(client, "druidkafkaconsumer", "personalization.targeting.clickstream.prod")
        return consumer


    def consumeMessages(q):
        print "thread started"
        while not q.empty():
            try:
                print q.get(True, 1)
            except Empty:
                break
        print "thread ended"


    if __name__ == "__main__":
        starttime = time.time()
        timeout = starttime + 10  # timeout of read in seconds
        consumer = kafka_init()
        q = Queue()
        p = Process(target=consumeMessages, args=(q,))  # args must be a tuple
        while True:
            q.put(consumer.get_message())
            if time.time() > timeout:
                # transfer logic from process1 to main process here.
                print "Start time", starttime
                print "End time", time.time()
                p.start()
                p.join()
                break
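To make the intended interaction concrete, here is a minimal sketch of one possible arrangement, not a tested solution. It is written for Python 3 and makes several assumptions: Kafka is replaced by a fake message source, the child process plays the role of process1 while the main process plays the role of process2, and the names produce, drain and analyse are hypothetical helpers made up for illustration. The interval is shortened from 60 seconds so the sketch finishes quickly.

```python
import itertools
import time
from multiprocessing import Process, Queue
from queue import Empty  # multiprocessing queues raise queue.Empty


def produce(q, delay=0.01):
    """process1 stand-in: push fake messages forever (hypothetical source)."""
    for i in itertools.count():
        q.put("message-%d" % i)
        time.sleep(delay)  # simulates waiting on the broker


def drain(q):
    """Remove and return everything that is readable from the queue right now."""
    batch = []
    while True:
        try:
            batch.append(q.get_nowait())
        except Empty:
            return batch


def analyse(batch):
    """process2 stand-in: placeholder for the analysis and database update."""
    return len(batch)


if __name__ == "__main__":
    interval = 0.2  # seconds; would be 60 in the real application
    q = Queue()
    producer = Process(target=produce, args=(q,))
    producer.daemon = True  # do not outlive the main process
    producer.start()
    next_run = time.monotonic() + interval
    try:
        for _ in range(2):
            # Sleep until the next deadline so that slow analysis
            # does not push later runs further and further back.
            time.sleep(max(0.0, next_run - time.monotonic()))
            next_run += interval
            batch = drain(q)
            print("analysed %d messages" % analyse(batch))
    finally:
        # Acceptable here only because the queue is not used afterwards.
        producer.terminate()
        producer.join()
```

In this arrangement nothing needs to be "cleared" explicitly: draining the queue removes the messages, and the producer keeps filling it while the batch is being analysed. A message that is still in flight when drain runs is simply picked up in the next batch.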
Any help would be much appreciated.