I'm testing a script that runs binwalk on a file and then sends a Kafka message to let the sending script know whether it completed or failed. It looks like this:

import inspect
import json
import os

from kafka import KafkaConsumer, KafkaProducer

if __name__ == "__main__":
    # finds the path of this file
    scriptpath = os.path.dirname(inspect.getfile(inspect.currentframe()))
    print(scriptpath)
    # sets up kafka consumer on the binwalk topic and kafka producer for the bwsignature topic
    consumer = KafkaConsumer('binwalk', bootstrap_servers=['localhost:9092'])
    producer = KafkaProducer(bootstrap_servers = ['localhost:9092'])
    
    # watches the binwalk kafka topic
    for msg in consumer:
        # load the json
        job = json.loads(msg.value)
        # get the filepath of the .bin
        filepath = job["src"]
        print(0)
 
        try:
            # runs the script
            binwalkthedog(filepath, scriptpath)
            # send a receipt
            producer.send('bwsignature', b'accepted')
        except Exception:
            # send a failure receipt
            producer.send('bwsignature', b'failed')
 

    producer.close()
    consumer.close()

If I send in a file that doesn't raise any errors in the binwalkthedog function, it works fine. But if I give it a file that doesn't exist, it prints a general error message and moves on to the next input, as it should. For some reason, though, producer.send('bwsignature', b'failed') doesn't actually send unless something creates a delay after the binwalkthedog call fails, like time.sleep(1) or a for loop that counts to a million.
Obviously I could keep that in place, but it's really gross and I'm sure there's a better way to do this.
This is the temp script I'm using to send and receive a signal from the binwalkthedog module:

import json

from kafka import KafkaConsumer, KafkaProducer
from kafka.errors import KafkaError

job = {
    'src' : '/home/nick/Documents/summer-2021-intern-project/BinwalkModule/bo.bin',
    'id' : 1
}
chomp = json.dumps(job).encode('ascii')
receipt = KafkaConsumer('bwsignature', bootstrap_servers=['localhost:9092'])
producer = KafkaProducer(bootstrap_servers = ['localhost:9092'])
future = producer.send('binwalk', chomp)

try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    print("sucks")
else:
    print(record_metadata.topic)
    print(record_metadata.partition)
    print(record_metadata.offset)
producer.close()

for msg in receipt:
    print(msg.value)
    break

1 Answer

Kafka producers batch many records together to reduce the number of requests made to the server. If you want to force buffered records to be sent immediately, rather than introducing a blocking sleep call or calling get() on the future, you should use producer.flush().
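To see why the message appears to vanish without a delay, here is a toy model of the batching behavior (plain Python, not real Kafka — the class name and its fields are made up for illustration): send() only appends to an in-memory buffer, and nothing reaches the "broker" until flush() drains it.

```python
class BatchingProducer:
    """Toy model of a batching producer: send() only buffers the
    record; records reach the 'broker' when flush() is called."""

    def __init__(self):
        self.buffer = []      # records accepted but not yet sent
        self.delivered = []   # records that reached the 'broker'

    def send(self, topic, value):
        # Like the real producer, this returns immediately;
        # the record just sits in the batch buffer.
        self.buffer.append((topic, value))

    def flush(self):
        # Drain the buffer, delivering everything that was batched.
        self.delivered.extend(self.buffer)
        self.buffer.clear()


producer = BatchingProducer()
producer.send('bwsignature', b'failed')
print(producer.delivered)  # [] -- sent, but not yet delivered
producer.flush()
print(producer.delivered)  # [('bwsignature', b'failed')]
```

In the real script, if the process exits (or the exception handler returns) before the batch is transmitted, the record is lost — which is why an artificial sleep happened to "fix" it. Calling producer.flush() after the send (or producer.close(), which flushes first) makes the delivery explicit.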

OneCricketeer