
I am trying to use Logstash to receive events from a TCP socket and output them to a Kafka topic. My current configuration does this perfectly, but I want to deliver events to Kafka in a transactional manner; that is, the system should not send any events to Kafka until a commit message is received:

START TXN 123         --No message sent to Kafka
123 - Event1 Message  --No message sent to Kafka
123 - Event2 Message  --No message sent to Kafka
123 - Event3 Message  --No message sent to Kafka
COMMIT TXN 123           --Event1, Event2, Event3 messages sent to Kafka

Is there any way to achieve this using Logstash alone, or should I introduce another transaction coordinator between the source and Logstash? Here is my current config:

input {
  tcp {
    port => 9000
  }
}

output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "alpayk"
  }
}

I tried to use Logstash's aggregate filter for this purpose, but I couldn't come up with something that works.

Thank you very much in advance

Alpay
  • While Kafka clients internally can be enabled to do transactional writes, I don't think Logstash has implemented this feature yet. Nor, do I think you can perform this type of "conditional flush" operation within Logstash itself – OneCricketeer Dec 28 '18 at 18:43
  • @cricket_007 thank you for your comment. In fact, I am trying to design this system from scratch, so I won't necessarily use Logstash to carry events from the socket to Kafka; I can use any other technology in between. My intention is to have a system supporting conditional flush of events, as you indicated. – Alpay Dec 29 '18 at 08:32
  • Then you'll probably need to write that producer yourself and manually put in a conditional statement based on your event data – OneCricketeer Dec 30 '18 at 05:27
  • The [aggregate filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html) might fit your needs. This [answer](https://stackoverflow.com/questions/37353365/calculating-time-between-events/37359000#37359000) might be a good start. I have never used this filter, so I can't write a full answer. – baudsp Jan 02 '19 at 09:28
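For reference, the transactional writes mentioned in the first comment have been part of the Kafka Java client since 0.11, so a hand-written producer along the lines OneCricketeer suggests could use them directly. A minimal sketch, where the transactional.id value, topic name, and payloads are placeholder assumptions:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TxnProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // setting transactional.id enables the transactional producer API
        props.put("transactional.id", "alpayk-txn-producer"); // placeholder id
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // nothing becomes visible to read_committed consumers
                // until commitTransaction() succeeds
                producer.send(new ProducerRecord<>("alpayk", "Event1 Message"));
                producer.send(new ProducerRecord<>("alpayk", "Event2 Message"));
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // roll the whole batch back
                throw e;
            }
        }
    }
}

Note that consumers only honor these boundaries when they run with isolation.level=read_committed; error handling here is deliberately simplified.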

1 Answer


I finally decided to use Apache Flume for this purpose. I modified its netcat source so that uncommitted messages reside in Flume's heap, and as soon as a commit message is received for a transaction, all of that transaction's messages are sent to the Kafka sink.

I am going to move the message store from Flume's heap to an external cache, so that I can expire the stored messages if a transaction abends or rolls back.
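As a sketch of what that external buffer could look like, Guava's CacheBuilder (already on Flume's classpath) supports per-entry expiry; the 10-minute window below is an assumption, not a value from my setup:

import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.apache.flume.Event;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class TxnEventCache {
    // buffered events per transaction id; entries that see no commit
    // within 10 minutes are silently dropped
    private final Cache<String, ArrayList<Event>> cachedEvents =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(10, TimeUnit.MINUTES)
                    .build();

    public ArrayList<Event> get(String txnId) {
        return cachedEvents.getIfPresent(txnId);
    }

    public void put(String txnId, ArrayList<Event> events) {
        cachedEvents.put(txnId, events);
    }

    // explicit rollback: discard the transaction's buffered events
    public void rollback(String txnId) {
        cachedEvents.invalidate(txnId);
    }
}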

Below is the piece of code implementing that transaction logic:

// body holds the raw bytes of one line received by the netcat source;
// cachedEvents maps a transaction id to the events buffered for it
String eventMessage = new String(body);

// expected line format: "<txnId> - <payload>", e.g. "123 - Event1 Message"
int indexOfTrxIdSeparator = eventMessage.indexOf("-");
if (indexOfTrxIdSeparator != -1) {
    String txnId = eventMessage.substring(0, indexOfTrxIdSeparator).trim();
    String message = eventMessage.substring(indexOfTrxIdSeparator + 1).trim();
    ArrayList<Event> events = cachedEvents.get(txnId);

    if (message.equals("COMMIT")) {
        // commit marker ("<txnId> - COMMIT"): flush every event buffered
        // for this transaction to the channel, then drop the buffer
        System.out.println("@@@@@ COMMIT RECEIVED");

        if (events != null) {
            for (Event eventItem : events) {
                ChannelException ex = null;
                try {
                    source.getChannelProcessor().processEvent(eventItem);
                } catch (ChannelException chEx) {
                    ex = chEx;
                }

                if (ex == null) {
                    counterGroup.incrementAndGet("events.processed");
                } else {
                    counterGroup.incrementAndGet("events.failed");
                    logger.warn("Error processing event. Exception follows.", ex);
                }
            }

            cachedEvents.remove(txnId);
        }
    } else {
        // ordinary event: buffer it in the heap until the commit arrives
        System.out.println("@@@@@ MESSAGE RECEIVED: " + message);
        if (events == null) {
            events = new ArrayList<Event>();
        }
        events.add(EventBuilder.withBody(message.getBytes()));
        cachedEvents.put(txnId, events);
    }
}

I added this code to the processEvents method of Flume's netcat source. I didn't want to work with Ruby code, which is why I decided to switch to Flume. However, the same thing could also be done in Logstash.
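For completeness, here is a rough, untested sketch of how the same buffering could look with Logstash's aggregate filter. It assumes events arrive as "<txnId> - <payload>" lines (the same format the Flume code expects) and that the pipeline runs with a single worker, which the aggregate filter requires:

filter {
  grok {
    match => { "message" => "^%{DATA:txn_id} - %{GREEDYDATA:payload}$" }
  }
  if [payload] == "COMMIT" {
    # commit marker: attach everything buffered for this transaction
    aggregate {
      task_id => "%{txn_id}"
      code => "event.set('buffered', map['events'] || [])"
      end_of_task => true
      timeout => 600   # drop abandoned transactions after 10 minutes (assumption)
    }
    # fan the buffered payloads back out as individual events
    split {
      field => "buffered"
      target => "message"
    }
  } else {
    # ordinary event: buffer the payload and swallow the original event
    aggregate {
      task_id => "%{txn_id}"
      code => "map['events'] ||= []; map['events'] << event.get('payload'); event.cancel()"
    }
  }
}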

Thank you

Alpay