
I've built a Storm topology that takes in tuples from Apache Kafka through a Kafka spout, writes this data (using another bolt) as a String into a .txt file on my local system, and then sends an HTTP POST from my PostBolt.

Both bolts are connected to the Kafka spout.

If I test the topology without the PostBolt, everything works fine. But if I add the bolt to the topology, the whole topology gets blocked for some reason.

Has anyone had the same problem, or does anyone have a hint about what causes this?

I've read that there have been issues with CloseableHttpClient or CloseableHttpResponse blocking threads from working ... might that be the same issue in this case?


Code of my PostBolt:

public class PostBolt extends BaseRichBolt {

private CloseableHttpClient httpclient; 

@Override
public final void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    //empty for now
}

@Override
public final void execute(Tuple tuple) {

    //create HttpClient:
    httpclient = HttpClients.createDefault();
    String url = "http://xxx.xxx.xx.xxx:8080/HTTPServlet/httpservlet";
    HttpPost post = new HttpPost(url);

    post.setHeader("str1", "TEST TEST TEST");

    try {
        CloseableHttpResponse postResponse;
        postResponse = httpclient.execute(post);
        System.out.println(postResponse.getStatusLine());
        System.out.println("=====sending POST=====");
        HttpEntity postEntity = postResponse.getEntity();
        //do something useful with the response body
        //and ensure that it is fully consumed
        EntityUtils.consume(postEntity);
        postResponse.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("HttpPost"));
}}

Code of my Topology:

public static void main(String[] args) throws Exception {

    /**
    *   create a config for Kafka-Spout (and Kafka-Bolt)
    */
    Config config = new Config();
    config.setDebug(true);
    config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
    //setup zookeeper connection
    String zkConnString = "localhost:2181";
    //define Kafka topic for the spout
    String topic = "mytopic";
    //assign the zookeeper connection to brokerhosts
    BrokerHosts hosts = new ZkHosts(zkConnString);

    //setting up spout properties
    SpoutConfig kafkaSpoutConfig = new SpoutConfig(hosts, topic, "/" +topic, UUID.randomUUID().toString());
    kafkaSpoutConfig.bufferSizeBytes = 1024 * 1024 * 4;
    kafkaSpoutConfig.fetchSizeBytes = 1024 * 1024 * 4;
    kafkaSpoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

    /**
    *   Build the Topology by linking the spout and bolts together
    */
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
    builder.setBolt("printer-bolt", new PrinterBolt()).shuffleGrouping("kafka-spout");
    builder.setBolt("post-bolt", new PostBolt()).shuffleGrouping("kafka-spout");

    /**
    *   Check if we're running locally or on a real cluster
    */
    if (args != null && args.length >0) {
        config.setNumWorkers(6);
        config.setNumAckers(6);
        config.setMaxSpoutPending(100);
        config.setMessageTimeoutSecs(20);
        StormSubmitter.submitTopology("StormKafkaTopology", config, builder.createTopology());
    } else {
        config.setMaxTaskParallelism(3);
        config.setNumWorkers(6);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("StormKafkaTopology", config, builder.createTopology());
        //Utils.sleep(100000);
        //cluster.killTopology("StormKafkaTopology");
        //cluster.shutdown();
    }
    }
}

2 Answers


Seems to me you already answered your own question, but yes... according to this answer you should be using a PoolingHttpClientConnectionManager because you'll be running in a multi-threaded environment.
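For illustration, a pooled client could be created once in prepare() like this (a sketch; the pool sizes here are arbitrary, not recommendations):

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// ...inside PostBolt:
@Override
public final void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    // build the client once per bolt instance, not once per tuple
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    cm.setMaxTotal(20);           // total connections across all routes (arbitrary)
    cm.setDefaultMaxPerRoute(10); // connections per route (arbitrary)
    httpclient = HttpClients.custom().setConnectionManager(cm).build();
    _collector = collector;
}
```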

Edit:

public class PostBolt extends BaseRichBolt {
    private static final Logger LOG = LoggerFactory.getLogger(PostBolt.class);
    private CloseableHttpClient httpclient;
    private OutputCollector _collector;        

    @Override
    public final void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        httpclient = HttpClients.createDefault();
        _collector = collector;
    }

    @Override
    public final void execute(Tuple tuple) {
        String url = "http://xxx.xxx.xx.xxx:8080/HTTPServlet/httpservlet";
        HttpPost post = new HttpPost(url);
        post.setHeader("str1", "TEST TEST TEST");

        CloseableHttpResponse postResponse = null;
        try {
            postResponse = httpclient.execute(post);
            LOG.info(postResponse.getStatusLine().toString());
            LOG.info("=====sending POST=====");
            HttpEntity postEntity = postResponse.getEntity();
            //do something useful with the response body
            //and ensure that it is fully consumed
            EntityUtils.consume(postEntity);
        } catch (Exception e) {
            LOG.error("PostBolt execute error", e);
            _collector.reportError(e);
        } finally {
            try {
                if (postResponse != null) {
                    postResponse.close();
                }
            } catch (IOException e) {
                LOG.error("Error closing response", e);
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("HttpPost"));
    }

}
Kit Menke
  • Thank you for your answer. Sadly, I've already tried to use the PoolingHttpClientConnectionManager, but something is still blocking the topology when I add the PostBolt. Also, the OP of the post you linked solved his problem in another way: 'I found the cause of blocking. Because the NOCONTENT response from the server took a body, the connection couldn't be released. It's a big mistake for testing because I just have an environment to test the NOCONTENT response. – tank1920 Jul 24 '14 at 16:24' – Tobias Gent Dec 09 '16 at 08:27
  • Ok seems like it should work. I added a code sample for you to try with logging and closing the postResponse. – Kit Menke Dec 09 '16 at 15:47
  • Thank you a lot! Your example helped me to minimize the lag in my topology. But there is still a significant lag if I use both the PostBolt and the PrinterBolt... – Tobias Gent Dec 12 '16 at 10:13

Alright, I identified the issue thanks to this comment: https://stackoverflow.com/a/32080845/7208987

The Kafka spout will keep resending tuples that were not acked by the "endpoints" they were sent to.

So I just needed to ack the incoming tuples inside the bolts, and the hiccup in the topology was gone.

(I identified the problem because the PrinterBolt kept writing even though there was no further input from the Kafka spout.)
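For illustration, acking at the end of execute() might look like this (a sketch based on the PostBolt above, assuming the collector was saved in prepare()):

```java
@Override
public final void execute(Tuple tuple) {
    try {
        // ...create and send the HTTP POST as before...
        _collector.ack(tuple);   // tell the spout this tuple is fully processed
    } catch (Exception e) {
        _collector.fail(tuple);  // let the spout replay the tuple
    }
}
```

Without the ack, each tuple eventually hits the message timeout and is replayed, which is why the topology appeared to stall.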
