I am trying to read data from a Kafka topic and write it to the HDFS filesystem. I built my project using the Apex Malhar Kafka example from https://github.com/apache/apex-malhar/tree/master/examples/kafka. Unfortunately, after setting up the Kafka properties and the Hadoop configuration, no data is created in my HDFS 2.6.0 system. PS: the console doesn't show any errors and everything seems to work fine.

Here is the code I am using for my app:

public class TestConsumer {
    public static void main(String[] args) {
        // start a consumer thread on the configured topic
        Consumer consumerThread = new Consumer(KafkaProperties.TOPIC);
        consumerThread.start();

        // run the Apex application test directly
        ApplicationTest a = new ApplicationTest();
        try {
            a.testApplication();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Here is the ApplicationTest class, based on the example from Apex Malhar:

package org.apache.apex.examples.kafka.kafka2hdfs;

import org.apache.log4j.Logger;
import javax.validation.ConstraintViolationException;

import org.junit.Rule;

import org.apache.apex.malhar.kafka.AbstractKafkaInputOperator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;

import com.datatorrent.api.LocalMode;

import info.batey.kafka.unit.KafkaUnitRule;

/**
 * Test the DAG declaration in local mode.
 */
public class ApplicationTest
{
  private static final Logger LOG = Logger.getLogger(ApplicationTest.class);
  private static final String TOPIC = "kafka2hdfs";

  private static final int zkPort = NetUtils.getFreeSocketPort();
  private static final int brokerPort = NetUtils.getFreeSocketPort();
  private static final String BROKER = "localhost:" + brokerPort;
  private static final String FILE_NAME = "test";
  private static final String FILE_DIR = "./target/tmp/FromKafka";

  // broker port must match properties.xml
  @Rule
  private static KafkaUnitRule kafkaUnitRule = new KafkaUnitRule(zkPort, brokerPort);

  public void testApplication() throws Exception
  {
    try {
      // run app asynchronously; terminate after results are checked
      LocalMode.Controller lc = asyncRun();
      lc.shutdown();
    } catch (ConstraintViolationException e) {
      LOG.error("constraint violations: " + e.getConstraintViolations());
    }
  }

  private Configuration getConfig()
  {
    Configuration conf = new Configuration(false);
    String pre = "dt.operator.kafkaIn.prop.";
    conf.setEnum(pre + "initialOffset", AbstractKafkaInputOperator.InitialOffset.EARLIEST);
    conf.setInt(pre + "initialPartitionCount", 1);
    conf.set(pre + "topics", TOPIC);
    conf.set(pre + "clusters", BROKER);

    pre = "dt.operator.fileOut.prop.";
    conf.set(pre + "filePath", FILE_DIR);
    conf.set(pre + "baseName", FILE_NAME);
    conf.setInt(pre + "maxLength", 40);
    conf.setInt(pre + "rotationWindows", 3);

    return conf;
  }

  private LocalMode.Controller asyncRun() throws Exception
  {
    Configuration conf = getConfig();
    LocalMode lma = LocalMode.newInstance();
    lma.prepareDAG(new KafkaApp(), conf);
    LocalMode.Controller lc = lma.getController();
    lc.runAsync();
    return lc;
  }
}
  • If I had to guess, you're doing `runAsync`, then immediately calling the shutdown method of the controller – OneCricketeer Jul 15 '18 at 12:33
  • In any case, if you have Kafka, then you can use Kafka Connect to write messages to HDFS (it doesn't require a Confluent installation) – OneCricketeer Jul 15 '18 at 12:34
  • Thanks for the reply, but I think the part where it writes data into HDFS is defined in the "lma.prepareDAG(new KafkaApp(), conf);" line inside the asyncRun() method. PS: this is the first time I have worked with it; can you make it more explicit for me? – AMLOCO Jul 16 '18 at 09:04
  • I do not know Apex. I'm here for the Kafka tag. See the documentation from Confluent https://docs.confluent.io/current/connect/connect-hdfs/docs/index.html and a blog https://engineering.pandora.com/creating-a-data-pipeline-with-the-kafka-connect-api-from-architecture-to-operations-56715080ac55 My point here is that you shouldn't need to write any more code than some config files (a config sketch follows these comments) – OneCricketeer Jul 16 '18 at 11:59
  • Regarding what you're trying, though, this example doesn't look like your code... https://github.com/apache/apex-malhar/tree/master/examples/kafka/src/main/java/org/apache/apex/examples/kafka/kafka2hdfs – OneCricketeer Jul 16 '18 at 12:10
  • OK, I will check the Confluent documentation. PS: my code is from https://github.com/apache/apex-malhar/blob/master/examples/kafka/src/test/java/org/apache/apex/examples/kafka/kafka2hdfs/ApplicationTest.java – AMLOCO Jul 16 '18 at 18:13
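
A minimal Kafka Connect HDFS sink configuration along the lines OneCricketeer suggests might look like the sketch below. This is a hedged example: the property names come from the Confluent HDFS sink connector documentation linked above, while the connector name, topic, HDFS URL, and flush size are assumptions standing in for this question's setup.

# hypothetical standalone connector config; adjust topic and URL to your cluster
name=kafka2hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=kafka2hdfs
hdfs.url=hdfs://localhost:8020
flush.size=3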

1 Answer

After runAsync and before shutdown, you need to wait for the expected results; otherwise the DAG will exit immediately. That's actually what the example itself does.

– Thomas
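
A minimal sketch of that wait, reusing FILE_DIR and asyncRun() from the ApplicationTest above (the polling loop and the 60-second timeout are illustrative assumptions, not the exact check the Malhar example performs):

// inside testApplication(): wait for output before shutting down
LocalMode.Controller lc = asyncRun();

// illustrative wait: give the DAG up to 60 seconds to produce part files
java.io.File outDir = new java.io.File(FILE_DIR);
long deadline = System.currentTimeMillis() + 60_000;
while (System.currentTimeMillis() < deadline) {
  String[] files = outDir.list();
  if (files != null && files.length > 0) {
    break;                    // output has started to appear
  }
  Thread.sleep(1000);         // back off before checking again
}

lc.shutdown();

Polling against a deadline keeps the test from hanging forever if the DAG never produces output.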
  • Unfortunately, this doesn't solve the problem; I still can't find my data stored in the HDFS filesystem – AMLOCO Jul 20 '18 at 09:01