I'm working with Apache Kafka and have the following task: a directory contains CSV files delivered as mini-batches, each about 25-30 MB. All I need to do is parse each file and publish its contents to Kafka.

As far as I can see, Kafka has an interesting feature for this: connectors.

I can create a SourceConnector and a SourceTask, but there is one thing I don't understand: once I have finished processing a file, how can I stop or delete my task?

For example, I have this dummy connector:

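import java.util.List;
import java.util.Map;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class DummySourceConnector extends SourceConnector {
    private static final Logger logger = LogManager.getLogger();

    @Override
    public String version() {
        logger.info("version");

        return "1";
    }

    @Override
    public ConfigDef config() {
        logger.info("config");

        // Return an empty definition rather than null, so that config validation has something to work with.
        return new ConfigDef();
    }

    @Override
    public Class<? extends Task> taskClass() {
        return DummySourceTask.class;
    }

    @Override
    public void start(Map<String, String> props) {
        logger.info("start {}", props);
    }

    @Override
    public void stop() {
        logger.info("stop");
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        logger.info("taskConfigs {}", maxTasks);

        // A single task with a dummy configuration.
        return ImmutableList.of(ImmutableMap.of("key", "value"));
    }
}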

And the task:

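import java.time.Instant;
import java.util.List;
import java.util.Map;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class DummySourceTask extends SourceTask {
    private static final Logger logger = LogManager.getLogger();

    private long offset = 0;

    @Override
    public String version() {
        logger.info("version");

        return "1";
    }

    @Override
    public void start(Map<String, String> props) {
        logger.info("start {}", props);
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // Simulate waiting for new data.
        Thread.sleep(3000);

        final String value = "Offset " + offset++ + " Timestamp " + Instant.now().toString();

        logger.info("poll value {}", value);

        // One record per poll: source partition, source offset, topic, value schema, value.
        return ImmutableList.of(new SourceRecord(
                ImmutableMap.of("partition", 0),
                ImmutableMap.of("offset", offset),
                "topic-dummy",
                Schema.STRING_SCHEMA,
                value
        ));
    }

    @Override
    public void stop() {
        logger.info("stop");
    }
}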

But how can I close my task once it is done? Or maybe you can suggest another approach for this problem.

Thanks for your help!

OneCricketeer
aarexer

2 Answers

First, I encourage you to have a look at existing connectors here. I feel like the spooldir connector would be helpful to you. It may even be possible for you to just download and install it without having to write any code at all.

Second, if I'm understanding correctly, you want to stop a task. I believe this discussion is what you want.

dawsaw
  • Hello! Thanks for your help! It's not exactly what I want, but the spooldir connector is interesting. I want to stop my task whenever I choose. Imagine this situation: my task reads a file line by line, and when it reaches the end of the file, it has no way to stop itself. The `stop` method is only called by the connector (when it rebalances, for example). – aarexer Oct 06 '16 at 18:32
  • Ah yes, you want to stop a task from within the task itself based on some event. I'm not too sure about that, because we typically would not want tasks to be able to stop or start themselves; by definition, task coordination is the connector's job. Maybe you can feed the task the next file instead? – dawsaw Oct 06 '16 at 23:37
  • Yes, I know that task coordination is the connector's job. Maybe feeding it a new file is a good solution... Thanks, it's a good answer! – aarexer Oct 07 '16 at 08:06
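
The "feed the task the next file" idea from the comments could look roughly like the sketch below. It is plain Java with no Kafka dependency, and `CsvDirectoryFeeder` and `nextBatch` are hypothetical names, not part of the Kafka Connect API. The assumption is that a `SourceTask.poll()` would call something like `nextBatch()`, wrap each line in a `SourceRecord`, and return `null` when no file is waiting, so the task idles instead of stopping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: hands out one CSV file at a time from a directory.
class CsvDirectoryFeeder {
    private final Path directory;
    private final Deque<Path> pending = new ArrayDeque<>();
    private final Set<Path> seen = new HashSet<>();

    CsvDirectoryFeeder(Path directory) {
        this.directory = directory;
    }

    // Returns the lines of the next unprocessed .csv file, or null if none is waiting.
    // For simplicity the whole file is read into memory; a real task handling
    // 25-30 MB files would probably read it in smaller chunks.
    List<String> nextBatch() throws IOException {
        if (pending.isEmpty()) {
            rescan();
        }
        Path next = pending.poll();
        return next == null ? null : Files.readAllLines(next);
    }

    // Queues .csv files that have not been handed out before, in name order.
    private void rescan() throws IOException {
        List<Path> found;
        try (Stream<Path> files = Files.list(directory)) {
            found = files.sorted().collect(Collectors.toList());
        }
        for (Path file : found) {
            boolean isCsv = file.getFileName().toString().endsWith(".csv");
            if (isCsv && seen.add(file)) {
                pending.add(file);
            }
        }
    }
}
```

The set of already-seen files lives only in memory here, so this sketch would re-read everything after a restart; storing progress in the source offsets of the records would be one way around that.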

A less elegant way to terminate a task when an event happens is to check for the event in the task's code and call System.exit(1). Note that this shuts down the whole Connect worker JVM, not just the task.

Nevertheless, the most elegant solution I have found is this:

When the event occurs, the task makes a REST call to the Kafka Connect worker's REST API (not to a broker) to delete, and thereby stop, the connector that runs the task.

To do this, the task needs to know the name of the connector that runs it, which you can find by following the steps in this discussion.

The connector name is available in the properties passed to the task: there is a property with the key "name" whose value is the name of the connector that runs the task (the one we want to stop when the event occurs).

Finally, we make the REST call and get a 204 response with no content if the connector was stopped.
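
As a rough illustration of the step above, the sketch below reads the "name" property from a task's properties and builds the address for the Connect REST API's `DELETE /connectors/{name}` request. `ConnectorStopUrl`, `forTask`, and the `baseUrl` parameter are hypothetical names introduced for this example. It assumes the "name" entry is present as described; if it is not, the connector could copy it into the maps it returns from `taskConfigs`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of the Kafka Connect API.
class ConnectorStopUrl {

    // baseUrl is the address of a Connect worker's REST interface (an assumption of this sketch).
    // taskProps is the map a task receives in start(Map<String, String> props).
    static String forTask(String baseUrl, Map<String, String> taskProps) {
        String name = taskProps.get("name");
        if (name == null || name.isEmpty()) {
            throw new IllegalStateException("Task properties contain no connector name");
        }
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        try {
            // URLEncoder targets form data and writes spaces as '+';
            // a URL path segment needs %20 instead.
            String encoded = URLEncoder.encode(name, "UTF-8").replace("+", "%20");
            return trimmed + "/connectors/" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

A task could keep the result of `ConnectorStopUrl.forTask("http://localhost:8083", props)` from its `start` method and use it later for the DELETE request; the host and port here assume a worker running locally on the default REST port.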

The code of the call is this:

 // Requires java.io.IOException, java.net.HttpURLConnection and java.net.URL.
 HttpURLConnection conn = null;
 try {
   // "url" is a placeholder for the connectors endpoint of the Connect worker's REST API.
   URL url = new URL("url/" + connectorName);
   conn = (HttpURLConnection) url.openConnection();
   conn.setRequestMethod("DELETE");
   conn.setRequestProperty("Accept", "application/json");

   // A successful delete returns 204 No Content, so there is no response body to read.
   if (conn.getResponseCode() != 204) {
     throw new RuntimeException("Failed : HTTP error code : "
         + conn.getResponseCode());
   }

   System.out.println("Task Stopped");

 } catch (IOException e) {
   // MalformedURLException is a subclass of IOException, so it is handled here as well.
   e.printStackTrace();
 } finally {
   // Release the connection on both the success and the failure path.
   if (conn != null) {
     conn.disconnect();
   }
 }

Now all of the connector's tasks stop.

(Of course, as mentioned previously, keep in mind that every SourceTask and SinkTask is designed to run indefinitely. They are not supposed to stop when an event occurs, but to keep searching for new entries in the files you provide them. So you usually stop them with a REST call, and if you want them to stop when an event occurs, you put that REST call in their own code.)

Novemberland