
Using the configuration below, I am able to connect Samza to the Kafka broker:

systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=json
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.producer.bootstrap.servers=localhost:9092

But I have some doubts regarding the SystemFactory class. How do I write my own SystemFactory class, and what is the purpose of a SystemFactory class? Please give me some idea.

sKhan

2 Answers


You can write your own system factory class by implementing the SystemFactory interface and its three methods: getConsumer, getProducer, and getAdmin. In each of these methods (take getConsumer as an example) you create a system consumer, an instance of another customized class implementing SystemConsumer that defines how the system should consume. By doing so, your Samza job knows how to obtain the admin, consumer, and producer of the system when needed.

Example (in Scala):

import org.apache.samza.config.Config
import org.apache.samza.metrics.MetricsRegistry
import org.apache.samza.system.{SystemAdmin, SystemConsumer, SystemFactory, SystemProducer}

class YourSystemFactory extends SystemFactory {
  override def getConsumer(systemName: String, config: Config, registry: MetricsRegistry): SystemConsumer = {
    new YourSystemConsumer(
      getAdmin(systemName, config).asInstanceOf[YourSystemAdmin],
      config.get("someParam"))
  }

  override def getAdmin(systemName: String, config: Config): SystemAdmin = {
    new YourSystemAdmin(
      config.get("someParam"),
      config.get("someOtherParam"))
  }

  override def getProducer(systemName: String, config: Config, registry: MetricsRegistry): SystemProducer = {
    new YourSystemProducer(
      getAdmin(systemName, config).asInstanceOf[YourSystemAdmin],
      config.get("someParam"))
  }
}

In your config:

# Your system params
systems.your.samza.factory=your.package.YourSystemFactory
systems.your.consumer.param=value
systems.your.producer.param=value
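
The factory above constructs a YourSystemConsumer, which the answer doesn't show. Below is a rough sketch of what such a class might look like (in Java, though a Scala equivalent works the same way), assuming you build on Samza's BlockingEnvelopeMap helper, which implements register() and poll() for you so that you only have to feed it envelopes from a background thread. The constructor mirrors the factory call above; everything else is illustrative, not a real API.

import org.apache.samza.system.SystemAdmin;
import org.apache.samza.util.BlockingEnvelopeMap;

public class YourSystemConsumer extends BlockingEnvelopeMap {
  private final SystemAdmin admin;
  private final String someParam;
  private Thread poller;

  public YourSystemConsumer(SystemAdmin admin, String someParam) {
    this.admin = admin;
    this.someParam = someParam;
  }

  @Override
  public void start() {
    // Start a background thread that reads from your external system and hands
    // each record to Samza, e.g.:
    //   put(ssp, new IncomingMessageEnvelope(ssp, offset, key, message));
    // BlockingEnvelopeMap buffers these and serves them from poll().
    poller = new Thread(() -> {
      // read loop for your system goes here
    });
    poller.start();
  }

  @Override
  public void stop() {
    poller.interrupt();
  }
}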
jaciefan

You don't need to implement your own KafkaSystemFactory. You just need to implement a StreamTask.

Example:

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class MyTaskClass implements StreamTask {

  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    // process message
  }
}

Config:

# This is the class above, which Samza will instantiate when the job is run
task.class=com.example.samza.MyTaskClass

# Define a system called "kafka" (you can give it any name, and you can define
# multiple systems if you want to process messages from different sources)
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory

# The job consumes a topic called "PageViewEvent" from the "kafka" system
task.inputs=kafka.PageViewEvent

# Define a serializer/deserializer called "json" which parses JSON messages
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory

# Use the "json" serializer for messages in the "PageViewEvent" topic
systems.kafka.streams.PageViewEvent.samza.msg.serde=json
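
To make the snippet above a bit more concrete, here is a hedged sketch of what process() might do once the "json" serde is in place: the message arrives already deserialized (a JSON object typically shows up as a java.util.Map), and results can be sent back out through the same "kafka" system. The field name "pageId" and the output topic "PageViewCounts" are invented for illustration.

import java.util.Map;

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class MyTaskClass implements StreamTask {

  // Output stream on the "kafka" system defined above; the topic name is
  // made up. Remember to configure a serde for this output stream as well.
  private static final SystemStream OUTPUT_STREAM =
      new SystemStream("kafka", "PageViewCounts");

  @Override
  @SuppressWarnings("unchecked")
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    // The "json" serde has already deserialized the payload into a Map.
    Map<String, Object> pageView = (Map<String, Object>) envelope.getMessage();

    // Do whatever processing you need, then emit a result downstream.
    Object pageId = pageView.get("pageId");
    collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, pageId));
  }
}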

For more info, see the Samza documentation.

MaximeF