Questions tagged [apache-samza]

Apache Samza is a distributed stream processing framework.

It uses Apache Kafka for messaging and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

It natively supports stateful stream processing.
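Here is a rough illustration of what "stateful" means. This plain-Java model assumes a task that counts page views per key: each message reads the previous count, updates it, and writes it back. In a real Samza job that state would live in a task-local KeyValueStore rather than a HashMap. The class and method names are hypothetical, and no Samza API is used.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain-Java model of a stateful stream task (hypothetical names).
 * The HashMap stands in for the task-local key-value store a Samza task would use.
 */
public class StatefulCountSketch {
    // Hypothetical stand-in for local state: page id -> view count so far.
    private final Map<String, Integer> store = new HashMap<>();

    /** Called once per incoming message, in the spirit of a task's process() method. */
    public int process(String pageId) {
        // Read the previous state, update it, and write it back.
        int updated = store.getOrDefault(pageId, 0) + 1;
        store.put(pageId, updated);
        return updated;
    }

    public static void main(String[] args) {
        StatefulCountSketch task = new StatefulCountSketch();
        String[] events = {"home", "about", "home", "home"};
        for (String e : events) {
            System.out.println(e + " -> " + task.process(e));
        }
    }
}
```

Running `main` prints `home -> 1`, `about -> 1`, `home -> 2`, `home -> 3`. Each result depends on earlier messages, which is what a stateless filter or map cannot express.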

Apache Samza is a top level project of the Apache Software Foundation.

82 questions
2 votes, 1 answer

Does Samza create partitions automatically when sending messages?

If you use Samza's OutgoingMessageEnvelope to send a message using this format: public OutgoingMessageEnvelope(SystemStream systemStream, java.lang.Object partitionKey, java.lang.Object…
John
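This question depends on how a partition key relates to partitions. The sketch below assumes a simple hash-modulo scheme, which may not match the exact hashing Samza's Kafka producer applies. Under that scheme, the key only selects one of the partitions the topic already has. The partition count is set when the topic is created. The class and method names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical illustration of hash-based routing: a partition key is mapped
 * onto a fixed number of existing partitions. Nothing here creates partitions.
 */
public class PartitionRoutingSketch {
    /** Maps a key onto one of numPartitions partitions (0-based). */
    static int partitionFor(Object partitionKey, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes,
        // unlike Math.abs(hash) % n, which misbehaves for Integer.MIN_VALUE.
        return Math.floorMod(partitionKey.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed: fixed when the topic was created
        for (String key : List.of("user-1", "user-2", "user-1")) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Equal keys always land on the same partition, and no key can route to a partition that does not already exist.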
2 votes, 1 answer

How can you create a partition on a Kafka topic using Samza?

I have a few Samza jobs running, all reading messages off of a Kafka topic and writing a new message to a new topic. To send the new messages, I am using Samza's built-in OutgoingMessageEnvelope. I am also using a MessageCollector to send out the new…
Ryan Wilson
2 votes, 1 answer

Scala error: unbound placeholder parameter and pattern-matching condition

I'm trying to combine pattern matching and condition, but this code (that's a Samza task): override def process(incomingMessageEnvelope: IncomingMessageEnvelope, messageCollector: MessageCollector, taskCoordinator: TaskCoordinator): Unit = { val…
rucka
2 votes, 0 answers

Apache Samza's CheckpointTool won't give away partition offsets

I am trying to rewind the input feed for one of my Samza jobs with the checkpoint tool, as described here and here. For some reason the checkpoint tool won't output offsets as promised; however, I know for a fact that the job has already consumed more than a…
tutturu
2 votes, 1 answer

Apache Samza does not run

I am trying to set up an Apache Samza and Kafka environment. I am experiencing some problems when trying to run the modules. I have Kafka working correctly, but I cannot make Samza work. I have installed two Debian Jessie AMD64 boxes and followed the…
jordi
1 vote, 0 answers

Samza task keeps using more memory even though no processing is being performed

I have microservices running in Samza containers connected by Kafka messaging streams. In a few tasks, memory usage keeps increasing even though no processing is being performed. I'm not sure why this is happening, and sometimes the…
1 vote, 1 answer

Exception on samza KafkaSystemFactory.getAdmin

I am running Samza to consume messages off of a given Kafka topic in Scala. In order to run, I created a samza-read.properties file which…
1 vote, 0 answers

How to implement a message queue system using Samza and Kafka?

I have to build a queuing system that takes in events and redirects them to multiple consumers. I haven't understood how I can implement such a queuing system that distributes events to consumers using Samza and Kafka. Please provide resources.
AnonymousJoe
1 vote, 0 answers

Samza 1.1.0 - run-app.sh does not work during deployment of hello-samza

I am facing errors when I deploy the hello-samza tutorial on YARN following the documentation. In particular, I was getting errors when I ran the run-app.sh script as described. I am currently using Samza 1.1.0 on AWS EMR (emr - 5.13.0, amazon 2.8.3,…
Harsha
1 vote, 2 answers

Samza: Delay processing of messages until timestamp

I'm processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future and I'd like to postpone the processing until after that timestamp. In the meantime, I'd like to keep processing other incoming messages.…
Björn Marschollek
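A common pattern for this is to buffer future-dated messages and release them from a periodic callback. Samza's low-level API offers WindowableTask for such a callback: its window() method is invoked at the interval set by `task.window.ms`. The plain-Java model below sketches only the buffering logic, and its names are hypothetical. It keeps the buffer in memory, so a production job would likely need durable storage, such as a KeyValueStore, to survive restarts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Plain-Java model of deferring future-dated messages (hypothetical names).
 * Messages whose timestamp has not arrived are buffered; a periodic callback
 * releases them once they are due.
 */
public class DelayedProcessingSketch {
    record Message(long timestampMillis, String body) {}

    // Buffered messages ordered by timestamp, earliest first.
    private final PriorityQueue<Message> pending =
            new PriorityQueue<>(Comparator.comparingLong(Message::timestampMillis));
    private final List<String> processed = new ArrayList<>();

    /** Handles one incoming message at wall-clock time nowMillis. */
    public void onMessage(Message m, long nowMillis) {
        if (m.timestampMillis() > nowMillis) {
            pending.add(m);          // not due yet: defer it, keep consuming others
        } else {
            processed.add(m.body()); // due: process immediately
        }
    }

    /** Periodic callback (compare WindowableTask.window()): drains every due message. */
    public void onTimer(long nowMillis) {
        while (!pending.isEmpty() && pending.peek().timestampMillis() <= nowMillis) {
            processed.add(pending.poll().body());
        }
    }

    public List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        DelayedProcessingSketch task = new DelayedProcessingSketch();
        task.onMessage(new Message(500, "later"), 100);
        task.onMessage(new Message(50, "now"), 100);
        task.onTimer(200); // "later" is still not due
        task.onTimer(600); // "later" is released
        System.out.println(task.processed());
    }
}
```

Running `main` prints `[now, later]`: the past-dated message is handled at once, while the future-dated one waits for a timer tick at or after its timestamp.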
1 vote, 0 answers

Publishing Kafka messages to Elasticsearch

I have a process that is writing a JSON data object from Kafka and putting some of the fields from this object into an Elasticsearch index via the Elasticsearch API. I have to write two separate messages - one for the data object and another for the…
1 vote, 1 answer

How to iterate through all elements in a KeyValueStore

I have a KeyValueStore of type KeyValueStore<…>. I don't know the range of the keys. Is there any way I can iterate through the whole KeyValueStore in Samza? Thanks
helen
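Samza's KeyValueStore interface provides `all()`, which returns a KeyValueIterator over every entry. That iterator should be closed after use, so no key range is needed for a full scan. The plain-Java stand-in below mirrors that pattern so it runs without Samza on the classpath. `InMemoryStore`, `ClosingIterator`, and `sumValues` are hypothetical names, not Samza classes.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class FullStoreScanSketch {
    /** Hypothetical stand-in for an iterator that holds resources and must be closed. */
    interface ClosingIterator<K, V> extends Iterator<Map.Entry<K, V>> {
        void close();
    }

    /** Hypothetical in-memory store exposing an all()-style full scan. */
    static class InMemoryStore<K extends Comparable<K>, V> {
        private final TreeMap<K, V> data = new TreeMap<>();

        void put(K key, V value) {
            data.put(key, value);
        }

        ClosingIterator<K, V> all() {
            Iterator<Map.Entry<K, V>> it = data.entrySet().iterator();
            return new ClosingIterator<K, V>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public Map.Entry<K, V> next() { return it.next(); }
                // A persistent store would release native resources here.
                @Override public void close() { }
            };
        }
    }

    /** Visits every entry without knowing the key range, always closing the iterator. */
    static int sumValues(InMemoryStore<String, Integer> store) {
        ClosingIterator<String, Integer> it = store.all();
        int sum = 0;
        try {
            while (it.hasNext()) {
                sum += it.next().getValue();
            }
        } finally {
            it.close(); // close even if iteration throws
        }
        return sum;
    }

    public static void main(String[] args) {
        InMemoryStore<String, Integer> store = new InMemoryStore<>();
        store.put("a", 1);
        store.put("b", 2);
        store.put("c", 3);
        System.out.println(sumValues(store));
    }
}
```

Running `main` prints `6`. The important part is the try/finally around the loop: with a real Samza store, leaving the iterator open can hold on to underlying store resources.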
1 vote, 2 answers

YARN not getting nodes

This is in an AWS EMR cluster with 2 task nodes and a master. I'm trying hello-samza, which launches a YARN job. The job gets stuck in the ACCEPTED state. I looked at other posts, and it seems that YARN is not getting any nodes. Any help on what YARN not…
dvshekar
1 vote, 1 answer

Is there a simple consumer task example with Samza and Kafka?

I am very new to Kafka and Samza. I tried the hello-samza example and it is working. What I am looking for is to create a Samza task that reads messages from a Kafka topic. The task I added does not throw any error, and is not reading any message…
jijua
1 vote, 2 answers

How to read a file in Apache Samza from the local file system and HDFS

I'm looking for an approach in Apache Samza to read a file from the local file system or HDFS, then apply filters, aggregations, where conditions, order by, and group by to batches of data. Please provide some help.