Questions tagged [apache-samza]

Apache Samza is a distributed stream processing framework.

It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

It natively supports stateful stream processing.

Apache Samza is a top-level project of the Apache Software Foundation.

82 questions
0
votes
0 answers

Configure number of threads in Apache Samza

Apache Samza's documentation states that it can be run with multiple threads per worker: Threading model and ordering Samza offers a flexible threading model to run each task. When running your applications, you can control the number of workers…
Sören Henning
  • 326
  • 4
  • 16
0
votes
1 answer

How does Samza generate the container.id when the application is deployed on YARN?

Can someone let me know how Samza generates the samza.container.id / SAMZA_CONTAINER_ID when the application is deployed on YARN? I looked around the Samza code base but was not able to locate the logic for the generation of the…
tuk
  • 5,941
  • 14
  • 79
  • 162
0
votes
1 answer

Apache Samza flush table update to changelog immediately

If I specify a changelog backing for a RocksDB Table in Samza, is there a configuration to update the async write time to the changelog? I want to reduce it to a shorter time. I cannot see anything in the Config reference. The scenario I want is too…
perkss
  • 1,037
  • 1
  • 11
  • 37
0
votes
1 answer

Conflict with runner dependencies in Beam

I want to test different stream processing engines using Beam, but can't run the program when Flink and Samza dependencies are included. If only one of them is included, it works fine for all the other runners. My pom.xml contains the…
0
votes
1 answer

Reset to custom offset in Kafka partition

I am researching Kafka for a specific use case I am working on. I have a stream of data that is flowing and I want to process it and publish it to intermediary stages. At each of these stages (initial and intermediary) Samza tasks would do the…
Shabirmean
  • 2,341
  • 4
  • 21
  • 34
0
votes
1 answer

Force Samza key/value store backed by RocksDB to reload from Kafka changelog?

In order to debug a production problem, I am running Samza code locally using ProcessJobFactory. Everything appears to run fine. The code uses a Samza key/value store backed by RocksDB and Kafka as a changelog (Kafka running on a different machine…
drobin
  • 286
  • 2
  • 6
0
votes
1 answer

Samza tutorial compileScala FAILED

Not sure how to fix this problem as I am new to Samza and Scala. I am following the tutorial and currently stuck on this section: https://github.com/apache/samza-hello-samza#2-start-a-grid And this is the error message I get > Task…
Liondancer
  • 15,721
  • 51
  • 149
  • 255
0
votes
1 answer

Do we need to remove duplicates ourselves in the at-least-once delivery case?

Apache Storm and Samza guarantee at-least-once delivery. This means that there may be some duplicates in the computation process. Do we need to remove the duplicates ourselves (including removing the duplicate part in our code)? For example, the word…
SherleyZ
  • 1
  • 1
0
votes
1 answer

Samza containers are failing

Hello, my Samza job containers are failing frequently due to the following errors: Exception from container-launch. Container id: container_1540535314451_0141_01_000021 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at…
hitesh
  • 449
  • 3
  • 11
  • 26
0
votes
1 answer

Samza 0.14.1 not correctly handling OffsetOutOfRangeException exception?

We are facing a problem identical to the one described in this thread. Here, Samza is requesting a Kafka partition offset that is too old (i.e., the Kafka log has moved ahead). We are setting the property consumer.auto.offset.reset to smallest and…
tuk
  • 5,941
  • 14
  • 79
  • 162
0
votes
1 answer

Buffer messages in stream data for a given messageId

Use case: I have messages that carry a messageId, and multiple messages can have the same messageId. These messages are present in a streaming pipeline (like Kafka) partitioned by messageId, so I am making sure all the messages with the same messageId will go in the same…
0
votes
1 answer

Load data from a separate Kafka cluster to Samza?

I am trying to create a Samza job that resembles the Wikipedia example job as closely as I can make it. However, in the "WikipediaFeed" object I am trying to get data from a different Kafka broker than the Kafka broker that is running when you start…
0
votes
1 answer

Apache Samza: Getting Started with Samza REST and hello-samza

I am following the hello-samza tutorial on the Apache Samza website and want to add a REST service as described here: http://samza.apache.org/learn/tutorials/latest/samza-rest-getting-started.html I can see the samza jobs in YARN UI, but the…
cookiedealer
  • 381
  • 1
  • 6
  • 18
0
votes
1 answer

Samza equivalent of Kafka Consumer - Manual Offset Control (enable.auto.commit = false)

We have Samza tasks which read messages from a Kafka output stream, but if there is any retryable failure while processing a message, I want my Samza task to read the same message again and reprocess it. And after successfully processing…
sidss
  • 923
  • 1
  • 12
  • 20
0
votes
1 answer

Consume a remote Kafka topic with Samza

I am trying to modify the hello-samza tutorial to: (1) Read from a Kafka topic on a remote broker (i.e., not localhost) (2) Write the message to a file. I modified the WikipediaFeedStreamTask.java to look like the following: public class…
Mohammad Ahmad
  • 133
  • 1
  • 1
  • 8