Questions tagged [apache-samza]

Apache Samza is a distributed stream processing framework.

It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

It natively supports stateful stream processing.

Apache Samza is a top-level project of the Apache Software Foundation.

82 questions
0
votes
1 answer

hello-samza not able to run properly on Windows 7

I did everything as shown in http://samza.apache.org/startup/hello-samza/0.13/. Basically, I cloned the repo and typed "bin/grid bootstrap". However, in the end I got an error message saying ZooKeeper was not able to start, as shown below. Does anyone know how…
teddy
  • 413
  • 3
  • 8
  • 24
0
votes
1 answer

Is there an alternative choice for job.coordinator.system?

I want to use Samza, but Kafka topic creation is restricted in our environment (each new topic must be reviewed and must have a concrete purpose). So, is there any other choice for "job.coordinator.system"? I would also need an introduction to its usage. Thanks a lot!
beijicy
  • 9
  • 2
0
votes
1 answer

How can a Samza task consume more than one partitioned Kafka stream?

I have a typical Samza task which consumes 2 topics, data and config, and stores messages from config as local state in RocksDB to check whether messages from data are OK. This task works fine if each of these two topics has only one partition. Once I…
Aries
  • 211
  • 2
  • 10
0
votes
2 answers

Hello-samza - task stays in Accepted state

I'm trying to launch the hello-samza example from the master branch. I've run every command without errors and started run-job.sh without errors, but the job in YARN stays forever in the ACCEPTED state. I've looked at http://localhost:8088/cluster/nodes and…
grz.miejski
  • 173
  • 1
  • 2
  • 10
0
votes
1 answer

Can I set task.commit.ms to every 1ms?

I have a project with Apache Samza and I have a problem with duplicate data. This is my checkpoint configuration: …
MaximeF
  • 4,913
  • 4
  • 37
  • 51
0
votes
0 answers

Can Spark / Samza / Storm un-do past commits and regenerate views?

I just watched Turning the database inside-out and noticed a similarity between Samza and Redux: all state consists of a stream of immutable objects. This made me realize that if you edited the stream after-the-fact, you could in theory regenerate…
stevendesu
  • 15,753
  • 22
  • 105
  • 182
0
votes
1 answer

How to get the application ID on a Samza worker?

I don't need the "container id" or the "App Attempt Id". In the documentation I see that we can use ${samza.log.dir} in the log4j config, and this path contains the application ID. It looks like /foo/log/../application_id_123
MaximeF
  • 4,913
  • 4
  • 37
  • 51
0
votes
1 answer

Issues while loading properties files from a Samza job running on a YARN cluster

I have a Samza job which I am trying to run on a YARN cluster using ./bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file:///home/anshu/samzaJob.properties The job triggers and runs fine…
Ansh
  • 357
  • 1
  • 2
  • 13
0
votes
1 answer

How to reset Kafka to integrate it in a JUnit testing process?

I am testing and debugging an event-sourcing (or stateful stream processing) application that runs on top of Kafka and uses Samza. I want to remove queues and topics in Kafka so that Samza jobs get an empty Kafka installation at startup. How can I…
user2108278
  • 391
  • 5
  • 17
0
votes
2 answers

How to connect Samza to other systems and how to write a SystemFactory class

Using the configuration below, I am able to connect Samza to…
0
votes
2 answers

Can I use the same key-value store (RocksDB) in 2 different StreamTasks at the same time?

I use Apache Samza as a framework for Kafka and I need to share the same RocksDB key-value store between 2 tasks. Can I do so without concurrency problems on the key-value store?
MaximeF
  • 4,913
  • 4
  • 37
  • 51
0
votes
4 answers

How to write my own job in Samza

Recently I have been trying to do some stream processing work with the Samza framework. I have deployed the hello-samza example successfully. However, when I try to write my own job, I have no idea where to start. I have read this document, but I still…
zzx
  • 9
  • 3
0
votes
1 answer

How to deploy & run Samza job on HDFS?

I want to get a Samza job running on a remote system with the Samza job being stored on HDFS. The example (https://samza.apache.org/startup/hello-samza/0.7.0/) for running a Samza job on a local machine involves building a tar file, then unzipping…
John
  • 10,837
  • 17
  • 78
  • 141
0
votes
1 answer

Does Samza's OutgoingMessageEnvelope require a SerDe for partitionKey and how do I specify it?

Similar to how-can-you-create-a-partition-on-a-kafka-topic-using-samza I need to construct a message controlling how it's routed via use of partitionKey. key and message do require a SerDe but I'm not sure if partitionKey does as well. If so what is…
Edi Bice
  • 566
  • 6
  • 18
0
votes
2 answers

YARN Java process not killed

I have installed Apache Samza, which uses YARN to manage the jobs. It is running on two Debian servers on virtual machines. Samza is version 0.9.1. Hadoop is version 2.6.0. I am seeing two different problems, and I am not sure whether they are related,…
jordi
  • 1,157
  • 1
  • 13
  • 37