Questions tagged [apache-samza]

Apache Samza is a distributed stream processing framework.

It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

It natively supports stateful stream processing.

Apache Samza is a top-level project of the Apache Software Foundation.

82 questions
1
vote
1 answer

Hello-Samza gets NullPointerException

I didn't use the grid script to start up the hello-samza project (http://samza.apache.org/startup/hello-samza/0.11/). Instead, I followed the steps in the grid script: download Hadoop, Kafka, and ZooKeeper; configure ZooKeeper, Hadoop, and Kafka as it does in grid…
bstsnail
1
vote
0 answers

nc: command not found while setting up Apache Samza

When I try to set up Apache Samza, I get "nc: command not found" when starting ZooKeeper. I'm running the command bin/grid bootstrap, and I get: /c/..../hello-samza/bin/grid: line 170: nc: command not found. I'm running the command in the Git Bash console on…
1
vote
1 answer

Create checkpoint, coordinator, and changelog Kafka topics in a separate cluster when using Samza

When using Kafka with Samza, Samza automatically creates certain topics, such as the checkpoint, coordinator, and changelog topics, using the names from the properties file. These topics are created in the same cluster, but for maintenance purposes, I want to…
1
vote
1 answer

MetricsSnapshotReporterFactory warning in Samza job

I get the following warning in my Samza job: [main] WARN o.a.s.m.r.MetricsSnapshotReporterFactory.warn(66) - Unable to find implementation version in jar's meta info. Defaulting to 0.0.1. How can I fix it? What am I missing?
Barak Schoster
1
vote
1 answer

Increase logging level in Apache Samza

I'm trying to change the logging level for Apache Samza so I can get debug statements; the default is info. More specifically, I'm trying to get this debug statement to show up. I'm using Samza in a Clojure project. What is the best way to do this?…
1
vote
1 answer

How to deploy a Samza job on a remote YARN ResourceManager

We are running a Samza job on Hadoop YARN. Until now, we have been deploying the job manually by calling run-job.sh on the ResourceManager host. run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory…
Coder
1
vote
1 answer

ContainerRequestState [INFO] No more pending requests in queue

I am using a MapR (YARN) cluster with 3 nodes. I am trying to deploy 6 Samza jobs on the cluster to process data streams. All jobs are correct. I tried deploying 2-3 in parallel and they work. However, when I deploy all 6 Samza jobs…
Zeeshan
1
vote
0 answers

Unable to kill YARN job

I have a simple Samza job, which I submit to our YARN cluster. The job allocates one single container and runs without any issues. When trying to kill the job, however, both the AM and job containers are left running on the NM, even though the RM…
David Yu
1
vote
1 answer

Can you inject a serialized message into another protobuf message?

We work with a pipeline of Kafka/Samza jobs using protobuf-encoded messages. The pipeline can be quite lengthy for certain data sets, and we want to add a timestamp/id for each stage in the pipeline to monitor efficiency and service health. The…
Philip Pryde
1
vote
1 answer

Event stream data models

I'm working on coming up with a set of schemas for a new eventing and stream processing system we are building at my company to tie together several currently disconnected systems. We have clearly defined 12 domain models and are now trying to put…
Bert Alfred
1
vote
1 answer

Does Samza work with ResourceManager in HA?

Does anyone have Samza working with the ResourceManager in HA? If so, what do I set yarn.resourcemanager.hostname to in yarn-site.xml? If I set it to the first of my RMs, the job submission works OK if I submit the job from that RM and the RM is…
John
1
vote
1 answer

For which purposes does LinkedIn use Kafka?

Can anyone tell me for which specific purposes LinkedIn uses Kafka? I have read quite a few articles about Kafka on the LinkedIn blog, where they explain how they use Kafka and how much of a performance benefit they have achieved. Does LinkedIn use Kafka to…
1
vote
1 answer

Storm framework applications

I built an application for searching similar images stored in a distributed environment using Hadoop. But Hadoop does not support real-time processing, which is why the response time is long. I know that Storm is another framework for big data analysis…
1
vote
3 answers

Per-user stream processing

I need to process data from a set of streams, applying the same processing to each stream independently of the other streams. I've already looked at frameworks like Storm, but it appears to allow the processing of static streams only (i.e.…
1
vote
0 answers

Getting NullPointerException when using an S3 job file with Samza

I'm getting the following exception when passing an S3 file path to yarn.package.path. Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433) at…
Joseph