Questions tagged [camus]

LinkedIn's Kafka to HDFS pipeline.

An API to pull data from Kafka into HDFS. It fetches events from the available topics and then stores them by topic. It is also responsible for collecting event count statistics.

https://github.com/linkedin/camus/wiki/Camus-Overview

19 questions
2
votes
0 answers

NullPointerException in Camus Job [EtlMultiOutputRecordWriter] - ExceptionWritable

I am very new to Camus and Hadoop, and am running into an exception. I am trying to write some Avro files to HDFS, and keep getting the following error block: [EtlMultiOutputRecordWriter] - ExceptionWritable key: topic=_schemas…
the3rdNotch
  • 637
  • 2
  • 8
  • 18
2
votes
0 answers

Running Camus on

Trying to run an Oozie coordinator with a Java action workflow that consists of running a Camus mapper job. The coordinator seems to run, and starts the workflow every 20 minutes, but the workflow just runs indefinitely, even though the job when…
2
votes
4 answers

camus-example work with kafka

My use case is to push Avro data from Kafka to HDFS. Camus seems to be the right tool, but I am not able to make it work. I am new to Camus and am trying to make camus-example work: https://github.com/linkedin/camus Now I am trying to make…
Sandy
  • 21
  • 6
1
vote
1 answer

Camus Migration - Kafka HDFS Connect does not start from the set offset

I am currently using the Confluent HDFS Sink Connector (v4.0.0) to replace Camus. We are dealing with sensitive data, so we need to keep offsets consistent during the cutover to connectors. Cutover plan: We created an HDFS sink connector and subscribed…
Rupesh More
  • 35
  • 2
  • 10
1
vote
1 answer

What's the expected commit/rollback behavior of Camus?

We've been running Camus for about a year successfully to pull avro payloads from Kafka (ver 0.82) and store as .avro files in HDFS, using just a few Kafka topics. Recently, a new team within our company registered about 60 new topics in our…
1
vote
0 answers

Camus - Writing to multiple file types

I am quite new to LinkedIn's Camus and have successfully written data files from Kafka to HDFS. In general, I use JsonStringMessageDecoder to read JSON and write it to a .dat file using StringRecordWriterProvider. But is it possible to…
0
votes
0 answers

CAMUS dataset in PyTorch: I am trying to see the sizes of the images and I don't know exactly what I am reading

I wanted to ask if anyone has worked on the CAMUS dataset with PyTorch, because there are a lot of things I cannot understand. First, when I print the sizes of the images, I get something like this: (1, 843, 512) (1, 1232, 748) (1, 779, 472) (1,…
0
votes
0 answers

Updating the Kafka dependency in Camus causes messages not to be read by EtlRecordReader

In my project, Camus has been used for a long time and has never been updated. The Camus project uses Kafka version 0.8.2.2. I want to find a workaround to use Kafka 1.0.0, so I cloned the repository and updated the dependency. When I do that the Message…
user51
  • 8,843
  • 21
  • 79
  • 158
0
votes
1 answer

How to partition Gobblin output to 30 min partitions?

We are planning to migrate from Camus to Gobblin. In Camus we were using below mentioned…
mukul
  • 433
  • 7
  • 18
0
votes
1 answer

Convert epoch timestamp to date time format using camus properties

My Kafka message has multiple fields which contain epoch timestamps in long format. My message looks like this: { "someDate1":1512725505000, "someDate2":1518060461000, "ABC":"XYZ", "PQR":"MNO" } Is there a way to convert all these…
coder_r
  • 90
  • 2
  • 11
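The excerpt above does not say whether Camus exposes a property for this conversion, so the following is only a minimal sketch of the conversion itself, done outside Camus. It assumes the values are epoch milliseconds in UTC (the 13-digit values in the sample message suggest this, but the excerpt does not confirm it). The function names and the `timestamp_fields` parameter are hypothetical, introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone


def epoch_millis_to_iso(value_ms: int) -> str:
    """Convert epoch milliseconds (assumed UTC) to an ISO 8601 string."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(value_ms, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return (moment + timedelta(milliseconds=millis)).isoformat()


def convert_fields(message: dict, timestamp_fields: list) -> dict:
    """Return a copy of message with the named fields converted to ISO strings."""
    converted = dict(message)
    for name in timestamp_fields:
        if name in converted:
            converted[name] = epoch_millis_to_iso(converted[name])
    return converted


# Sample message copied from the question excerpt.
sample = {
    "someDate1": 1512725505000,
    "someDate2": 1518060461000,
    "ABC": "XYZ",
    "PQR": "MNO",
}
result = convert_fields(sample, ["someDate1", "someDate2"])
print(result)
```

Fields that are not listed in `timestamp_fields` pass through unchanged, so the caller decides explicitly which longs are timestamps rather than guessing from their magnitude.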
0
votes
1 answer

How to resolve NoClassDefFoundError while using Hadoop?

I am getting Exception in thread "main" java.lang.NoClassDefFoundError: com/linkedin/camus/etl/IEtlKey. On running the command: hadoop jar camus-etl-kafka-0.1.0-SNAPSHOT.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.properties I am getting…
0
votes
1 answer

Gobblin Map-reduce job running successfully on EMR but no output in s3

I am running Gobblin to move data from Kafka to S3 using a 3-node EMR cluster. I am running on Hadoop 2.6.0 and I also built Gobblin against 2.6.0. It seems like the map-reduce job runs successfully. On my HDFS I see the metrics and working directories. metrics…
user2942227
  • 1,023
  • 6
  • 19
  • 26
0
votes
1 answer

Loading Apache server logs to HDFS using Kafka

I want to load Apache server logs to HDFS using Kafka. Creating the topic: ./kafka-topics.sh --create --zookeeper 10.25.3.207:2181 --replication-factor 1 --partitions 1 --topic lognew Tailing the Apache access log directory: tail -f …
Deepthy
  • 139
  • 5
  • 14
0
votes
2 answers

How do I decide number of mappers for camus?

I just started with Camus. I am planning to run Camus every hour. We get ~80,000,000 messages every hour and the average message size is 4 KB (we have a single topic in Kafka). I first tried with 10 mappers; it took ~2 hours to copy one hour's…
Prachi g
  • 849
  • 3
  • 9
  • 23
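The figures in the excerpt above allow a rough back-of-the-envelope estimate. The sketch below assumes throughput scales linearly with the number of mappers, which is an assumption and not something the question confirms; in practice parallelism may also be limited by the number of Kafka partitions. The function name and parameters are hypothetical.

```python
import math


def mappers_needed(messages_per_hour: int, avg_message_kb: float,
                   observed_mappers: int, observed_hours: float,
                   target_hours: float) -> int:
    """Estimate the mapper count needed to copy one hour of data in target_hours.

    Assumes linear scaling: each mapper keeps the throughput observed in the
    earlier run, regardless of how many mappers run in parallel.
    """
    volume_kb = messages_per_hour * avg_message_kb
    # Throughput of a single mapper, derived from the observed run.
    per_mapper_kb_per_hour = volume_kb / (observed_mappers * observed_hours)
    return math.ceil(volume_kb / (per_mapper_kb_per_hour * target_hours))


# Numbers from the excerpt: ~80,000,000 messages/hour at ~4 KB each
# (about 320 GB/hour), and 10 mappers needing ~2 hours.
estimate = mappers_needed(80_000_000, 4, observed_mappers=10,
                          observed_hours=2, target_hours=1)
print(estimate)
```

Under these assumptions, keeping up with hourly runs needs at least twice the original mapper count; any real setting would need headroom on top of that and a check against the topic's partition count.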
0
votes
1 answer

Setting frequency for Camus jobs

I have just started with Camus. I am planning to run a Camus job every hour. We get ~80,000,000 messages (with ~4 KB average size) every hour. How do I set the following properties: # max historical time that will be pulled from each partition based on event…
Prachi g
  • 849
  • 3
  • 9
  • 23