I am very new to Camus and Hadoop and am running into an exception. I am trying to write some Avro files to HDFS, and I keep getting the following error block:
[EtlMultiOutputRecordWriter] - ExceptionWritable key: topic=_schemas partition=0leaderId=0 server= service= beginOffset=0 offset=0 msgSize=1024 server= checksum=0 time=1450371931447 value: java.lang.Exception
at com.linkedin.camus.etl.kafka.common.KafkaReader.getNext(KafkaReader.java:108)
at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:232)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
... 14 more
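For reference, I launch the job with something close to the standard invocation from the Camus README (the jar name and properties path below are placeholders rather than my exact paths):

hadoop jar camus-example-<version>-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P /path/to/camus.properties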
I looked up line 108 of com.linkedin.camus.etl.kafka.common.KafkaReader.getNext and found it to be this:
MessageAndOffset msgAndOffset = messageIter.next();
so presumably messageIter itself is null when execution reaches that line.
I am using io.confluent.camus.etl.kafka.coders.AvroMessageDecoder for my decoder and com.linkedin.camus.example.DummySchemaRegistry for my schema registry.
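In case it matters, the relevant entries in my camus.properties look roughly like this (these are the property names from the stock Camus example config; the rest of the file is omitted):

camus.message.decoder.class=io.confluent.camus.etl.kafka.coders.AvroMessageDecoder
kafka.message.coder.schema.registry.class=com.linkedin.camus.example.DummySchemaRegistry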
At the end of the logs I get another line indicating an error from one of the HDFS files:
Error from file [hdfs://localhost:9000/user/username/exec/2015-12-17-17-05-25/errors-m-00000]
The errors-m-00000 file contains a somewhat readable beginning but then changes to an undecipherable string:
SEQ*com.linkedin.camus.etl.kafka.common.EtlKey5com.linkedin.camus.etl.kafka.common.ExceptionWritable*org.apache.hadoop.io.compress.DefaultCodec|Ò ∫±ß˝}pºHí$ò¸·:0schemasQ∞∆øÿxúïîÀN√0E7l‡+∫»¢lFMõ>
á*êxU®™ËzÍmàc[ÆÕ„XÚÕÿqZ%@[ÿD±gÓô…¯∆üGœ¯Ç¿Q,·Úçë2ô'«hZL¿3ëSö
Xÿ5ê·ê„Sé‡ÇÖpÎS¬î4,…LËÕ¥Î{û}wFßáâ*M)>%&uZÑCfi“˚#rKÌÔ¡flÌu^Í%†B∂"Xa*•⁄0ÔQÕpùGzùidy&ñªkT…śԈ≥-#0>›…∆RG∫.ˇÅ¨«JÚ®sÃ≥Ö¡\£Rîfi˚ßéT≥D#%T8ãW®ÚµÌ∫4N˙©W∫©mst√—Ôå¶¥óhÓ$C~#S+Ñâ{ãÇfl¡ßí⁄L´ÏíÙºÙΩ5wfÃjM¬∏_Äò5RØ£ Ë"Eeúÿëx{ÆÏ«{XW÷XM€O¨-C#É¡Òl•ù9§‰õö2ó:wɲ%Œ-N∫ˇbFXˆ∑:àá5fyQÑ‘ö™:roõ1⁄5•≠≈˚yM0±ú?»ÃW◊.h≈I´êöNæ [û3
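Since the errors file appears to be a compressed Hadoop SequenceFile (EtlKey keys, ExceptionWritable values, DefaultCodec compression, judging by the header above), the only way I have found to get at the full exception text is hadoop fs -text on that path, or a small reader program. The sketch below is just the generic SequenceFile.Reader pattern pointed at the path from my run above, not anything Camus-specific:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class DumpCamusErrors {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // errors file reported at the end of the Camus run (adjust to the current execution directory)
        Path errors = new Path(
            "hdfs://localhost:9000/user/username/exec/2015-12-17-17-05-25/errors-m-00000");
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(errors))) {
            // keys are EtlKey (topic/partition/offset), values are ExceptionWritable
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key);   // which topic/partition/offset failed
                System.out.println(value); // the exception text stored for that record
            }
        }
    }
}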
At the end, it appears that a Hadoop job has run but a commit never takes place, based on the timing report:
Job time (seconds):
pre setup 1.0 (11%)
get splits 1.0 (11%)
hadoop job 4.0 (44%)
commit 0.0 (0%)
Total: 0 minutes 9 seconds
Any help or an idea of where to look to resolve this would be greatly appreciated. Thank you.