
I am using com.cloudera.crunch version: '0.3.0-3-cdh-5.2.1'.

I have a small program that reads some AVROs and filters out invalid data based on some criteria. I am using pipeline.write(PCollection, AvroFileTarget) to write the invalid data output. It works fine in a production run.

For unit testing this piece of code, I use a MemPipeline instance, but in that case it fails while writing the output.
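A minimal sketch of the kind of test described above (assuming JUnit 4; MemPipeline appears in the stack trace, but the other import paths, the collectionOf helper, the FilterFn body, and the output path are illustrative assumptions, not the exact code):

    import com.cloudera.crunch.FilterFn;
    import com.cloudera.crunch.PCollection;
    import com.cloudera.crunch.Pipeline;
    import com.cloudera.crunch.impl.mem.MemPipeline;
    import com.cloudera.crunch.io.avro.AvroFileTarget;
    import org.junit.Test;

    public class InvalidRecordFilterTest {

      @Test
      public void writesInvalidRecords() {
        Pipeline pipeline = MemPipeline.getInstance();

        // In-memory stand-in for the Avro input that the production job reads.
        PCollection<String> input = MemPipeline.collectionOf("good-record", "bad-record");

        // Keep only the records considered invalid (placeholder criterion).
        PCollection<String> invalid = input.filter(new FilterFn<String>() {
          @Override
          public boolean accept(String record) {
            return record.startsWith("bad");
          }
        });

        // Equivalent of pipeline.write(PCollection, AvroFileTarget) from the question;
        // under MemPipeline this is the call that ends in the UnsatisfiedLinkError.
        pipeline.write(invalid, new AvroFileTarget("target/test-output/invalid"));
        pipeline.done();
      }
    }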

I get this error:

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
    at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)
    at org.apache.hadoop.util.NativeCrc32.calculateChunkedSumsByteArray(NativeCrc32.java:86)
    at org.apache.hadoop.util.DataChecksum.calculateChunkedSums(DataChecksum.java:428)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:197)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
    at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:78)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:50)
    at java.io.DataOutputStream.writeBytes(DataOutputStream.java:276)
    at com.cloudera.crunch.impl.mem.MemPipeline.write(MemPipeline.java:159)

Any idea what's wrong?

Yogesh
  • I recall seeing some existing bugs in MemPipeline's AVRO handling; is your schema anything especially complex? Are you able to write any Avro records using that schema in a MemPipeline, or is it only the invalid records you're filtering out that throw this error? – Suriname0 Aug 22 '16 at 22:09
  • Hi, I am not able to write any records using MemPipeline. MemPipeline.write() always gives me this error. – Yogesh Aug 24 '16 at 03:24
  • It's probably a problem with your schema then. Try creating a simple test with a very basic Avro schema (e.g. a record with a single String field) and see if you are able to write materialized records to disk. If you can't, it's likely an issue with your dependencies; if you're using a tool like Maven, inspect the dependency tree and consider explicitly excluding some transitive dependencies that may be causing problems. – Suriname0 Aug 24 '16 at 14:57
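Following the suggestion in the comment above, a minimal sanity check could look like the sketch below: one GenericRecord with a single String field, written through MemPipeline. The Avros.generics/typedCollectionOf names, the schema, and the output path are assumptions for illustration and may differ in this Crunch version:

    import com.cloudera.crunch.PCollection;
    import com.cloudera.crunch.Pipeline;
    import com.cloudera.crunch.impl.mem.MemPipeline;
    import com.cloudera.crunch.io.avro.AvroFileTarget;
    import com.cloudera.crunch.type.avro.Avros;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.junit.Test;

    public class MinimalAvroWriteTest {

      // A record with a single String field, per the comment's suggestion.
      private static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Simple\","
          + "\"fields\":[{\"name\":\"value\",\"type\":\"string\"}]}");

      @Test
      public void writesOneRecord() {
        Pipeline pipeline = MemPipeline.getInstance();

        GenericData.Record record = new GenericData.Record(SCHEMA);
        record.put("value", "hello");

        // Attach the Avro PType to the in-memory collection.
        PCollection<GenericData.Record> data =
            MemPipeline.typedCollectionOf(Avros.generics(SCHEMA), record);

        // If even this fails, the problem is the environment/dependencies, not the schema.
        pipeline.write(data, new AvroFileTarget("target/test-output/simple"));
        pipeline.done();
      }
    }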

1 Answer


The HADOOP_HOME environment variable should be configured properly, and hadoop.dll and winutils.exe need to be available.

Also pass the JVM argument -Djava.library.path=HADOOP_HOME/lib/native when executing the MR job/application.
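For example, in a unit test you could point Hadoop at its home directory before the pipeline runs (a sketch; the C:\hadoop path is a placeholder for wherever hadoop.dll and winutils.exe live, and java.library.path itself still has to be passed as a JVM argument, e.g. -Djava.library.path=%HADOOP_HOME%\bin, because the JVM reads it at startup):

    import org.junit.BeforeClass;

    public class CrunchMemPipelineTestBase {

      @BeforeClass
      public static void configureHadoopHome() {
        // Hadoop's Shell utility checks the hadoop.home.dir system property
        // (falling back to the HADOOP_HOME environment variable) to locate winutils.exe.
        if (System.getProperty("hadoop.home.dir") == null) {
          System.setProperty("hadoop.home.dir", "C:\\hadoop");
        }
      }
    }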

isudarsan