
I'm experimenting with Google Cloud Data Fusion. I'm joining 2 BigQuery tables using the Joiner plugin and writing the result back to BigQuery. In preview I get this error: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.&lt;init&gt;(Ljava/io/InputStream;Z)V

I've set the job to Spark instead of MapReduce, because MapReduce runs out of memory in preview. When I deploy and run the job, it crashes with "Container killed by YARN for exceeding memory limits."

The larger table has about 6 million records without any nested fields. The smaller table has 66 records.
I didn't specify any partitions.

What's the recommended way to debug and solve this issue? Should I increase the number of workers or the memory?
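
For what it's worth, if I were writing the equivalent job in plain Spark, I'd expect broadcasting the tiny table to sidestep most of the shuffle memory pressure. A minimal sketch, assuming the spark-bigquery connector; the table names, join key, and staging bucket are placeholders, not my actual pipeline:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object JoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("bq-join-sketch").getOrCreate()

        // Placeholder tables; "bigquery" is the spark-bigquery connector format
        val large = spark.read.format("bigquery")
          .option("table", "my_project.my_dataset.large_table") // ~6M rows
          .load()
        val small = spark.read.format("bigquery")
          .option("table", "my_project.my_dataset.small_table") // 66 rows
          .load()

        // Broadcasting ships the 66-row table to every executor,
        // so the 6M-row side is never shuffled for the join.
        val joined = large.join(broadcast(small), Seq("join_key"))

        joined.write.format("bigquery")
          .option("table", "my_project.my_dataset.output_table")
          .option("temporaryGcsBucket", "my-temp-bucket") // the connector needs a staging bucket for writes
          .save()
      }
    }

I assume the Data Fusion Joiner has its own way of expressing this; failing that, raising spark.executor.memory and spark.executor.memoryOverhead would be the other obvious knob, assuming the engine config accepts standard Spark properties.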

1 Answer


For the preview error "java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.&lt;init&gt;(Ljava/io/InputStream;Z)V", can you please provide the complete stack trace from the preview logs in the UI?

Thanks and Regards,

Sagar

Sagar Kapare
  • Could you look into this issue? It's currently a showstopper. – Koen Verschaeren May 17 '19 at 06:59
  • Hi Koen, thanks for providing the stack trace. We found a similar issue in Spark: https://issues.apache.org/jira/browse/SPARK-25928. It seems the LZ4 library does not work well with Spark 2.3. Cloud Data Fusion does not use that library anyway, so the fix is to simply exclude it - https://github.com/cdapio/cdap/pull/11350 (see the dependency-exclusion sketch after these comments). – Sagar Kapare May 22 '19 at 17:56
  • The next release of Cloud Data Fusion will be available by next week. – Sagar Kapare May 22 '19 at 18:09
  • Great! I'll test as soon as the new release is available on GCP. – Koen Verschaeren May 23 '19 at 07:35
  • The fix is now available in newly-created instances of Data Fusion. – Ali Anwar May 23 '19 at 21:46
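
For anyone hitting the same net.jpountz.lz4 conflict in their own Spark builds (outside Data Fusion, where the linked PR handles it), the workaround amounts to excluding the legacy net.jpountz.lz4:lz4 artifact so that only the newer org.lz4:lz4-java is on the classpath. A minimal sbt sketch, assuming the old artifact is pulled in via kafka-clients as described in SPARK-25928; the version shown is illustrative:

    // build.sbt: keep only org.lz4:lz4-java on the classpath by excluding
    // the legacy net.jpountz.lz4:lz4 artifact that older kafka-clients pulls in.
    libraryDependencies += ("org.apache.kafka" % "kafka-clients" % "2.0.0")
      .exclude("net.jpountz.lz4", "lz4")

    // Or exclude it globally, regardless of which dependency drags it in:
    excludeDependencies += ExclusionRule("net.jpountz.lz4", "lz4")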