
I have an issue where Spark is failing to generate code for a case class. Here is the Spark error:

Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 52, Column 43: Identifier expected instead of '.'

Here is the referenced line in the generated code:

/* 052 */ private com.avro.message.video.public.MetricObservation MapObjects_loopValue34;

It should be noted that com.avro.message.video.public.MetricObservation is a nested case class that is part of a larger hierarchy. It is also used in other places in the code without issue. It should also be noted that this pipeline works fine if I use the RDD API, but I want to use the Dataset API because I want to write the Dataset out as Parquet. Has anyone seen this issue before?

I'm using Scala 2.11 and Spark 2.1.0. I also tried upgrading to Spark 2.2.1, and the issue is still there.
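
For reference, the shape of the code that hits this is roughly the sketch below. This is just the shape, not a self-contained reproduction: the real Avro-generated hierarchy is much larger and the field names here are made up; only the package and the MetricObservation name match the real code.

    // Simplified stand-ins for the Avro-generated classes; field names are invented,
    // but the package is the real one from the error above.
    package com.avro.message.video.public {
      case class MetricObservation(metric: String, value: Double)
      case class VideoMessage(id: Long, observations: Seq[MetricObservation])
    }

    package example {
      import org.apache.spark.sql.SparkSession
      import com.avro.message.video.public.{MetricObservation, VideoMessage}

      object Pipeline {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().master("local[*]").getOrCreate()
          import spark.implicits._

          // Building the Dataset derives a product encoder; the Catalyst-generated
          // Java refers to the class by its fully qualified name, which is the line
          // the compile error points at (MapObjects_* comes from the Seq field).
          val ds = Seq(VideoMessage(1L, Seq(MetricObservation("bitrate", 4.2)))).toDS()

          ds.write.parquet("/tmp/video-messages") // the goal: write the Dataset as Parquet
          spark.stop()
        }
      }
    }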

Jon

2 Answers


Do you think that SI-7555 or something like it has any bearing on this? I have noticed in the past that Scala reflection has had issues generating TypeTags for statically nested classes. Do you think something like that is going on, or is this strictly a Catalyst issue in Spark? You might want to file a Spark ticket too.
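
For what it's worth, one way to separate the two is to try deriving a TypeTag and a Spark encoder for the class on its own, roughly like the check below. The codegen failure in the question happens later, at execution time, but this at least shows whether the TypeTag step is the culprit.

    import scala.reflect.runtime.universe.typeTag
    import org.apache.spark.sql.Encoders
    import com.avro.message.video.public.MetricObservation

    object ReflectionCheck extends App {
      // If this throws, Scala reflection can't produce a TypeTag for the nested
      // class, which would point at something like SI-7555 rather than Catalyst.
      println(typeTag[MetricObservation].tpe)

      // If the TypeTag resolves but the job still fails during codegen,
      // that points more toward Catalyst/Janino.
      println(Encoders.product[MetricObservation].schema)
    }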

deaktator

So it turns out that changing the package name of the affected class "fixes" the problem (i.e., makes it go away). I really have no idea why this is, or even how to reproduce it in a small test case. What worked for me was simply moving the class to a higher-level package name that works. Specifically, com.avro.message.video.public -> com.avro.message.publicVideo.
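
For completeness, the workaround looks roughly like this (field names are still placeholders; the only thing that changed is the package):

    // Only the package changed: com.avro.message.video.public -> com.avro.message.publicVideo.
    package com.avro.message.publicVideo {
      case class MetricObservation(metric: String, value: Double) // placeholder fields
    }

    package example {
      import org.apache.spark.sql.SparkSession
      import com.avro.message.publicVideo.MetricObservation

      object FixedPipeline {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().master("local[*]").getOrCreate()
          import spark.implicits._

          // The generated Java now references com.avro.message.publicVideo.MetricObservation
          // and the Dataset writes out to Parquet without the compile error.
          Seq(MetricObservation("bitrate", 4.2)).toDS().write.parquet("/tmp/metric-observations")
          spark.stop()
        }
      }
    }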

Jon