
I'm using Apache Flink (v1.11) with Scala and added my own DeserializationSchema for the Kafka connector. For this I would like to use my own Jackson packages and version (v2.12.0).

But I get the following error:

Exception in thread "main" java.lang.VerifyError: Cannot inherit from final class
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at com.fasterxml.jackson.dataformat.csv.CsvMapper.<init>(CsvMapper.java:108)
    at de.integration_factory.datastream.types.CovidEventSchema.<init>(CovidEventSchema.scala:14)
    at de.integration_factory.datastream.Aggregate_Datastream$.main(Aggregate_Datastream.scala:34)
    at de.integration_factory.datastream.Aggregate_Datastream.main(Aggregate_Datastream.scala)

This is my EventSchema:

import java.io.IOException

import com.fasterxml.jackson.dataformat.csv.CsvMapper
import com.fasterxml.jackson.datatype.joda.JodaModule
import org.apache.flink.api.common.serialization.{DeserializationSchema, SerializationSchema}
import org.apache.flink.api.common.typeinfo.TypeInformation

@SerialVersionUID(6154188370181669758L)
class CovidEventSchema extends DeserializationSchema[CovidEvent] with SerializationSchema[CovidEvent] {

  private val mapper = new CsvMapper
  mapper.registerModule(new JodaModule)

  val csvSchema = mapper
    .schemaFor(classOf[CovidEvent])
    .withLineSeparator(",")
    .withoutHeader()
  val reader = mapper.readerWithSchemaFor(classOf[CovidEvent])

  def serialize(event: CovidEvent): Array[Byte] = mapper.writer(csvSchema).writeValueAsBytes(event)

  @throws[IOException]
  def deserialize(message: Array[Byte]): CovidEvent = reader.readValue[CovidEvent](message)


  def isEndOfStream(nextElement: CovidEvent) = false

  def getProducedType: TypeInformation[CovidEvent] = TypeInformation.of(classOf[CovidEvent])
}

This is my POJO for the schema:

import com.fasterxml.jackson.annotation.JsonFormat;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.joda.time.DateTime;


@Data
@NoArgsConstructor
@AllArgsConstructor
public class CovidEvent {

    private long objectId;
    private int bundeslandId;
    private String bundesland;
    private String landkreis;
    private String altersgruppe;
    private String geschlecht;
    private int anzahlFall;
    private int anzahlTodesfall;
    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss", timezone = "UTC")
    private DateTime meldedatum;
    private int landkreisId;
    private String datenstand;
    private int neuerFall;
    private int neuerTodesfall;
    private String refDatum;
    private int neuGenesen;
    private int anzahlGenesen;
    @JsonFormat(shape = JsonFormat.Shape.NUMBER)
    private boolean istErkrankungsbeginn;
    private String altersGruppe2;

    public long getEventtime() {
        return meldedatum.getMillis();
    }

}

After some research I found out that the error is probably caused by different Jackson versions on the classpath.

I thought it would be possible to use my own version of Jackson, because Flink shades its own version.

What am I doing wrong?

UPDATE: If I import the Jackson classes from the shaded Flink package, it works:

org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper

But then I am dependent on the Flink-shaded Jackson version.

UPDATE: So a better implementation using open would be something like this?

import com.fasterxml.jackson.databind.{ObjectReader, ObjectWriter}
import com.fasterxml.jackson.dataformat.csv.CsvMapper
import org.apache.flink.api.common.serialization.{DeserializationSchema, SerializationSchema}

class CovidEventSchema extends DeserializationSchema[CovidEvent] with SerializationSchema[CovidEvent] {

  private var reader: ObjectReader = null
  private var writer: ObjectWriter = null

  override def open(context: SerializationSchema.InitializationContext): Unit = {
    val mapper = new CsvMapper()

    val csvSchema = mapper
      .schemaFor(classOf[CovidEvent])
      .withLineSeparator(",")
      .withoutHeader()

    this.reader = mapper.readerFor(classOf[CovidEvent]).`with`(csvSchema)
    this.writer = mapper.writer(csvSchema)
    super.open(context)
  }

  // serialize/deserialize/isEndOfStream/getProducedType as before, now using reader and writer
}
JanOels
  • Maybe it is because apache-calcite brings its own unshaded Jackson version? And Calcite is a dependency of the Flink SQL API. So maybe it is bad to have Flink SQL related dependencies in the pom.xml while using the DataStream API? – JanOels Jan 12 '21 at 19:28

1 Answer


It would work if Flink's classloaders were used. However, the way your setup works, you are just loading your user code in the system classloader while the whole DataStream application is created. I won't go into more detail here (unless requested in a follow-up) and go straight for the solution:

Your DeserializationSchema should never initialize heavy resources during creation (this happens on the client or job manager side), but only in open (which happens on the task manager). So please move

private val mapper = new CsvMapper
mapper.registerModule(new JodaModule)

into open.

It only works with the bundled version because, lucky for you, ObjectMapper implements Serializable. That is rarely the case for parsers and is actually completely unnecessary if the deserializer is initialized correctly.
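
For completeness, a minimal sketch of such a lazily initialized schema, based on the code from the question. It assumes Flink 1.11+, where both DeserializationSchema and SerializationSchema expose an open method; the @transient markers and the shared init helper are just one way to do it:

import java.io.IOException

import com.fasterxml.jackson.databind.{ObjectReader, ObjectWriter}
import com.fasterxml.jackson.dataformat.csv.CsvMapper
import com.fasterxml.jackson.datatype.joda.JodaModule
import org.apache.flink.api.common.serialization.{DeserializationSchema, SerializationSchema}
import org.apache.flink.api.common.typeinfo.TypeInformation

class CovidEventSchema extends DeserializationSchema[CovidEvent] with SerializationSchema[CovidEvent] {

  // Created on the task manager in open(), never serialized with the schema instance.
  @transient private var reader: ObjectReader = _
  @transient private var writer: ObjectWriter = _

  // Shared setup for both the serializer and the deserializer side.
  private def init(): Unit = {
    val mapper = new CsvMapper
    mapper.registerModule(new JodaModule)
    val csvSchema = mapper
      .schemaFor(classOf[CovidEvent])
      .withLineSeparator(",")
      .withoutHeader()
    reader = mapper.readerFor(classOf[CovidEvent]).`with`(csvSchema)
    writer = mapper.writer(csvSchema)
  }

  override def open(context: DeserializationSchema.InitializationContext): Unit = init()

  override def open(context: SerializationSchema.InitializationContext): Unit = init()

  override def serialize(event: CovidEvent): Array[Byte] = writer.writeValueAsBytes(event)

  @throws[IOException]
  override def deserialize(message: Array[Byte]): CovidEvent = reader.readValue[CovidEvent](message)

  override def isEndOfStream(nextElement: CovidEvent): Boolean = false

  override def getProducedType: TypeInformation[CovidEvent] = TypeInformation.of(classOf[CovidEvent])
}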

Arvid Heise
  • Thank you very much for your answer. So the code example in my second update is the right way? – JanOels Jan 25 '21 at 17:04
  • Yes, it looks much better. Let's see if this solves your problem (the Jackson classes still need to be initialized, so it could still fail). Alternatively you need to align your Jackson versions :/ – Arvid Heise Jan 27 '21 at 11:15
  • It seems you are right; now there is a `NullPointerException` at `CovidEventSchema.deserialize(CovidEventSchema.scala:32)`. What can I do? Maybe I'm doing things completely wrong. Is there a better way to deserialize CSV data from Kafka with the DataStream API? – JanOels Jan 27 '21 at 15:51
  • I got the same error when using the shaded Jackson dependencies. It seems like `open` is not working. – JanOels Jan 27 '21 at 16:02
  • I can't help much without a stack trace and code. The easiest way to consume CSV from Kafka is the Table API (https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/formats/csv.html); a short sketch follows below these comments. In general I'd discourage CSV in Kafka. – Arvid Heise Jan 28 '21 at 09:08
  • Okay, thanks. Why do you discourage CSV in Kafka? – JanOels Jan 30 '21 at 13:09
  • CSV does not solve any of the problems a good data format does: it has no schema information, and it either requires the full header to be attached to each message or it does not support any kind of schema evolution. Further, it is slow both in writing and reading. The recommended format for Kafka is Avro, which has good tooling support. – Arvid Heise Jan 31 '21 at 11:38
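
For reference, a minimal sketch of the Table API route mentioned in the comments, assuming Flink 1.11+ with the Kafka SQL connector and the CSV format on the classpath; the topic name, broker address, group id, and the column subset are made up for illustration:

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

object CsvFromKafka {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // Hypothetical topic and broker names; connector/format options follow the Flink Kafka and CSV docs.
    tableEnv.executeSql(
      """CREATE TABLE covid_events (
        |  objectId BIGINT,
        |  bundesland STRING,
        |  anzahlFall INT,
        |  meldedatum TIMESTAMP(3)
        |) WITH (
        |  'connector' = 'kafka',
        |  'topic' = 'covid-events',
        |  'properties.bootstrap.servers' = 'localhost:9092',
        |  'properties.group.id' = 'covid-consumer',
        |  'scan.startup.mode' = 'earliest-offset',
        |  'format' = 'csv',
        |  'csv.ignore-parse-errors' = 'true'
        |)""".stripMargin)

    // The registered table can then be queried with SQL or converted back to a DataStream.
    val casesPerState = tableEnv.sqlQuery(
      "SELECT bundesland, SUM(anzahlFall) AS faelle FROM covid_events GROUP BY bundesland")
  }
}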