
I am on Spark 1.x and attempting to read CSV files. To specify data types for the columns, the documentation says I need to import the types defined in the package org.apache.spark.sql.types:

import org.apache.spark.sql.types.{StructType, StructField, StringType}

This works fine when I use it interactively in spark-shell, but since I want to run it through spark-submit, I wrote some Scala code to do the same. When I attempt to compile that code, however, it fails with an error saying it cannot find org.apache.spark.sql.types. I looked through the contents of the spark-sql jar, but couldn't find these types defined there.
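For reference, here is a minimal sketch of the kind of code involved, assuming Spark 1.x with the spark-csv package as the CSV source (the file name, column names, and app name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object CsvRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CsvRead"))
    val sqlContext = new SQLContext(sc)

    // Schema built from the types in org.apache.spark.sql.types
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("city", StringType, nullable = true)))

    // Spark 1.x has no built-in CSV source; the spark-csv package provides one
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .schema(schema)
      .load("people.csv") // placeholder path

    df.show()
  }
}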

So, which jar has org.apache.spark.sql.types?


1 Answer


I looked at the source code for spark-sql on GitHub and realized that these types are actually defined in the spark-catalyst jar, which didn't seem intuitive.

Also, since the source of StructType contains

import org.json4s.JsonDSL._

we end up with another dependent jar, json4s-core.
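That said, with a standard build tool these arrive transitively and never need to be tracked down by hand. A minimal build.sbt sketch, with illustrative version numbers:

name := "csv-reader"
scalaVersion := "2.10.6"

// spark-catalyst and json4s-core come in as transitive dependencies of spark-sql,
// so the build tool resolves them automatically
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.3"

// only needed for the CSV source on Spark 1.x
libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"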

    if you use any standard build tool (maven / sbt / gradle) you wouldn't have to bother with this - `spark-sql` jar declares these jars as its _transitive dependencies_, which would make any build tool fetch these when `spark-sql` is used. – Tzach Zohar Feb 15 '17 at 14:49
  • @TzachZohar, that helps. Thanks! – sudheeshix Feb 15 '17 at 14:52