
I am trying to convert a Scala expression that is saved in a database as a String back into working code.

I have tried the reflection ToolBox, Groovy, etc., but I can't get it to do what I need.

Here's what I tried:


import scala.reflect.runtime.universe._
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

val toolbox = currentMirror.mkToolBox()
val code1 = q"""StructType(StructField(id,IntegerType,true), StructField(name,StringType,true), StructField(tstamp,TimestampType,true), StructField(date,DateType,true))"""
val sType = toolbox.compile(code1)().asInstanceOf[StructType]

I need the sType instance so I can pass it as a custom schema when reading a CSV file into a DataFrame, but the compilation seems to fail.

Is there any way to convert the string representation of the StructType into an actual StructType instance? Any help would be appreciated.

Mario Galic
Highdef

2 Answers


If StructType is Spark's StructType and you just want to convert a String into a StructType, you don't need reflection. You can try this:

import org.apache.spark.sql.catalyst.parser.LegacyTypeStringParser
import org.apache.spark.sql.types.{DataType, StructType}

import scala.util.Try

def fromString(raw: String): StructType =
  // Try the JSON format first; fall back to the legacy toString-style parser
  Try(DataType.fromJson(raw)).getOrElse(LegacyTypeStringParser.parse(raw)) match {
    case t: StructType => t
    case _             => throw new RuntimeException(s"Failed parsing: $raw")
  }

val code1 =
  """StructType(Array(StructField(id,IntegerType,true), StructField(name,StringType,true), StructField(tstamp,TimestampType,true), StructField(date,DateType,true)))"""
fromString(code1) // res0: org.apache.spark.sql.types.StructType

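The code is taken from the fromString method of the org.apache.spark.sql.types.StructType companion object in Spark. You cannot call that method directly because it is package-private. Moreover, it relies on LegacyTypeStringParser, an internal Catalyst parser, so I'm not sure it is robust enough for production code. Note also that the example string wraps the fields in Array(...), unlike the string in the question; the legacy parser appears to expect such a wrapper around the fields.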
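To connect this back to the question's goal (a custom schema for a CSV read), here is a minimal sketch that feeds the parsed schema to the CSV reader. It assumes a Spark 2.x/3.x build on the classpath in which `LegacyTypeStringParser` is still available (as in the code above), a local `SparkSession`, and a hypothetical sample file written to a temp directory. The sample rows and the explicit `timestampFormat` option are illustrative assumptions, not part of the original answer.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.LegacyTypeStringParser
import org.apache.spark.sql.types.StructType

// Assumption: a local session for the sketch; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("stored-schema-demo").getOrCreate()

// Schema string as it might be stored in the database (Array(...) form, as above).
val stored =
  "StructType(Array(StructField(id,IntegerType,true), StructField(name,StringType,true), " +
    "StructField(tstamp,TimestampType,true), StructField(date,DateType,true)))"

// Parse the string back into a StructType with the same fallback parser used by fromString.
val schema: StructType = LegacyTypeStringParser.parse(stored) match {
  case t: StructType => t
  case other         => throw new IllegalArgumentException(s"Not a StructType: $other")
}

// Hypothetical sample data, written to a temp file so the sketch is self-contained.
val csvPath = Files.createTempFile("people", ".csv")
val csvText =
  "id,name,tstamp,date\n" +
    "1,alice,2019-06-16 16:10:00,2019-06-16\n" +
    "2,bob,2019-06-17 09:00:00,2019-06-17\n"
Files.write(csvPath, csvText.getBytes(StandardCharsets.UTF_8))

// Pass the parsed schema to the CSV reader instead of letting Spark infer one.
val df = spark.read
  .option("header", "true")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss") // assumed format of the sample timestamps
  .schema(schema)
  .csv(csvPath.toString)

df.printSchema()
df.show()
```

Supplying the schema up front also spares Spark an extra pass over the file for schema inference.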

lukastymo

The code inside the quasiquote needs to be valid Scala syntax, so the field names must be quoted string literals. You also need to provide all the necessary imports. This works:

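import scala.reflect.runtime.currentMirror
import scala.reflect.runtime.universe._
import scala.tools.reflect.ToolBox
import org.apache.spark.sql.types.StructType

val toolbox = currentMirror.mkToolBox()
val code1 =
  q"""
     // we need to import all SQL types inside the quasiquote as well
     import org.apache.spark.sql.types._
     StructType(
       // StructType needs a collection of fields
       List(
         // field names must be proper string literals
         StructField("id", IntegerType, true),
         StructField("name", StringType, true),
         StructField("tstamp", TimestampType, true),
         StructField("date", DateType, true)
       )
     )
   """
val sType = toolbox.compile(code1)().asInstanceOf[StructType]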

println(sType)

But instead of trying to recompile the code, you might consider alternatives such as serializing the StructType in some other form (perhaps to JSON).
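A minimal sketch of the JSON route, assuming Spark's public `DataType.json` / `DataType.fromJson` pair is available in your Spark version: store `schema.json` in the database instead of `schema.toString`, and rebuild the `StructType` from it later. The hard-coded schema below mirrors the question's; in practice it would come from `df.schema`.

```scala
import org.apache.spark.sql.types._

// Schema mirroring the question; in practice this would be df.schema.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true),
  StructField("tstamp", TimestampType, nullable = true),
  StructField("date", DateType, nullable = true)
))

// Store this JSON string in the database instead of schema.toString.
val stored: String = schema.json

// Later: rebuild the StructType from the stored JSON; no reflection or compilation needed.
val restored: StructType = DataType.fromJson(stored) match {
  case s: StructType => s
  case other         => throw new IllegalArgumentException(s"Expected a StructType, got: ${other.typeName}")
}
```

Since the `fromString` code quoted in the other answer tries `DataType.fromJson` first, storing JSON also sidesteps the legacy string parser entirely.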

Krzysztof Atłasik
  • I was obtaining the schema string with df.schema.toString from the table in which the entries were stored. This works, but it would require a lot of string replacements and additional imports inside the quasiquote itself. I appreciate the solution though, thank you (: – Highdef Jun 16 '19 at 16:10