I have the schema below:

val schema = new StructType(
  Array(
    StructField("Age", IntegerType, true),
    StructField("Name", StringType, true)
  )
)

I want to keep it in a separate file in the same format and use it in my Spark program. I have seen that I can store the schema in a file as JSON and load it from there. But is there a way to keep the same StructType definition in a file and then read it?

Just to note, my schema file can contain multiple schemas like

val schema1=...
val schema2=...
val schema3=...
Lakkhichhara

1 Answer

In PySpark, you can use Python's built-in eval() function to evaluate a schema expression held in a string and get a StructType.

>>> from pyspark.sql.types import StructType, StructField, StringType
>>> s = eval('StructType([StructField("name", StringType())])')
>>> s
StructType(List(StructField(name,StringType,true)))

I'm not sure whether the same thing can be done in Scala, though. You can try some of the options mentioned here.

You can use the simple DDL format to specify the schema instead of JSON.

import org.apache.spark.sql.types.{DataType, StructType}

val schema = DataType.fromDDL("Age int, name string")

val structTypeSchema = schema.asInstanceOf[StructType]

You can mention your schema in a file as shown below:

schema1 = c1 string, c2 int
schema2 = c3 int

You can read this file and create schema variables as shown below:

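import java.io.FileReader
import java.util.Properties
import org.apache.spark.sql.types.DataType

val prop = new Properties()
prop.load(new FileReader("/file/path"))

// fromDDL returns a DataType; cast to StructType as shown above if needed
val schema1 = DataType.fromDDL(prop.getProperty("schema1"))
val schema2 = DataType.fromDDL(prop.getProperty("schema2"))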

If you want, you can use this library to parse configs instead of using the Properties class.
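
Since the file can hold any number of schemas, it may be handy to load them all at once instead of naming each key in code. Here is a minimal sketch, assuming Spark SQL is on the classpath and every property value is a DDL string. The names SchemaFile and loadSchemas are made up for illustration, and StructType.fromDDL is used in place of DataType.fromDDL so that no cast is needed.

```scala
import java.io.FileReader
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of Spark: reads every entry of a
// properties file and parses each value as a DDL schema string.
object SchemaFile {
  def loadSchemas(path: String): Map[String, StructType] = {
    val props = new Properties()
    val reader = new FileReader(path)
    try props.load(reader) finally reader.close()

    // One map entry per property key, e.g. "schema1" -> StructType
    props.stringPropertyNames().asScala.map { name =>
      name -> StructType.fromDDL(props.getProperty(name))
    }.toMap
  }
}
```

With the file shown above, SchemaFile.loadSchemas("/file/path")("schema1") should give the StructType for "c1 string, c2 int".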

Mohana B C
  • Thanks for your response. My schema file can have multiple schemas. Like schema1 = ..., schema2 = ... as I want to keep all the schemas in the same file so that I can find them in one place. I have edited the question as well. – Lakkhichhara Sep 14 '21 at 14:50
  • Thanks again. But instead of DDL format, can I use the StructType and StructField format in the file itself, the same way I have put the schema in the question. val schema = new StructType( Array( StructField("Age",IntegerType,true), StructField("Name",StringType,true), ) ) and then read that using prop.getProperty() – Lakkhichhara Sep 14 '21 at 15:28
  • But once you read that from the file, you will have the schema as a string, not as a StructType. As I mentioned, Python has `eval` to parse a schema expression, but I need to check whether Scala has such a function. Explore the link I added in my answer. – Mohana B C Sep 14 '21 at 15:34
  • Is there an option to handle a multi-line schema? I have a huge number of columns in the schema, and I want to spread it over multiple lines so that it's easy to read. For example, schema1 = c1 string, c2 int schema2 = c3 int – Lakkhichhara Sep 15 '21 at 10:41
  • Use \ at the end of each line that continues on the next one. – Mohana B C Sep 15 '21 at 12:21
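
To illustrate the backslash tip, here is a small sketch that uses only java.util.Properties, with no Spark involved. The object name MultiLineSchemaProps and the sample content are made up for illustration, and a StringReader stands in for the real file. A trailing backslash joins the next line onto the value, and the leading whitespace of the continuation line is dropped.

```scala
import java.io.StringReader
import java.util.Properties

object MultiLineSchemaProps {
  // Hypothetical file content: schema1 is split across two lines
  // with a trailing backslash; schema2 fits on one line.
  val sample: String =
    """schema1 = c1 string, \
      |          c2 int
      |schema2 = c3 int
      |""".stripMargin

  // Parse properties text; a FileReader over the real file works the same way.
  def parse(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  def main(args: Array[String]): Unit = {
    val props = parse(sample)
    println(props.getProperty("schema1")) // c1 string, c2 int
    println(props.getProperty("schema2")) // c3 int
  }
}
```

The joined value can then be passed to DataType.fromDDL exactly like a single-line one.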