1

I want to parse a file whose content is in JSON format. From the file I want to extract a few properties (name, DataType, Nullable) to create column names dynamically. I have gone through some examples, but most of them use a case class, and my problem is that every file I receive may have different content.

I tried to use the ujson library to parse the file, but I don't understand how to use it properly.

object JsonTest {
  def main(args: Array[String]): Unit = {

    val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
    println(source)
    val input = try source.mkString finally source.close()
    println(input)

    val data = ujson.read(input)
    data("name") = data("name").str.reverse
    val updated = data.render()
  }
}

Content of the file example:

{
"Organization": {
"project": {
"name": "POC 4PL",
"description": "Implementation of orderbook"
},
"Entities": [
{
"name": "Shipments",
"Type": "Fact",
"Attributes": [
{
"name": "Shipment_Details",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "Shipment_ID",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "View_Cost",
"DataType": "StringType",
"Nullable": "true"
}
],
"ADLS_Location": "/mnt/mns/adls/raw/poc/orderbook/"
}
]
}
}

Expected output:

StructType(
Array(StructField("Shipment_Details",StringType,true),
StructField("Shipment_ID",StringType,true),
StructField("View_Cost",StringType,true)))

StructType needs to be added to the expected output programmatically.

Stark

2 Answers

0

It depends on whether you want it to be completely dynamic. Here are some options:

If you just want to read one field you can do:

import upickle.default._

val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)

println(json("Organization")("project")("name")) 

The output will be: "POC 4PL"

If you only want the Attributes mapped to a typed class, you can do:

import upickle.default.{macroRW, ReadWriter => RW}
import upickle.default._

// One entry of the "Attributes" array; field names must match the JSON keys.
case class StructField(name: String, DataType: String, Nullable: String)
object StructField {
  // Keeping the implicit in the companion means it is initialized before read[...] needs it.
  implicit val rw: RW[StructField] = macroRW
}

val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
// Attributes of the first entity only.
val entitiesArray = json("Organization")("Entities")(0)("Attributes")
println(read[Seq[StructField]](entitiesArray))

The output will be: List(StructField(Shipment_Details,StringType,true), StructField(Shipment_ID,StringType,true), StructField(View_Cost,StringType,true))
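If the set of entities is not known in advance, the same values can be read without any case class by walking the ujson tree directly. The following is only a sketch under that assumption: `DynamicAttributes` and `extract` are illustrative names, and it expects every attribute to carry the three keys shown in the question.

```scala
object DynamicAttributes {
  // Returns (name, DataType, Nullable) for every attribute of every entity.
  // Throws if a key is missing or "Nullable" is not "true"/"false".
  def extract(input: String): Seq[(String, String, Boolean)] = {
    val json = ujson.read(input)
    for {
      entity <- json("Organization")("Entities").arr.toSeq
      attr   <- entity("Attributes").arr
    } yield (attr("name").str, attr("DataType").str, attr("Nullable").str.toBoolean)
  }
}
```

Each tuple can then be turned into a column definition, for example a Spark StructField, once the DataType string has been mapped to an actual type.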

Another option is to use a different library for the mapping. With Google Protobuf's Struct and JsonFormat it takes only a couple of lines:

import com.google.protobuf.Struct
import com.google.protobuf.util.JsonFormat

val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
val input = try source.mkString finally source.close()

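// The builder receives the parsed JSON; it has to exist before merge is called.
val builder = Struct.newBuilder()
JsonFormat.parser().merge(input, builder)
println(builder.build())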

The output will be: fields { key: "Organization" value { struct_value { fields { key: "project" value { struct_value { fields { key: "name" value { string_value: "POC 4PL" } } fields { key: "description" value { string_value: "Implementation of orderbook" } } } } } fields { key: "Entities" value { list_value { values { struct_value { fields { key: "name" value { string_value: "Shipments" } }...

Boaz
  • Hey @Boaz, I didn't understand the Google Protobuf part. If I use the first solution, would I be able to extract the attributes dynamically? How do I pass the input file path? Can you please share a complete example? Thanks for your support. – Stark Jul 21 '19 at 18:48
  • @Stark I added full examples and links to the Google library. – Boaz Jul 22 '19 at 07:42
  • While executing the above-mentioned program I am getting a NullPointerException :( Sharing the exception message. – Stark Jul 22 '19 at 19:36
  • Exception in thread "main" java.lang.NullPointerException at ujson.AstTransformer$class.transformObject(AstTransformer.scala:15) at ujson.Value$.transformObject(Value.scala:86) at ujson.Value$.transform(Value.scala:152) at ujson.Value$.transform(Value.scala:86) at ujson.AstTransformer$$anonfun$transformArray$1.apply(AstTransformer.scala:11) at ujson.AstTransformer$$anonfun$transformArray$1.apply(AstTransformer.scala:11) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) – Stark Jul 22 '19 at 19:37
  • at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at ujson.AstTransformer$class.transformArray(AstTransformer.scala:11) at ujson.Value$.transformArray(Value.scala:86) at ujson.Value$.transform(Value.scala:151) at ujson.Value$class.transform(Value.scala:75) at ujson.Arr.transform(Value.scala:211) at upickle.Api$class.read(Api.scala:32) at upickle.default$.read(Api.scala:102) at StructField$.delayedEndpoint$StructField$1(StructField.scala:18) at StructField$delayedInit$body.apply(StructField.scala:10) – Stark Jul 22 '19 at 19:38
  • at scala.Function0$class.apply$mcV$sp(Function0.scala:34) at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) at scala.App$$anonfun$main$1.apply(App.scala:76) at scala.App$$anonfun$main$1.apply(App.scala:76) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35) at scala.App$class.main(App.scala:76) at StructField$.main(StructField.scala:10) at StructField.main(StructField.scala) Process finished with exit code 1 – Stark Jul 22 '19 at 19:39
  • import upickle.default.{macroRW, ReadWriter => RW} import upickle.default._ case class StructField(name: String, DataType: String, Nullable: String) object StructField extends App { val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt") val input = try source.mkString finally source.close() val json = ujson.read(input) val entitiesArray = json("Organization")("Entities")(0)("Attributes") println(read[Seq[StructField]](entitiesArray)) implicit val rw: RW[StructField] = macroRW } – Stark Jul 22 '19 at 19:41
  • I ran the above code with `input` equal to what you put in the question under `Content of the file example`. Try not reading the JSON from the file; hardcode it just to make sure the ujson part is working. – Boaz Jul 23 '19 at 10:59
  • I have to pass the JSON content through a file. If I pass the content hard-coded, then it will not be a generic solution. – Stark Jul 24 '19 at 04:43
  • I meant that only so you can find the root cause of the problem. Reading a file into a string is one problem, and deserializing JSON into an object is a different problem. For me the example above works. – Boaz Jul 24 '19 at 05:36
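
A likely cause of the NullPointerException in the comments, judging by the `delayedEndpoint` frame in the stack trace: in the pasted code the object extends `App`, whose body runs top to bottom, and `implicit val rw` is declared after the `read` call, so it is probably still null when upickle uses it. A minimal sketch of one way around this keeps the implicit in the case class's companion and the reading logic in a separate object. `AttributeReader` and `parse` are illustrative names, not part of the original code.

```scala
import upickle.default.{macroRW, read, ReadWriter => RW}

case class StructField(name: String, DataType: String, Nullable: String)

object StructField {
  // Initialized when the companion is first accessed, i.e. before read[...] uses it.
  implicit val rw: RW[StructField] = macroRW
}

object AttributeReader {
  // Parses the attributes of the first entity into case class instances.
  def parse(input: String): Seq[StructField] = {
    val json = ujson.read(input)
    read[Seq[StructField]](json("Organization")("Entities")(0)("Attributes"))
  }

  def main(args: Array[String]): Unit = {
    // args(0) is assumed to be the path of the JSON file.
    val source = scala.io.Source.fromFile(args(0))
    val input = try source.mkString finally source.close()
    println(parse(input))
  }
}
```

Declaring the implicit as `implicit lazy val`, or moving it above the `read` call, should have the same effect if a single object is preferred.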
0

Try using Play Framework's JSON utilities: https://www.playframework.com/documentation/2.7.x/ScalaJson

Here's a solution to your issue, with your JSON placed in a text file:

    import play.api.libs.json.{JsArray, JsValue, Json}
    import org.apache.spark.sql.types.{StructField, StructType}

    val fil_path = "C:\\TestData\\Config\\Conf.txt"
    val conf_source = scala.io.Source.fromFile(fil_path)
    lazy val json_str = try conf_source.mkString finally conf_source.close()
    val conf_json: JsValue = Json.parse(json_str)
    // All entries under Organization.Entities
    val all_entities: JsArray = (conf_json \ "Organization" \ "Entities").get.asInstanceOf[JsArray]
    // Pick the entity named "Shipments" and take its attribute list
    val shipments: JsValue = all_entities.value.filter(e => e.\("name").as[String] == "Shipments").head
    val shipments_attributes: IndexedSeq[JsValue] = shipments.\("Attributes").get.asInstanceOf[JsArray].value
    // StrtoDatatype is a user-defined helper (not shown) that maps a name such as "StringType" to a Spark DataType
    val shipments_schema: StructType = StructType(shipments_attributes.map(a => Tuple3(a.\("name").as[String], a.\("DataType").as[String], a.\("Nullable").as[String]))
      .map(x => StructField(x._1, StrtoDatatype(x._2), x._3.toBoolean)))
    shipments_schema.fields.foreach(println)

Output:

StructField(Shipment_Details,StringType,true)
StructField(Shipment_ID,StringType,true)
StructField(View_Cost,StringType,true)
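
The snippet above calls `StrtoDatatype` without defining it. One possible shape for such a helper is sketched below, assuming Spark SQL is on the classpath. The set of supported names and the wrapping object `SchemaTypes` are assumptions for illustration, not the answerer's actual implementation.

```scala
import org.apache.spark.sql.types.{BooleanType, DataType, DateType, DoubleType, IntegerType, LongType, StringType, TimestampType}

object SchemaTypes {
  // Hypothetical helper: maps the type name used in the JSON file to a Spark DataType.
  // Extend the cases to cover whatever type names the files actually contain.
  def StrtoDatatype(name: String): DataType = name match {
    case "StringType"    => StringType
    case "IntegerType"   => IntegerType
    case "LongType"      => LongType
    case "DoubleType"    => DoubleType
    case "BooleanType"   => BooleanType
    case "DateType"      => DateType
    case "TimestampType" => TimestampType
    case other           => throw new IllegalArgumentException(s"Unsupported DataType: $other")
  }
}
```

Failing on unknown names is a deliberate choice here, so that a typo in the file surfaces immediately instead of silently producing a wrong column type.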
ValaravausBlack