
I need two functions that return the output format and the output index for a file conversion. I put them in a TransformSettings class whose methods return the default values. The Transformer class creates a new TransformSettings object to read those defaults on each job run. I also have a class ParquetTransformer that extends Transformer, where I want to change the defaults. I implemented it as shown below.

class TransformSettings{
  def getOuputFormat: String = {
   "orc"
  }
  def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
   table.StorageDescriptor.SerdeInfo.Parameters.get("orc.column.index.access")
  }
}

class Transformer{
 def getTransformSettings: TransformSettings = {
   new TransformSettings
 }

 def posttransform(table: AWSGlueDDL.Table): DataFrame = {
  val indexAccess = getTransformSettings.getOuputIndex(table)
  ........
 }
}

class ParquetTransformer extends Transformer{
  override def getTransformSettings: TransformSettings = {
   // Return the instance directly: a block ending in a val definition evaluates to Unit
   new TransformSettings {
   
   override def getOuputFormat: String = {
    "parquet"
   }

   override def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
    table.StorageDescriptor.SerdeInfo.Parameters.get("parquet.column.index.access")
   }

   }
  }
}

Is there a way to avoid creating a brand new TransformSettings object in the Transformer class every time getTransformSettings is called?

Also, is there a way to rewrite this code using a Scala value class?

TylerH
vsathyak

1 Answer


As @Dima proposed in the comments, try making TransformSettings a field / constructor parameter (a val) of the class Transformer and instantiating it outside the class:

class TransformSettings{
  def getOuputFormat: String = {
    "orc"
  }
  def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
    table.StorageDescriptor.SerdeInfo.Parameters.get("orc.column.index.access")
  }
}

class Transformer(val transformSettings: TransformSettings) {
  def posttransform(table: AWSGlueDDL.Table): DataFrame ={
    val indexAccess = transformSettings.getOuputIndex(table)
    ???
  }
}

val parquetTransformSettings = new TransformSettings {
  override def getOuputFormat: String = {
    "parquet"
  }

  override def getOuputIndex(table: AWSGlueDDL.Table): Option[String] = {
    table.StorageDescriptor.SerdeInfo.Parameters.get("parquet.column.index.access")
  }
}

class ParquetTransformer extends Transformer(parquetTransformSettings)
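
To check that the settings really are created only once, here is a minimal self-contained sketch of the same idea. It makes several assumptions: the AWSGlueDDL types are replaced by hypothetical stubs (Serde, Storage), the Parquet settings are a singleton object instead of a val (an alternative spelling of the same design), and posttransform is replaced by a hypothetical indexAccess helper so that no Spark dependency is needed.

```scala
// Hypothetical stand-ins for the AWSGlueDDL types, just enough to compile.
object AWSGlueDDL {
  final case class Serde(Parameters: Map[String, String])
  final case class Storage(SerdeInfo: Serde)
  final case class Table(StorageDescriptor: Storage)
}

class TransformSettings {
  def getOuputFormat: String = "orc"
  def getOuputIndex(table: AWSGlueDDL.Table): Option[String] =
    table.StorageDescriptor.SerdeInfo.Parameters.get("orc.column.index.access")
}

// A singleton object is initialized lazily, once, and then shared by every user.
object ParquetTransformSettings extends TransformSettings {
  override def getOuputFormat: String = "parquet"
  override def getOuputIndex(table: AWSGlueDDL.Table): Option[String] =
    table.StorageDescriptor.SerdeInfo.Parameters.get("parquet.column.index.access")
}

class Transformer(val transformSettings: TransformSettings) {
  // Hypothetical helper standing in for posttransform: it only looks up the
  // index setting instead of building a DataFrame.
  def indexAccess(table: AWSGlueDDL.Table): Option[String] =
    transformSettings.getOuputIndex(table)
}

class ParquetTransformer extends Transformer(ParquetTransformSettings)
```

Every ParquetTransformer built this way holds a reference to the same ParquetTransformSettings instance, so repeated calls no longer allocate a new settings object.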

You don't seem to need value classes (... extends AnyVal) here. They are about avoiding boxing, not about life-cycle management. TransformSettings and Transformer can't be value classes because a value class is final, and you extend both of them (class ParquetTransformer extends Transformer ... and new TransformSettings { ... }). Value classes also have many other limitations:

https://failex.blogspot.com/2017/04/the-high-cost-of-anyval-subclasses.html

https://github.com/scala/bug/issues/12271
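
For comparison, this is roughly what a value class looks like. It is only a sketch: OutputFormat and indexKey are hypothetical names that do not appear in the question, and the key pattern is inferred from the two keys used there ("orc.column.index.access" and "parquet.column.index.access").

```scala
// A value class wraps exactly one val, is implicitly final, and must be
// top-level or a member of a statically accessible object.
final class OutputFormat(val name: String) extends AnyVal {
  // Builds the SerDe parameter key for this format.
  def indexKey: String = s"$name.column.index.access"
}

// Not allowed, because value classes cannot be extended:
// class ParquetFormat extends OutputFormat("parquet")
```

A wrapper like this can stand in for a single string, but it cannot replace a class hierarchy whose subclasses override behavior, which is what TransformSettings is used for.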

Besides value classes, there is the scala-newtype library in Scala 2, and there are opaque types in Scala 3.

Dmytro Mitin