I have a Scala function that computes the difference between two dates, taking two LocalDateTime parameters:

I have a DataFrame that contains two fields, start_date and finish_date.

I want to construct a UDF (maybe) to apply the function toEquals to my DataFrame, specifically to the fields start_date and finish_date, to compute the difference between them. But start_date and finish_date are of type String.
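Before wiring up a Spark UDF, it helps to confirm that the String columns can be parsed into LocalDateTime at all. A minimal sketch, independent of Spark; the date pattern and the sample values here are assumptions, not taken from the actual data:

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical pattern; replace with whatever format start_date/finish_date actually use.
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Parse a raw String column value into a LocalDateTime.
def parseDate(raw: String): LocalDateTime = LocalDateTime.parse(raw, formatter)

val start  = parseDate("2018-04-02 09:00:00")
val finish = parseDate("2018-04-02 17:30:00")

// Plain elapsed time between the two timestamps (no business-day logic).
val diff = Duration.between(start, finish)
println(diff.toMinutes)  // 510
```

Once parsing works, the same LocalDateTime values can be fed into the existing toEquals logic inside a UDF.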

vero

1 Answer


I haven't tested the code yet, but wrapping your toEquals logic in a udf function should be enough:

import java.time.LocalDateTime
import org.apache.spark.sql.functions.udf
import scala.concurrent.duration._

// adjust, DATE_TIME_FORMATTER, toEnd, toStart and jourOuvree come from your existing code
def toEquals = udf((rd1: String, rd2: String) => {
  val d1 = adjust(LocalDateTime.parse(rd1, DATE_TIME_FORMATTER))
  val d2 = adjust(LocalDateTime.parse(rd2, DATE_TIME_FORMATTER), asc = false)
  if (d1.isAfter(d2)) 0.hours.toString
  else if (d1.toLocalDate.isEqual(d2.toLocalDate)) {
    (toEnd(d1.toLocalTime) - toEnd(d2.toLocalTime)).toString
  }
  else {
    (toEnd(d1.toLocalTime) + jourOuvree(d1.toLocalDate.plusDays(1), d2.toLocalDate.minusDays(1)) * 8.hours + toStart(d2.toLocalTime)).toString
  }
})

And you can call the udf function as:

input_table.withColumn("toEquals", toEquals($"start_date", $"finish_date"))
Ramesh Maharjan
  • thank you for your answer. scala> input_os_historystep.withColumn("toEquals", toEquals($"start_date",$"finish_date")) java.lang.UnsupportedOperationException: Schema for type scala.concurrent.duration.FiniteDuration is not supported at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:756) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:694) at org.apache.spark.sql.functions$.udf(functions.scala:3200) – vero Apr 04 '18 at 14:09
  • The error message is clear enough: you will have to convert your FiniteDuration object to one of the [supported types](https://stackoverflow.com/a/49641733/5880706) before returning it from the udf. – Ramesh Maharjan Apr 04 '18 at 14:14
  • You mean I should convert it to LocalDateTime? Because I have a formatter class that takes a FiniteDuration parameter. I edited my question. Thank you very much – vero Apr 04 '18 at 14:24
  • Just change the object to String. I have updated my answer by using .toString. Please try that – Ramesh Maharjan Apr 04 '18 at 14:27
  • I'm sorry to disturb you again; your solution is excellent, but when I do input_table.show() it does not display the new column toEquals. The column count does increase by 1, but the column is not shown, nor does it appear in printSchema. – vero Apr 05 '18 at 08:34
  • How can I display it? Thank you – vero Apr 05 '18 at 08:36
  • I asked a new question here https://stackoverflow.com/questions/49668142/withcolumn-display-a-new-column-datetime – vero Apr 05 '18 at 09:02
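The UnsupportedOperationException above comes from Spark being unable to derive a column schema for scala.concurrent.duration.FiniteDuration. A minimal sketch of the fix discussed in the comments, converting the duration to schema-supported types before it leaves the udf (the variable names here are illustrative):

```scala
import scala.concurrent.duration._

val d: FiniteDuration = 8.hours

// Spark cannot infer a schema for FiniteDuration, but it can for Long and String:
val asHours: Long    = d.toHours   // would map to a LongType column
val asString: String = d.toString  // would map to a StringType column, e.g. "8 hours"

println(asHours)
println(asString)
```

Returning `.toString` (as in the updated answer) is the simplest route; returning a numeric value such as `.toHours` or `.toMinutes` keeps the column usable for arithmetic.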