
I have a simple schema with a date and an int. I want to use date_add to add the int to the date.

scala> val ds1 = spark.read.option("inferSchema",true).csv("samp.csv")

scala> ds1.printSchema()

root
 |-- _c0: timestamp (nullable = true)
 |-- _c1: integer (nullable = true)

I cannot get the first parameter of date_add to work. Please help!

scala> val ds2 = ds1.map ( x => date_add(x.getAs[timestamp]("_c0"),  x.getAs[Int]("_c1")))
<console>:28: error: not found: type timestamp

scala> val ds2 = ds1.map ( x => date_add(x.getAs[Column]("_c0"), x.getAs[Int] ("_c1")))
<console>:28: error: not found: type Column
coder AJ

1 Answer


date_add is not your immediate problem... not found: type {timestamp, Column}

I'm not sure how you expect x.getAs[timestamp] to work, honestly (there is no lowercase timestamp type; the JVM type for that column is java.sql.Timestamp), but for the Column error you just need an import.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.date_add

Now try

val ds2 = ds1.map { x =>
    date_add(ds1("_c0"), x.getAs[Int]("_c1"))
}

(Though you should ideally not be using Dataset.map here.)
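If you would rather avoid map entirely, raw SQL sidesteps the whole Column-versus-value problem. A minimal sketch, assuming Spark 2.x (in SQL, date_add accepts a column for the day count, while the Scala function only takes a literal Int; the output column name _c2 is just a placeholder):

import org.apache.spark.sql.functions.expr

// In Spark SQL, date_add(start, days) reads the day count per row;
// the timestamp in _c0 is implicitly cast to a date, so the result is a date.
val ds2 = ds1.withColumn("_c2", expr("date_add(_c0, _c1)"))
ds2.show()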

OneCricketeer
  • Thanks so much! Yeah, I was so frustrated, I was trying everything. scala> val ds2 = ds1.map ( x => date_add(ds1.col("_c0"), x.getAs[Int]("_c1"))) gives <console>:30: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. – coder AJ Feb 11 '17 at 16:40
  • And when I use foreach I get a null pointer exception: at org.apache.spark.sql.Dataset$$anonfun$foreach$1.apply(Dataset.scala:2286) at org.apache.spark.sql.Dataset$$anonfun$foreach$1.apply(Dataset.scala:2286) at org.apache.spark.sql.Dataset.foreach(Dataset.scala:2285) ... 48 elided Caused by: java.lang.NullPointerException at org.apache.spark.sql.Dataset.resolve(Dataset.scala:217) – coder AJ Feb 11 '17 at 16:45
  • New to Spark and all this, so clueless how to proceed. My ds has values:
    +--------------------+---+
    |                 _c0|_c1|
    +--------------------+---+
    |2017-01-01 00:00:...| 10|
    |2017-02-01 00:00:...| 28|
    +--------------------+---+
    – coder AJ Feb 11 '17 at 16:46
  • Like I said, you should ideally not be using Dataset.map (and especially not foreach). You could try `ds1.col("_c1").as[Int]`, or something like that – OneCricketeer Feb 11 '17 at 16:56
  • The `col(String name)` function is correct. The second parameter needs to be an integer, not a column. Or you could try raw SQL rather than messing with the Dataset methods. – OneCricketeer Feb 11 '17 at 19:12
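For completeness, a minimal sketch of the typed-select idea from the comments, assuming spark.implicits._ is in scope; the millisecond arithmetic is only illustrative (it ignores DST shifts):

import java.sql.Timestamp
import spark.implicits._

// Selecting with typed columns yields a Dataset[(Timestamp, Int)],
// so map sees plain JVM values instead of Column objects.
val typed = ds1.select($"_c0".as[Timestamp], $"_c1".as[Int])

val shifted = typed.map { case (ts, days) =>
  // Illustrative day arithmetic: 86400000L ms per day.
  new Timestamp(ts.getTime + days * 86400000L)
}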