I have a code snippet that Scala does not accept, and I would appreciate help fixing it. train_no_header is an RDD generated from a CSV file; its first line is shown below:

scala> train_no_header.first
res4: String = 87540,12,1,13,497,2017-11-07 09:30:38,,0

Now I want to generate another RDD that parses and transforms the records, including those with a null or empty value in the field at index 6 (zero-based), which should be a DateTime; in the sample above that field is empty. Some records have a value there and some do not. When it is present, it uses the same format as the field at index 5, which is a UTC DateTime.

I need to calculate the delta between the two DateTime values, and I plan to convert them to Unix time first. In other words, the final RDD should contain both date fields converted to Unix time.
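To make the conversion concrete, here is a minimal sketch of what I have in mind, assuming the pattern yyyy-MM-dd HH:mm:ss (taken from the sample line) and that the values are UTC. The object name UnixTime and the function toUnixSeconds are my own placeholders, not part of any library.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object UnixTime {
  // Pattern assumed from the sample value "2017-11-07 09:30:38"
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Interprets the string as a UTC DateTime and returns seconds since the epoch.
  // Throws DateTimeParseException on empty or malformed input.
  def toUnixSeconds(s: String): Long =
    LocalDateTime.parse(s, fmt).toEpochSecond(ZoneOffset.UTC)
}

// The delta between two DateTime strings is then a plain subtraction:
val deltaSeconds =
  UnixTime.toUnixSeconds("2017-11-07 09:30:38") - UnixTime.toUnixSeconds("2017-11-07 09:30:08")
```

This version throws on an empty string, so it only covers records where both fields are filled in.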

So my questions are:

  1. With the sample data and format above, how do I create the RDD with the needed result?
  2. For records with an empty value in the field at index 6, how should I handle it so that no exception is thrown by later queries on the DataFrame (which is what I intend to work with)?
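One direction I am considering for the empty field is to wrap each timestamp in an Option, so that a missing or unparseable value becomes None rather than an exception. The following is only a sketch under my own assumptions: the names ClickParser, Parsed, and parseLine are hypothetical, the field positions 5 and 6 come from the sample line, and only the two date fields are kept.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Try

object ClickParser {
  // Pattern assumed from the sample value "2017-11-07 09:30:38"
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Returns None for null, empty, or unparseable input instead of throwing
  def toUnixSeconds(s: String): Option[Long] =
    Option(s).map(_.trim).filter(_.nonEmpty).flatMap { v =>
      Try(LocalDateTime.parse(v, fmt).toEpochSecond(ZoneOffset.UTC)).toOption
    }

  // Hypothetical record type holding both timestamps and their difference
  final case class Parsed(firstTs: Option[Long], secondTs: Option[Long], deltaSeconds: Option[Long])

  def parseLine(line: String): Parsed = {
    // A limit of -1 keeps empty fields at the end of the line
    val fields = line.split(",", -1)
    // lift returns None when the index is out of range
    val first  = fields.lift(5).flatMap(toUnixSeconds)
    val second = fields.lift(6).flatMap(toUnixSeconds)
    // The delta is defined only when both timestamps are present
    val delta = for (a <- first; b <- second) yield b - a
    Parsed(first, second, delta)
  }
}
```

If this is a reasonable approach, I would expect to apply it as train_no_header.map(ClickParser.parseLine). My understanding is that Option[Long] fields in a case class become nullable columns when the RDD is converted to a DataFrame, but I have not verified that on my setup.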

Thank you very much in advance, any clue is appreciated.

Choix