I have a code snippet that Scala does not accept, and I would appreciate help fixing it. train_no_header is an RDD generated from a CSV file; its first line is shown below:
scala> train_no_header.first
res4: String = 87540,12,1,13,497,2017-11-07 09:30:38,,0
Now I want to generate another RDD that parses and transforms the records, including those with a null or empty value in the 6th field (counting from 0), which should be a DateTime. In the sample above that field is empty. Some records have a value there and some do not; where it is present, it has the same format as the 5th field, which is a UTC DateTime.
I need to calculate the delta between the two DateTime values, and I plan to convert them to Unix time to do so. The final RDD should therefore have both date fields converted to Unix time.
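To make the goal concrete, here is a minimal sketch of the per-line transformation described above, written in plain Scala with java.time so it can be tried without a cluster. The names Parsed, ParseSketch, toUnix and parseLine are hypothetical, and the sketch assumes the timestamp pattern is yyyy-MM-dd HH:mm:ss in UTC and that the two date fields sit at zero-based positions 5 and 6, as in the sample line. An empty or malformed value becomes None instead of raising an exception.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical record type; the field names are placeholders, not taken from the data set.
case class Parsed(firstTs: Option[Long], secondTs: Option[Long], deltaSeconds: Option[Long])

object ParseSketch {
  // Assumed timestamp pattern, based on the sample line.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Convert a UTC timestamp string to Unix seconds; empty or malformed input gives None.
  def toUnix(s: String): Option[Long] =
    Try(LocalDateTime.parse(s.trim, fmt).toEpochSecond(ZoneOffset.UTC)).toOption

  def parseLine(line: String): Parsed = {
    // A limit of -1 keeps empty fields, including trailing ones.
    val cols = line.split(",", -1)
    val first = cols.lift(5).flatMap(toUnix)
    val second = cols.lift(6).flatMap(toUnix)
    // The delta is only defined when both timestamps are present.
    val delta = for (a <- first; b <- second) yield b - a
    Parsed(first, second, delta)
  }
}
```

On the RDD this would presumably be applied as train_no_header.map(ParseSketch.parseLine). Because the fields are Option[Long], a missing value should show up as a null column after conversion to a data frame rather than causing an exception, though I have not verified this against the full data set.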
So my question is:
- With the sample data and format, how do I create an RDD with the needed result?
- For records with an empty value in the 6th field, how should I handle it so that later queries on the data frame (which is what I intend to work with) do not throw an exception?
Thank you very much in advance; any clue is appreciated.