I have a DataFrame like this:
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

test_df1 = spark.createDataFrame(
    [
        (1, "-", "-"),
        (1, "97", "00:00:00.02"),
        (1, "78", "00:00:00.02"),
        (2, "83", "00:00:00.02"),
        (2, "14", "00:00:00.02"),
        (2, "115", "00:00:00.02"),
    ],
    ['ID', 'random', 'time']
)
test_df1.show()
+---+------+-----------+
| ID|random|       time|
+---+------+-----------+
| 1| -| -|
| 1| 97|00:00:00.02|
| 1| 78|00:00:00.02|
| 2| 83|00:00:00.02|
| 2| 14|00:00:00.02|
| 2| 115|00:00:00.02|
+---+------+-----------+
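For reference when checking any approach: assuming `time` holds an `HH:mm:ss.ff` duration, its value in milliseconds is `(hours * 3600 + minutes * 60 + seconds) * 1000`, so `00:00:00.02` corresponds to 20 ms. A minimal plain-Python sketch of that arithmetic (the helper `time_to_ms` is hypothetical and not part of PySpark) gives the numbers a Spark expression should reproduce:

```python
def time_to_ms(value):
    """Convert an 'HH:mm:ss.ff' string to milliseconds (float).

    Returns None for anything that does not split into three parts,
    such as the '-' placeholder in the sample data.
    """
    parts = value.split(":")
    if len(parts) != 3:
        return None
    hours, minutes, seconds = parts
    # Seconds keep their fractional part: "00.02" -> 0.02 s -> 20 ms.
    return (int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000


print(time_to_ms("00:00:00.02"))  # 20.0
print(time_to_ms("01:02:03.5"))   # 3723500.0
print(time_to_ms("-"))            # None
```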
How can I convert the time column to milliseconds as a double (DoubleType)? Currently I extract the digits after the decimal point as a string and then cast them to double, as shown below. Is there a better way?
test_df2 = test_df1.withColumn("time", F.substring_index("time", '.', -1).cast("double"))
test_df2.show()
+---+------+----+
| ID|random|time|
+---+------+----+
| 1| null|null|
| 1| 97.0| 2.0|
| 1| 78.0| 2.0|
| 2| 83.0| 2.0|
| 2| 14.0| 2.0|
| 2| 115.0| 2.0|
+---+------+----+