I am saving the Spark 1.6 DataFrame below into a Phoenix table. The problem I am facing is that withColumn("create_ts", current_timestamp()) inserts the same timestamp for the entire DataFrame; please see the example below.
I want a unique timestamp, in milliseconds, for each row of each job. Because of this issue a lot of data has been overwritten, since rows within the same job end up with the same composite key.
Sample data:
+-----------------------------------------+-------------------------+
| JOB_NAME                                | CREATE_TS               |
+-----------------------------------------+-------------------------+
| ETL_JOB_application_1500036106103_27268 | 2017-08-03 06:18:31.593 |
| ETL_JOB_application_1500036106103_27268 | 2017-08-03 06:18:31.593 |
| ETL_JOB_application_1500036106103_27268 | 2017-08-03 06:18:31.593 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
| ETL_JOB_application_1500036106103_27266 | 2017-08-03 06:16:39.243 |
+-----------------------------------------+-------------------------+
Code:
stagedDataFrame
  .select($"RemoteID", $"TagName", $"TagValueTs", $"Value", $"TagTypeName")
  .withColumn("job_name", lit(s"${etlStatistics2.sqlContext.sparkContext.appName}_${etlStatistics2.sqlContext.sparkContext.applicationId}"))
  .withColumn("create_ts", current_timestamp()) // evaluated once per query, so every row gets the same value
  .withColumn("record_count", lit(etlStatistics2.head().getLong(3)))
  .select($"job_name", $"create_ts", $"record_count", $"RemoteID" as "remoteid", $"TagName" as "tagname", $"TagValueTs" as "tagvalue_ts", $"Value" as "value", $"TagTypeName" as "tagtypename")