I am trying to convert a string column in my DataFrame to a date type. The strings look like this:

Fri Oct 12 18:14:29 +0000 2018

I tried this code:

df_en.withColumn('date_timestamp',unix_timestamp('created_at','ddd MMM dd HH:mm:ss K yyyy')).show()

But this is the result I got:

+--------------------+--------------------+--------------------+--------------+
|          created_at|                text|           sentiment|date_timestamp|
+--------------------+--------------------+--------------------+--------------+
|Mon Oct 15 20:53:...|What a shock hey,...|-0.07755102040816327|          null|
|Fri Oct 12 18:14:...|No Bucky, people ...|                 0.0|          null|
|Wed Oct 10 07:51:...|If Sarah Hanson Y...|                0.05|          null|
|Mon Oct 15 02:30:...|            365 days|                 0.0|          null|
|Sun Oct 14 06:17:...|#HimToo: how an a...|                -0.5|          null|
|Tue Oct 09 07:30:...|hopefully the #Hi...|                 0.0|          null|
|Tue Oct 09 23:30:...|If Labor win Gove...|                 0.8|          null|
|Thu Oct 11 01:09:...|Hello #Perth - th...|                0.75|          null|
|Sat Oct 13 21:47:...|#MeToo changed th...|                 0.0|          null|
|Tue Oct 09 00:41:...|Rich for Queensla...|               0.375|          null|
|Mon Oct 15 12:59:...|Wonder what else ...|                 0.0|          null|
|Mon Oct 15 05:12:...|@dani_ries #metoo...|                 0.0|          null|
|Wed Oct 10 00:30:...|Hey @JackieTrad a...|                0.25|          null|
|Tue Oct 16 04:00:...|“There's this ide...| 0.03611111111111113|          null|
|Sun Oct 14 08:14:...|Is this the attit...|-0.01499999999999999|          null|
|Sat Oct 13 11:26:...|#metoo official s...|                 0.1|          null|
|Tue Oct 09 00:23:...|On the limited an...|-0.01904761904761...|          null|
|Tue Oct 16 14:41:...|Domestic Violence...|                 0.0|          null|
|Wed Oct 10 23:34:...|@australian Note ...|                 0.0|          null|
|Sat Oct 06 20:07:...|Wtaf, America. I ...|                 0.0|          null|
+--------------------+--------------------+--------------------+--------------+

I also tried:

df_en.select(col("created_at"),to_date(col("created_at")).alias("to_date") ).show()

The result is exactly the same. I don't know why. Could anybody help me?

1 Answer

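Try the pattern EEE MMM dd HH:mm:ss Z yyyy, and build the Spark session with .config('spark.sql.legacy.timeParserPolicy', 'LEGACY'). In the pattern you tried, ddd and K do not match your string: d is the day of the month and K is an hour field, whereas the abbreviated day name is EEE and a numeric offset such as +0000 is Z. Check this as well.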
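Here is a minimal sketch of how that could look in PySpark. It assumes Spark 3.x, where as far as I know the default parser does not accept E when parsing, which is why the legacy policy is set. The helper names (parse_locally, add_timestamp, main) and the app name are made up for illustration, and the one-row DataFrame stands in for df_en. The strptime pattern is only a Spark-free way to check that the layout of the string is what we think it is.

```python
from datetime import datetime

# Spark (SimpleDateFormat-style) pattern suggested in the answer.
SPARK_PATTERN = "EEE MMM dd HH:mm:ss Z yyyy"
# Equivalent strptime pattern, used only for a sanity check without Spark.
PY_PATTERN = "%a %b %d %H:%M:%S %z %Y"


def parse_locally(value):
    """Parse one created_at string with the standard library."""
    return datetime.strptime(value, PY_PATTERN)


def add_timestamp(df, src="created_at", dst="date_timestamp"):
    """Return df with a timestamp column parsed from the string column src."""
    # Imported here so parse_locally can be used without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(dst, F.to_timestamp(F.col(src), SPARK_PATTERN))


def main():
    """Build a session with the legacy parser and parse a sample row."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("parse-created-at")  # hypothetical app name
        # Assumption: Spark 3.x, where parsing 'EEE' needs the legacy policy.
        .config("spark.sql.legacy.timeParserPolicy", "LEGACY")
        .getOrCreate()
    )
    # Stand-in for df_en: a single row in the same format as the question.
    df_en = spark.createDataFrame(
        [("Fri Oct 12 18:14:29 +0000 2018",)], ["created_at"]
    )
    add_timestamp(df_en).show(truncate=False)
    spark.stop()
```

Call main() on a machine with PySpark installed to see the parsed column. If you want epoch seconds instead, as in the question, unix_timestamp('created_at', SPARK_PATTERN) should accept the same pattern under the same setting.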

pltc