I'm trying to recreate, in Databricks, some work I've already done in Python. I have a dataframe with a column called 'time' that holds Unix timestamps in nanoseconds. In Python, I use the following code to convert the column into the appropriate datetime values:
import pandas as pd
# Convert time field from nanoseconds into datetime
df["time"] = pd.to_datetime(df["time"], unit='ns')
This converts the sample value 1642778070000000000 into 2022-01-21 15:14:30. I now want to do the same thing in Databricks using PySpark, as I'm scaling the problem up and the dataset is too large to process in pandas. I've created a Spark DataFrame called df, imported the pyspark.pandas module, and tried effectively the same code, but it doesn't work:
from pyspark import pandas as ps
df = df.ps.to_datetime(df.columns[2], unit='ns')  # the time column is at index 2
I get an error:
'DataFrame' object has no attribute 'ps'
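For what it's worth, here's the plain-PySpark approach I've sketched as a fallback. It's untested on my data; dividing by 1e9 to go from nanoseconds to seconds is my own assumption about the right conversion, and timestamp_seconds requires Spark 3.1+:

from pyspark.sql import functions as F

# Untested sketch: nanoseconds -> (fractional) seconds -> timestamp.
# timestamp_seconds is available in Spark 3.1+.
df = df.withColumn("time", F.timestamp_seconds(F.col("time") / 1_000_000_000))

# Alternatively (also untested): convert the Spark DataFrame to a
# pandas-on-Spark frame and call to_datetime as a module-level function:
# import pyspark.pandas as ps
# psdf = df.pandas_api()  # Spark 3.2+
# psdf["time"] = ps.to_datetime(psdf["time"], unit="ns")

But I'd still like to understand why my pyspark.pandas attempt above fails.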
Any suggestions?