0

Can someone let me know how to convert the following from a DataFrame to an IntegerType()?

df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(max(col("version")).alias("version"))

I have tried the following

result = df.collect()[0]

result2  = df.withColumn("version",df.version.cast('integer'))

But with no luck

Any thoughts?

Jacek Laskowski
Patterson

3 Answers

4
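from delta.tables import DeltaTable
# Alias max so it does not shadow Python's built-in max()
from pyspark.sql.functions import col, max as max_

dt = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1')
# collect() returns a list of Row objects; [0][0] is the first field of the first row
latest_version = int(dt.history().select(max_(col("version"))).collect()[0][0])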
Jacek Laskowski
1

If the table already exists:

# DESCRIBE HISTORY lists the most recent version first; version is the first column
dftemp = spark.sql("DESCRIBE HISTORY table1 LIMIT 1").collect()
verno = int(dftemp[0][0])
Laki
0

I figured it out.

I needed to add collect() to the dataframe.

df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(max(col("version")).alias("version")).collect()

And then simply take the first field of the first row:

df = df[0][0]
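For anyone wondering why both indexes are needed: collect() returns a list of rows, so [0] alone still gives a Row rather than an int. Here is a rough sketch of that shape in plain Python. The namedtuple is only a stand-in for pyspark.sql.Row (which is also a tuple subclass), and the version number 7 is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row; a real Row is also a tuple subclass.
Row = namedtuple("Row", ["version"])

# Shape of what .collect() returns for a one-row, one-column DataFrame.
# The value 7 is invented for this illustration.
collected = [Row(version=7)]

first_row = collected[0]               # still a Row, not an int
latest_version = int(first_row[0])     # positional access to the only field
same_by_name = int(first_row.version)  # access via the aliased column name

print(type(first_row).__name__, latest_version, same_by_name)
```

With a real Row, both the positional form and the name-based form should work in the same way, so the choice is mostly about readability.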
Patterson