
A Spark DataFrame has the .columns attribute:

dataFrame.columns

A DeltaTable does not. Note that the latter is backed by a Parquet file/directory, and Parquet files are self-describing, so the column information is available at least in the files themselves. That information should therefore be accessible from the DeltaTable, but I have not been able to find it even by going deep into its protected/private attributes with a debugger. What is the way to work with these constructs?
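
(For illustration, a minimal sketch of the contrast, assuming the delta-spark Python package and a hypothetical table at /tmp/delta/events:)

from delta.tables import DeltaTable

# A DataFrame exposes the column names directly.
df = spark.read.format("delta").load("/tmp/delta/events")
df.columns                          # e.g. ['id', 'ts', 'value']

# The DeltaTable handle has no such attribute.
tbl = DeltaTable.forPath(spark, "/tmp/delta/events")
# tbl.columns                       # AttributeError: 'DeltaTable' object has no attribute 'columns'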

– WestCoastProjects

2 Answers


The DeltaTable object has a .toDF function (doc) that returns a DataFrame instance, on which you can call .columns.

from delta.tables import DeltaTable

tbl = DeltaTable.forPath(spark, "...")
tbl.toDF().columns

P.S. It would also be nice to extend the .detail function to return the table's schema - maybe you can file a feature request for it.
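
(A slightly fuller sketch of the same approach, assuming the delta-spark Python package and a hypothetical table path:)

from delta.tables import DeltaTable

tbl = DeltaTable.forPath(spark, "/tmp/delta/events")   # hypothetical path
df = tbl.toDF()
df.columns           # just the column names
df.printSchema()     # full schema, including types and nullability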

– Alex Ott

One way that I know of is the SQL syntax below, which you can also run through spark.sql:

DESCRIBE TABLE EXTENDED tablename

Executing the above command will give you all the details about the column names, data types, comments, the physical location of the Parquet files, partitioning information (if any), and more.
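
(A minimal sketch of running this through spark.sql and extracting just the column names, assuming a hypothetical registered table called events; the exact row layout of the DESCRIBE output can vary between Spark/Delta versions:)

rows = spark.sql("DESCRIBE TABLE EXTENDED events").collect()

# The result has col_name / data_type / comment fields; the column list comes
# first and ends at the first blank or '#'-prefixed row that starts the
# partition/metadata sections.
column_names = []
for row in rows:
    if not row.col_name or row.col_name.startswith("#"):
        break
    column_names.append(row.col_name)

print(column_names)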

– Nikunj Kakadiya
  • I was asking about the API construct `DeltaTable` that is available in `scala` and `pyspark`. I think your suggestion would require creating a `DataFrame` and then doing something like `createOrReplaceTempView()`. That's possible, but it is also not the intent of the question, which is to avoid that additional overhead. – WestCoastProjects Dec 11 '22 at 07:24
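
(One possible way to avoid the temp-view step mentioned in the comment: Spark SQL can also address a Delta table directly by path, assuming the Delta SQL extensions are configured and using a hypothetical path:)

spark.sql("DESCRIBE TABLE EXTENDED delta.`/tmp/delta/events`").show(truncate=False)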