
When executing SHOW CREATE TABLE in Hive, the statement includes the location URL for both internal and external tables. But when executing the same statement through Spark (PySpark), the location is only present if the table is external. Is there any setting that would include the location for internal tables as well?

I want to collect the create statements, with location, for all tables to move them from on-prem to Databricks. Databricks also displays the create statements without the location for internal tables.

We can get the location from DESC TABLE, but I would like to know whether there is a tweak to include it in SHOW CREATE TABLE.

Example Hive:

show create table db1.table1

In PySpark:

createStatement = spark.sql("show create table db1.table1")

createStatement.show()
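For reference, since SHOW CREATE TABLE omits the LOCATION clause for managed tables in Spark, one workaround is to read the path from DESCRIBE FORMATTED and attach it to the DDL yourself. A minimal sketch (assuming Spark 3.x; db1.table1 is the example table above, and simply appending LOCATION at the end is a simplification that may need adjusting depending on how your DDLs end):

# Sketch of the DESCRIBE-based workaround mentioned above.
# SHOW CREATE TABLE returns a single-row, single-column DataFrame with the DDL text.
ddl = spark.sql("SHOW CREATE TABLE db1.table1").collect()[0][0]

# For a managed table the DDL has no LOCATION clause, so read the path from the
# "Location" row of DESCRIBE FORMATTED instead.
detail = spark.sql("DESCRIBE FORMATTED db1.table1").collect()
location = next(r.data_type for r in detail if r.col_name.strip() == "Location")

# Attach the location so the DDL can be replayed as an external table elsewhere.
print(ddl + "\nLOCATION '{}'".format(location))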
CuriousB
1 Answer


Here's a related answer for Scala Spark.

The idea would be the same for you. You could query the (likely Derby) database that holds the information for all Spark tables (the Spark metastore).

You should also know that, by default, internal databases/tables follow this path format (where database is the name of the database and tablename the name of the table):

hdfs:/hive/warehouse/directory/[database].db/[tablename]
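As a quick sketch of that convention in PySpark (this assumes the table sits in the configured warehouse directory and no custom LOCATION was set when it was created):

# Sketch only: build the expected path of a managed table from the configured
# warehouse directory, following the default layout shown above.
warehouse_dir = spark.conf.get("spark.sql.warehouse.dir")
database, tablename = "db1", "table1"  # example names from the question
print("{}/{}.db/{}".format(warehouse_dir, database, tablename))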
Matt Andruff
  • Thanks Matt for responding. I agree that the path can be derived either by concatenating the warehouse directory + database + tablename or through the DESCRIBE FORMATTED command. But I am collecting DDLs for 5000+ tables under the same database into a Hive table, and I will be using a Spark script to execute these DDLs in Azure as external tables. If the location were available in the create statement, it would be easier to manipulate. Hence I was looking for any configuration which could help me with it. – CuriousB Nov 24 '22 at 09:09
  • I would suggest working with the Spark metastore then to pull the information that you need. This is what DESCRIBE uses to generate the information. – Matt Andruff Nov 24 '22 at 14:02
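Following up on the comments, here is a hedged sketch of collecting the DDL plus location for every table in one database. It stays in PySpark and uses the catalog and DESCRIBE FORMATTED rather than querying the metastore database directly; db1 stands in for the actual database name:

# Gather (table, location, ddl) for all tables in one database, so the LOCATION
# can be attached when the DDLs are replayed as external tables.
rows = []
for t in spark.catalog.listTables("db1"):  # "db1" is an example database name
    full_name = "db1.{}".format(t.name)
    ddl = spark.sql("SHOW CREATE TABLE {}".format(full_name)).collect()[0][0]
    detail = spark.sql("DESCRIBE FORMATTED {}".format(full_name)).collect()
    location = next((r.data_type for r in detail if r.col_name.strip() == "Location"), None)
    rows.append((full_name, location, ddl))

# Persist or inspect the results; truncate=False keeps the full DDL visible.
spark.createDataFrame(rows, ["table_name", "location", "ddl"]).show(truncate=False)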