17

I am trying to check if a table exists in the Hive metastore and, if it does not, create the table; if the table exists, append data.

I have a snippet of the code below:

spark.catalog.setCurrentDatabase("db_name")
db_catalog = spark.catalog.listTables(dbName='table_name')
if any(table_name in row for row in db_catalog):
    ...  # add data
else:
    ...  # create table

However, I am getting an error.

>>> ValueError: Some of types cannot be determined after inferring

I am unable to resolve this ValueError, and I get the same error for other databases' tables created in the Hive metastore. Is there another way to check whether a table exists in the Hive metastore?

Cryssie

9 Answers

26

You can use the underlying JVM object for this:

if spark._jsparkSession.catalog().tableExists('db_name', 'tableName'):
  print("exist")
else:
  print("Not Exist")

A Py4J socket is used for the Python-side functionality; other operations go through the JVM SparkContext.
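
For the original append-or-create flow, a minimal sketch could look like this (db_name, table_name, and df are placeholders for your own database, table, and DataFrame):

if spark._jsparkSession.catalog().tableExists('db_name', 'table_name'):
    # Table already exists: append the new rows
    df.write.mode('append').saveAsTable('db_name.table_name')
else:
    # Table is missing: create it from the DataFrame
    df.write.saveAsTable('db_name.table_name')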

In Spark Scala you can access this directly:

spark.catalog.tableExists("dbName.tableName")
Yukeshkumar
6

In PySpark 2.4.0 you can use one of the two approaches below to check if a table exists. Keep in mind that the SparkSession (spark) is already created.

table_name = 'table_name'
db_name = None

1. Creating an SQLContext from the SparkSession's context

from pyspark.sql import SQLContext

sqlContext = SQLContext(spark.sparkContext)
table_names_in_db = sqlContext.tableNames(db_name)

table_exists = table_name in table_names_in_db

2. Using the SparkSession catalog

tables_collection = spark.catalog.listTables(db_name)
table_names_in_db = [table.name for table in tables_collection]

table_exists = table_name in table_names_in_db

I would use the first approach, because the second seems to trigger a Spark job, so it is slower.

To write or append to a table you can use the following method:

df.write.mode('append').saveAsTable(table_name)
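
Worth noting: saveAsTable with mode('append') creates the table when it does not already exist, so for a plain append-or-create flow the explicit existence check may be unnecessary (behavior worth verifying on your Spark version):

# Appends when the table exists, creates it when it does not
df.write.mode('append').saveAsTable(table_name)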
Grzegorz
6

As of Spark 3.3.0 there is a built-in tableExists method: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Catalog.tableExists.html

spark.catalog.tableExists("my_table")
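
The database can be given either as part of a qualified name or via the separate dbName argument (my_table and db_name are placeholders):

spark.catalog.tableExists("db_name.my_table")
spark.catalog.tableExists("my_table", dbName="db_name")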
Brian
5

Try the following:

table_list = spark.sql("show tables in your_db")
matching = table_list.filter(table_list.tableName == "your_table").collect()
if len(matching) > 0:
    print("table found")
else:
    print("table not found")
vikrant rana
4

The way I recommend is:

def check_table_exist(db_tbl_name):
    table_exist = False
    try:
        spark.read.table(db_tbl_name)  # Check if spark can read the table
        table_exist = True
    except Exception:  # Table is missing or unreadable
        pass
    return table_exist

check_table_exist(db_tbl_name="foodb.bar_tbl")

and use this bool flag in your script!
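
A slightly tighter variant of the same idea catches only Spark's AnalysisException, so unrelated failures are not silently swallowed (a sketch, assuming the table name itself is well formed):

from pyspark.sql.utils import AnalysisException

def check_table_exist(db_tbl_name):
    try:
        spark.read.table(db_tbl_name)  # Raises AnalysisException when the table is missing
        return True
    except AnalysisException:
        return False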

Jatin Chauhan
3

Another suggestion that avoids creating a list-like structure:

from pyspark.sql.functions import col

if (spark.sql("show tables in <DATABASE>")
         .filter(col("tableName") == "<TABLENAME>")
         .count() > 0):
    print('found')
else:
    print('not found')
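
Since only existence matters, taking at most one row avoids counting every match; for a metadata query the difference is small, but the pattern generalizes (a sketch using the same placeholders as above):

tables = (spark.sql("show tables in <DATABASE>")
              .filter(col("tableName") == "<TABLENAME>"))
if len(tables.take(1)) > 0:
    print('found')
else:
    print('not found')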
Marcel Flygare
3

Combining the best of two answers:

tblList = sqlContext.tableNames("db_name")
if table_name in tblList:
    print("Table exists")
    # insert code to insert/append
else:
    print("Table does not Exist")
    # insert code to create

Hope that helps.

Brent

0

We have used the following in Databricks to check if a table exists; this should work, I guess.

tblList = sqlContext.tableNames(<your db name>)
if <table_name> in tblList:
    print("Table exists")
else:
    print("Table does not exist, create the table")

Thanks

Yuva
0

To check whether a table exists in the Databricks Hive metastore using PySpark, use the code below:

if spark.catalog._jcatalog.tableExists(f"{database_name}.{table_name}"):
    print("Table exists")
else:
    print("Table does not exist")
KaranSingh