
Is there any way to implement information_schema in Hive? Also, what is the process for doing data profiling in Hive?
For example, in SQL we use INFORMATION_SCHEMA like this:

SELECT * FROM INFORMATION_SCHEMA.TABLES;
user4157124
rachel
  • [Big SQL 1.0 - Catalog schema](http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.bigsql.doc/doc/bsql_catalog_schema.html) may give you some information ;). – shA.t Apr 20 '15 at 09:25

2 Answers


information_schema is not implemented in Hive; there is an open JIRA for it. By default, Hive uses Derby as the metastore to store table metadata. By connecting to Derby directly, the system tables can be inspected.

Praveen Sripati
  • Thanks Praveen. Do you know how to connect to Derby? – rachel Dec 19 '13 at 19:03
  • `jdbc:derby:;databaseName=metastore_db;create=true` is the connect string, `org.apache.derby.jdbc.EmbeddedDriver` is the driver name. By default metastore_db folder will be created in the Hive installation directory where the metadata is stored. Write a Java program or use one of the GUI tools mentioned [here](http://wiki.apache.org/db-derby/UsesOfDerby) – Praveen Sripati Dec 19 '13 at 19:27
  • This is a decent workaround. I note that in May 2017 that JIRA ticket appears to be resolved, and tagged to be released in v 3.0.0. It involves running the schematool to create the sys and information_schema databases. Currently it appears latest is 2.3 – Davos Sep 20 '17 at 05:13
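As a sketch of that workaround: once connected to the embedded Derby metastore, a join over the standard metastore tables TBLS and DBS lists every table Hive knows about. (The APP schema name below is an assumption based on Derby's default schema for the embedded metastore; verify against your installation.)

```sql
-- Sketch: assumes the default embedded Derby metastore, where Hive's
-- metastore tables (TBLS, DBS) live in Derby's default APP schema.
SELECT d.NAME     AS database_name,
       t.TBL_NAME AS table_name,
       t.TBL_TYPE AS table_type
FROM   APP.TBLS t
JOIN   APP.DBS  d ON t.DB_ID = d.DB_ID;
```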
import pyodbc
import pandas as pd

# Connect to Hive through an ODBC DSN named 'Hive'
cnx = pyodbc.connect(dsn='Hive', autocommit=True)
# Pull the extended table metadata into a DataFrame
df = pd.read_sql('DESCRIBE EXTENDED TABLE_NAME;', cnx)
print(df)

I use the above code in Python, but to answer your question you could simply run DESCRIBE EXTENDED TABLE_NAME;
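For the data-profiling half of the question, one lightweight approach (a sketch, assuming the table has already been pulled into pandas, e.g. via pd.read_sql over a connection like the one above) is to compute per-column statistics on the DataFrame:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return basic per-column profiling stats for a DataFrame."""
    return pd.DataFrame({
        "non_null": df.notna().sum(),   # populated values per column
        "nulls": df.isna().sum(),       # missing values per column
        "distinct": df.nunique(),       # distinct non-null values
        "dtype": df.dtypes.astype(str), # inferred column type
    })

# Sample data standing in for a table read from Hive
df = pd.DataFrame({"id": [1, 2, 3, None],
                   "city": ["NY", "NY", "LA", "SF"]})
print(profile(df))
```

For large Hive tables, the same counts can instead be pushed into HiveQL (COUNT(*), COUNT(col), COUNT(DISTINCT col)) so only the summary comes back over the connection.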

Muthu