1

I have tables in Azure Databricks that I interact with via SQL in a notebook. I need to select all 200 columns from a table, but modify some of them for a SELECT ... INSERT (to change specific columns for a PK), so I can't use SELECT *. (There are multiple scenarios; this is just my current objective.)

How can I generate a SELECT statement for a table with all the column names written out in the SQL? This would be the equivalent of 'Select Top N Rows' in SSMS, where it generates a SELECT for the table that I can then edit.

I have seen commands like DESCRIBE and SHOW, but they don't build a SELECT statement. I am new to Databricks. Any help is appreciated.

GLowe
  • Is this a one time thing or you need it automated? – dfundako Aug 20 '21 at 14:37
  • I would use this all the time. I use this ability in SSMS multiple times a day. – GLowe Aug 20 '21 at 14:41
  • how do you want to use it? Run some function that will print SQL for you? – Alex Ott Aug 21 '21 at 11:38
  • That would be my thought. Some function/script that would dynamically build the select statement. I am new to Databricks and their 'database' structure. I am not sure if there are metadata views I could query in sql. I am not overly familiar with python. – GLowe Aug 23 '21 at 10:43
  • If it helps: there is a 'SHOW CREATE TABLE' statement in Databricks SQL. I would be looking for some kind of 'Show Select Table' equivalent. Not my question, but a 'Show Update Table' would also be great. Perhaps I need to request this as a feature. https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-create-table.html – GLowe Aug 24 '21 at 17:19

2 Answers

0

I have the same problem. It is really tough to write and modify SELECT statements for tables like this. I tried many approaches and found that connecting to the table on Azure Databricks with third-party software works well.

Here is what I do:

  1. Download third-party software such as DBeaver.
  2. Download the Databricks JDBC driver from this page.
  3. Configure the Databricks driver. Luckily, there is an official doc for DBeaver.
  4. Connect to Databricks and find the table you want to generate the SELECT statement for.
  5. Use DBeaver's built-in "Generate SQL" function to generate it.

That's it!

I found this setup took just 10-15 minutes to complete and saves a lot of time.


0

Generate a SELECT statement from an existing table:

def generate_select(schema_name, table_name):
    # SHOW COLUMNS returns one row per column; collect() brings them back as a list of Rows
    rows = spark.sql(f'SHOW COLUMNS IN {schema_name}.{table_name}').collect()
    print('SELECT')
    print('  ' + ',\n  '.join(row[0] for row in rows))
    print(f'FROM {schema_name}.{table_name}')

generate_select('tpcds001tb', 'customer')

Output:

SELECT
  c_customer_sk,
  c_customer_id,
  c_current_cdemo_sk,
  c_current_hdemo_sk,
  c_current_addr_sk,
  c_first_shipto_date_sk,
  c_first_sales_date_sk,
  c_salutation,
  c_first_name,
  c_last_name,
  c_preferred_cust_flag,
  c_birth_day,
  c_birth_month,
  c_birth_year,
  c_birth_country,
  c_login,
  c_email_address,
  c_last_review_date
FROM tpcds001tb.customer
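
For the INSERT ... SELECT use case in the question (replacing specific columns such as a PK), the same idea extends to a sketch like the one below. `generate_insert_select` is a hypothetical helper, not a built-in; it assumes you already have the column list (e.g. from SHOW COLUMNS as above), and the target table and the `monotonically_increasing_id()` override are just example choices:

```python
def generate_insert_select(schema_name, table_name, columns, overrides):
    """Build an INSERT ... SELECT, substituting expressions for chosen columns.

    overrides maps column name -> replacement SQL expression (hypothetical
    example: swap the PK for a generated value)."""
    exprs = [
        f'{overrides[c]} AS {c}' if c in overrides else c
        for c in columns
    ]
    return (
        f'INSERT INTO {schema_name}.{table_name}\n'
        'SELECT\n  ' + ',\n  '.join(exprs) + '\n'
        f'FROM {schema_name}.{table_name}'
    )

print(generate_insert_select(
    'tpcds001tb', 'customer',
    ['c_customer_sk', 'c_customer_id', 'c_first_name'],
    {'c_customer_sk': 'monotonically_increasing_id()'},
))
```

This prints a statement with `monotonically_increasing_id() AS c_customer_sk` in place of the raw PK column, which you can then edit or run with spark.sql().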
Chris Koester