39

I need a query to find column names of a table (table metadata) in Bigquery, like the following query in SQL:

SELECT column_name,data_type,data_length,data_precision,nullable FROM all_tab_cols where table_name ='EMP';
Tiago Martins Peres
  • 14,289
  • 18
  • 86
  • 145
user1487985
  • 393
  • 1
  • 3
  • 6

6 Answers6

69

BigQuery now supports information schema.

Suppose you have a dataset named MY_PROJECT.MY_DATASET and a table named MY_TABLE, then you can run the following query:

SELECT column_name
FROM MY_PROJECT.MY_DATASET.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'MY_TABLE'
Lak
  • 3,876
  • 20
  • 34
  • do you know why this query takes 3m+ to run on one of my datasets? There are not that many tables in the dataset and the tables are not large. But 3m+ makes it unusable in my application. Thanks for any help! – kilgoretrout Nov 11 '20 at 23:53
15

Yes you can get table metadata using INFORMATION_SCHEMA.

One of the examples mentioned in the past link retrieves metadata from the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view for the commits table in the github_repos dataset, you just have to

  1. Open the BigQuery web UI in the GCP Console.

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

     SELECT
      *
     FROM
      `bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
     WHERE
      table_name="commits"
      AND column_name="author"
      OR column_name="difference"
    

Note: INFORMATION_SCHEMA view names are case-sensitive.

  1. Click Run.

The results should look like the following

  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | table_name | column_name |     field_path      |                                                                      data_type                                                                      | description |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | commits    | author      | author              | STRUCT<name STRING, email STRING, time_sec INT64, tz_offset INT64, date TIMESTAMP>                                                                  | NULL        |
  | commits    | author      | author.name         | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.email        | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.time_sec     | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.tz_offset    | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.date         | TIMESTAMP                                                                                                                                           | NULL        |
  | commits    | difference  | difference          | ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, new_path STRING, old_sha1 STRING, new_sha1 STRING, old_repo STRING, new_repo STRING>> | NULL        |
  | commits    | difference  | difference.old_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.new_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.old_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_repo | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_repo | STRING                                                                                                                                              | NULL        |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
Tiago Martins Peres
  • 14,289
  • 18
  • 86
  • 145
  • 1
    `INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` is exactly what I needed. Thank you for this! – timle Jul 15 '21 at 21:37
4

Update: This is now possible! See the INFORMATION SCHEMA docs and the answers below.

Answer, circa 2012:

It's not currently possible to retrieve table metadata (i.e. column names and types) via a query, though this isn't the first time it's been requested.

Is there a reason you need to do this as a query? Table metadata is available via the tables API.

Craig Citro
  • 6,505
  • 1
  • 31
  • 28
  • Thanks Craig,tables API is enough for me to get the table metadata. – user1487985 Jul 05 '12 at 07:30
  • The reason that I personally hoped it could be done as a query is because that's how my project had been doing it before and it would be one less function to have to learn and convert in the process of moving everything over. – Altimus Prime Dec 04 '16 at 13:19
  • 1
    There's this [newer answer](https://stackoverflow.com/a/50978615/1308883) that could give you some ideas on how to do it through a Standard SQL query. – Jofre Jun 22 '18 at 00:16
  • this answer is wrong, at least currently. See below the right one. – Alexandru R Jun 16 '20 at 09:03
4

For newbies like me, the above is of the following syntax:

select * from project_name.dataset_name.INFORMATION_SCHEMA.COLUMNS where table_catalog=project_name and table_schema=dataset_name and table_name=table_name
Laura
  • 8,100
  • 4
  • 40
  • 50
SSS
  • 49
  • 1
  • I highly recommend you tag your question with 'SQL'. Also take a peek at the formatting of your query to make it a bit easier to read. – Marcel Wilson May 22 '19 at 04:08
2

Actually it is possible to do so using SQL. To do so you need to query the logging table for the last log of this particular table being created.

For example, assuming the table is loaded/created daily:

    CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema String)
          RETURNS ARRAY<STRING> AS ((
            SELECT
              SPLIT(
                REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
              ,',')
          ));
    WITH valid_schema_columns AS (
      WITH array_output aS (SELECT
        jsonSchemaStringToArray(jsonSchema) AS column_names
      FROM (
        SELECT
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
          , ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
        FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
        WHERE
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
          AND
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
          AND
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
      ) AS t
      WHERE
        t.record_count = 1 -- grab the latest entry
      )
      -- this is actually what UNNESTS the array into standard rows
      SELECT
        valid_column_name
      FROM array_output
      LEFT JOIN UNNEST(column_names) AS valid_column_name

    )
mr.meer
  • 111
  • 4
  • 1
    Where in this do we insert our table name? There's clearly a simpler version of this given the comment above refers to a much shorter SQL example. – Praxiteles Nov 23 '18 at 13:49
0

To Check column, You can access your table Through CLI Easy and simple to find

bq query --use_legacy_sql=false 'select Hour, sum(column 1) as column from `project_id.dataset.table_name` where Date(Hour) = '2020-06-10';'
β.εηοιτ.βε
  • 33,893
  • 13
  • 69
  • 83
Kumar Pankaj Dubey
  • 1,541
  • 3
  • 17
  • 17