4

Is there a way in athena/presto to get column names based on a condition and use it in group by?

SHOW COLUMNS FROM {table_name}

This query gives me the column details. But I want to use only the column names in some other query. The structure of the table looks like

name_of_service | cost | usage_date | user_123 | user_212 | user_342 | user_5832 | ...

And so on. There are around 500 columns

The use case I am looking at, looks somewhat like this -

SELECT SUM(cost), {column_names_which_start_with_user}
FROM {db}
GROUP BY cost, {column_names_which_start_with_user}

I am not in control of populating the db and there are 100s of columns whose names change every month. Is there a way to do this?

  • 2
    `there are 100s of columns whose names change every month` ... this, combined with that your tables have 100s of columns to begin with, makes me feel that perhaps you don't have optimal database design. You might want to add some description on how these tables came about. – Tim Biegeleisen Sep 29 '20 at 06:40

1 Answers1

11

There's a virtual database called information_schema which can be queried for metadata about tables and columns:

SELECT column_name
FROM information_schema.columns
WHERE table_name = 'my_table'
AND column_name LIKE 'user_%'

Unfortunately you can't use this to generate the list of columns of a query, because queries can't be dynamic like that. However, you can use it in your client code to generate the query you are after.

Theo
  • 131,503
  • 21
  • 160
  • 205
  • To me, it would be more useful w/o the table_name specified. It doesn't appear to work with just column_name by itself, but one can add: (table_catalog = 'OurCatalog') AND (table_schema = 'OurSchema') – JosephDoggie Mar 24 '21 at 20:35