9

Before all thank you for your help!

I want to find out which tables in the database are most heavily used, i.e. the amount of users that query the table, the amount of times it was queried, the resources that where consumed by users per table, the total time the tables where queried, and any other useful data. For now I would limit the analysis to 9 specific tables. I'd tried using stl_scan and pg_user using the next two querys:

SELECT
    s.perm_table_name           AS table_name,
    count(*)                    AS qty_query,
    count(DISTINCT s.userid)    AS qty_users
FROM stl_scan s
JOIN pg_user b
    ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
    ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;

SELECT
    b.usename                                       AS user_name,
    count(*)                                        AS qty_scans,
    count(DISTINCT s.tbl)                           AS qty_tables,
    count(DISTINCT trunc(starttime))                AS qty_days
FROM stl_scan s
JOIN pg_user b
    ON s.userid = b.usesysid
JOIN temp_mone_tables tmt
    ON tmt.table_id = s.tbl AND tmt.table = s.perm_table_name
WHERE s.userid > 1
GROUP BY 1
ORDER BY 1;

The temp_mone_tables is a temporal table that contains the id and name of the tables I'm interested.

With this queries I'm able to get some information but I need more details. Surprisingly there's not much data online about this kind of statistics.

Again thank you all beforehand!

Jacob H
  • 2,455
  • 1
  • 12
  • 29
Nambu14
  • 380
  • 1
  • 7
  • 20
  • 1
    Not that familiar with Redshift, but does this help? https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html – Jacob H May 04 '18 at 15:24
  • Thank you for the help Jacob H. Unfortunately that view has information about the table itself and not about the usage of it. Anyway, that information is also helpful! – Nambu14 May 04 '18 at 15:37
  • https://stackoverflow.com/questions/67543337/how-to-find-the-most-accessed-table-in-redshift?noredirect=1#comment119388436_67543337 – Alok Kumar Singh May 15 '21 at 10:15

2 Answers2

10

Nice work! You are on the right track using the stl_scan table. I'm not clear what further details you're looking for.

For detailed metrics on resource usage you may want to use the SVL_QUERY_METRICS_SUMMARY view. Note that this data is summarized by query not table because a query is the primary way resources are utilized.

Generally, have a look at the admin queries (and views) in our Redshift Utils library on GitHub, particularly v_get_tbl_scan_frequency.sql

Joe Harris
  • 13,671
  • 4
  • 47
  • 54
2

Thanks to Joe Harris' answer I was able to add a lot of information to my previous query. With svl_query_metrics_summary joined to stl_scan you get important data about resources consumption, this information can be extended joining them to the vast number of views listed in Joe's answer.

For me the solution begins with the next query:

SELECT *
FROM stl_scan ss
JOIN pg_user pu
    ON ss.userid = pu.usesysid
JOIN svl_query_metrics_summary sqms
    ON ss.query = sqms.query
JOIN temp_mone_tables tmt
    ON tmt.table_id = ss.tbl AND tmt.table = ss.perm_table_name

The query gives you a lot of data that can be summarized in multiple ways as wanted.

Remember that temp_mone_tables is a temp table that contains the tableid and name of the tables I'm interested.

Nambu14
  • 380
  • 1
  • 7
  • 20