How to get total row count and max(timestamp) column in select list for all table in particular schema

Question

We have Postgres based Read Only database. in that,we have 52 tables under one schema.

We are trying to output row count and max(timestamp) column for all tables under one schema.

Environment is :

PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.3.0.2 build 14421) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled

We tried on :

SELECT 
  nspname AS schemaname,relname,reltuples,max(time)
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE 
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' 
ORDER BY reltuples DESC;

In this query, we got the row count column but still max(timestamp) for all table is not achieved.

Any help on it would be much appreciated ?

Hi vickps What time stamp format are are you using Eg time With or Without time zone ? — , May 26 '16 at 11:37

score 3 · Accepted Answer · answered May 26 '16 at 10:54

What you are accessing with this query is database statistics, which is not 100% accurate and might be missing or outdated depending on your statistics collection processes.

To get the row count for a list of tables, you have to scan each of these tables. However you can use pg_relation_size() to get an idea of the table size in bytes, and this function does not require you scanning the table.

If your table list is static, you can get away with a query like this:

select 'table1', count(*), max(time) from table1
union all
select 'table2', count(*), max(time) from table2
union all
...
select 'table52', count(*), max(time) from table52;

This solution is not flexible as if table list has changed, you need to rewrite your query.

Second option is to generate this query and manually execute it:

select string_agg(query, ' union all ') as query
    from (
        select 'select ''' || n.nspname || '.' || c.relname || ''', count(*), max(time) from ' || n.nspname || '.' || c.relname as query
            from pg_namespace as n, pg_class as c
            where n.oid = c.relnamespace
                and n.nspname = 'my_schema'
        ) as q;

This is more flexible, however the second query should be executed manually.

And finally your last option - writing a function for doing so:

create or replace function table_sizes (schemaname varchar) returns setof record as $BODY$
declare
    r record;
    t varchar;
begin
    for t in execute $$
        select n.nspname || '.' || c.relname
            from pg_namespace as n, pg_class as c
            where n.oid = c.relnamespace
                and c.relkind = 'r'
                and n.nspname = '$$ || schemaname || $$'$$
    loop
        execute 'select ''' || t || '''::varchar, count(*), max(time) from ' || t
            into r;
        return next r;
    end loop;
    return;
end;
$BODY$ language plpgsql volatile;

select * from table_sizes('public') t(tablename varchar, rowcount bigint, maxtime time);

when i am executing function here. getting following :: prod=# select * from table_sizes('schemaname') t(tablename varchar, rowcount bigint, maxtime time); ERROR: wrong record type supplied in RETURN NEXT CONTEXT: PL/pgSQL function "table_sizes" line 14 at return next prod=# — NEO, May 26 '16 at 12:39
this means that your "select" statement does not match data type returned. What is the data type of "time" field? It should match the one in "select" query — 0x0FFF, May 26 '16 at 12:41
got it . i changed it to timestamp . query executed. Thanks a ton . function is awesome. — NEO, May 26 '16 at 12:54
Also, for big tables I recommend to work with `pg_relation_type` instead of counting rows. Plus `max(time)` can be stored in "metadata" table that would be updated in the same transaction that updates main tables, this way there would be no need to manually calculate it — 0x0FFF, May 26 '16 at 12:57

score 1 · Answer 2 · edited Jan 25 '17 at 20:32

Here are some another steps to do:

In psql do below steps

\o count_per_schema.sql
select 'select count(*)as '||c.relname||', max(time) from ' || n.nspname || '.' || c.relname || ';' as " " from pg_namespace as n, pg_class as c where n.oid = c.relnamespace and c.relkind='r' and n.nspname = 'schema_name';
\o
\i count_per_schema.sql

\o will redirect result to filename you provided. For Example count_per_schema.sql and \i will run all queries from file.

Here is what I have done in server. I have not selected max(time).

yogesh=# \o count_per_schema.sql
yogesh=# select 'select count(*)as '||c.relname||' from ' || n.nspname || '.' || c.relname || ';' as " " from pg_namespace as n, pg_class as c where n.oid = c.relnamespace and c.relkind='r' and n.nspname = 'public';
yogesh=# \o
yogesh=# \i count_per_schema.sql

 heap1
-------
 20000
(1 row)
 test
-------
     4
(1 row)
 users
-------
     0
(1 row)
 skew_demo
-------
 10609
(1 row)

Advantage of this is you can delete entries for tables which you don't want from **output file** — pgyogesh, May 27 '16 at 22:07

How to get total row count and max(timestamp) column in select list for all table in particular schema

2 Answers2