3

I have a relatively large table on an Amazon RDS production database (in the range of 2M records). I wish to group on a number of fields, including the month of a date (server_time) in the table. To speed things up a bit, I have created an index on the master database as follows:

create index on build_requests(group_id, artifact_id, account_id, number_of_interfaces, date_trunc('month', server_build_time));

Then, as you would expect, the query to group the data uses the index on the master:

GroupAggregate  (cost=0.55..311308.09 rows=1633231 width=85)
  Group Key: group_id, artifact_id, account_id, number_of_interfaces, date_trunc('month'::text, server_build_time)
  ->  Index Scan using build_requests_group_id_artifact_id_account_id_number_of_in_idx on build_requests  (cost=0.55..262417.68 rows=1898335 width=85)

However, after waiting for over an hour, the read replica still does not use the index:

GroupAggregate  (cost=434678.88..488313.41 rows=1633179 width=85)
  Group Key: group_id, artifact_id, account_id, number_of_interfaces, (date_trunc('month'::text, server_build_time))
  ->  Sort  (cost=434678.88..439424.56 rows=1898274 width=85)
        Sort Key: group_id, artifact_id, account_id, number_of_interfaces, (date_trunc('month'::text, server_build_time))
        ->  Seq Scan on build_requests  (cost=0.00..55053.43 rows=1898274 width=85)

Logging into the read replica with pgadmin, I see the index is present however. This is a problem, since the slow speed of the query on the read replica (5 minutes vs 3 seconds), causes other queries that include this query through postgres_fdw (cross database query) to return ssl connection resets (probably timeouts?).

Any idea why the read replica does not pick up/use the index I define on the master, and how I can remedy this? The query I execute on both master and read replica is identical:

SELECT group_id, artifact_id, 
       account_id, number_of_interfaces, 
       date_trunc('month', server_build_time) as server_build_month, 
       count(*)
FROM build_requests
GROUP BY group_id, artifact_id, 
         account_id, number_of_interfaces, 
         date_trunc('month', server_build_time);

Thanks for your help!

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
Yves V.
  • 775
  • 1
  • 7
  • 20
  • 1
    Did you try to VACUUM ANALYZE build_requests? – peter Nov 16 '17 at 15:57
  • No I didn't. I did now though, to no effect. You can't vacuum the read replica since it is readonly, and the master already worked, so I'm curious on why it would work, but regardless, running VACUUM ANALYZE build_requests on the master has no effect. – Yves V. Nov 16 '17 at 16:08
  • I am quite clueless and was just guessing. It seemed harmless to try. The planner strongly depends on a number of statistics (tuple-counts, index selectivity, etc.) Having wrong stats was the only cause I could imagine that would steer the planner off-course. I also imagined that the stats might not be part of the replicated set. Sorry. – peter Nov 16 '17 at 17:09

1 Answers1

5

The problem appeared to be with the types of the amazon RDS instances. The original RDS was a t2-medium, whereas the read replica was only a micro instance. After scaling the read replica to medium, the replica too, used the index.

Also, even when the read replica did use the index, simply executing the above query through postgres_fdw caused connection timeouts. It wasn't until I created a view on the master using the index, that the query executed without problem.

Yves V.
  • 775
  • 1
  • 7
  • 20