2

I am trying to optimize a query. Using EXPLAIN tells me it is Using temporary. This is really inefficient given the size of the table (20m+ records). Looking at the MySQL documentation Internal Temporary Tables I don't see anything that would imply the need for a Temporary table in my query. I also tried setting the ORDER BY to the same as the GROUP BY, but still says Using Temporary and query takes forever to run. I am using MySQL 5.7.

Is there a way to avoid using a temporary table for this query:

SELECT url,count(*) as sum 
FROM `digital_pageviews` as `dp` 
WHERE `publisher_uuid` = '8b83120e-3e19-4c34-8556-7b710bd7b812' 
GROUP BY url 
ORDER BY NULL;

This is my table schema:

create table digital_pageviews
(
  id             int unsigned auto_increment
    primary key,
  visitor_uuid   char(36)            null,
  publisher_uuid char(36) default '' not null,
  property_uuid  char(36)            null,
  ip_address     char(15)            not null,
  referrer       text                null,
  url_delete     text                null,
  url            varchar(255)        null,
  url_tmp        varchar(255)        null,
  meta           text                null,
  date_created   timestamp           not null,
  date_updated   timestamp           null
)
  collate = utf8_unicode_ci;

create index digital_pageviews_url_index
  on digital_pageviews (url);

create index ndx_date_created
  on digital_pageviews (date_created);

create index ndx_property_uuid
  on digital_pageviews (property_uuid);

create index ndx_publisher_uuid
  on digital_pageviews (publisher_uuid);

create index ndx_visitor_uuid_page
  on digital_pageviews (visitor_uuid);

newms87
  • 834
  • 1
  • 10
  • 23

1 Answers1

5

The reason it needs a temporary table is that it cannot both filter by publisher_uuid and sort on a column without an index to do both. The first step is to filter by publisher_uuid, so it uses the index on publisher_uuid.

However, next it has to group by and order the records, which will require a temporary table because it cannot use an index which will do this. The reason it cannot use an index is that it already used the publisher_uuid, which is not indexed on the url field to do the group by or on the field you are ordering by.

To filter where publisher_uuid = '8b83120e-3e19-4c34-8556-7b710bd7b812', group by url, and order by url, create an index with these fields in this order:

  • publisher_uuid
  • url
create index ndx_publisher_uuid
  on digital_pageviews (publisher_uuid, url);
Salman A
  • 262,204
  • 82
  • 430
  • 521
Marlin Pierce
  • 9,931
  • 4
  • 30
  • 52