-1

I am working on an application where records are in billions and I need to make a query where GroupBy clause is needed.

Table Schema:

  CREATE TABLE event (
  eventId INTEGER PRIMARY KEY,
  eventTime INTEGER NOT NULL,
  sourceId INTEGER NOT NULL,
  plateNumber VARCHAR(10) NOT NULL,
  plateCodeId INTEGER NOT NULL,
  plateCountryId INTEGER NOT NULL,
  plateStateId INTEGER NOT NULL
);


    CREATE TABLE source (
  sourceId INTEGER PRIMARY KEY,
  sourceName VARCHAR(32) NOT NULL
);

Scenario:

  1. User will select sources, suppose source ID (1,2,3)

  2. We need to get all events which occurred more than once for those source for event time range

  3. Same event criteria (same platenumber, platecodeId, platestateId, plateCountryId)

I have prepared a query to perform above mentioned operation but its taking long time to execute.

    select plateNumber, plateCodeId, plateStateId, 
    plateCountryId, sourceId,count(1) from event 
    where sourceId in (1,2,3) 
    group by sourceId, plateCodeId, plateStateId,
    plateCountryId, plateNumber
   having  count(1) > 1 limit 10 offset 0

Can you recommend optimized query for it?

Bilal Ahmed
  • 87
  • 1
  • 9

1 Answers1

1

Since you didn't supply the projection DDL, I'll assume the projection is default and created by the CREATE TABLE statement

Your goal is to achieve the use of the GROUPBY PIPELINED algorithm instead of GROUPBY HASH which is usually slower and consumes more memory. To do so, you need the table('s projection) to be sorted by the columns in the group by clause.

More info here: GROUP BY Implementation Options

CREATE TABLE event (
  eventId INTEGER PRIMARY KEY,
  eventTime INTEGER NOT NULL,
  sourceId INTEGER NOT NULL,
  plateNumber VARCHAR(10) NOT NULL,
  plateCodeId INTEGER NOT NULL,
  plateCountryId INTEGER NOT NULL,
  plateStateId INTEGER NOT NULL
)
ORDER BY sourceId,
         plateCodeId,
         plateStateId,
         plateCountryId,
         plateNumber;

You can see which algorithm is being used by adding EXPLAIN before your query.

ya24
  • 490
  • 1
  • 4
  • 16