How does Vertica or other sql servers perform a join

Question

Look at my next Vertica query:

WITH groupA AS (
    SELECT
      userId,
      color
    FROM worldUsers
    WHERE DayPartition = 409
          AND Hour = 12
)
SELECT
  color = 12 AS                        userColor,
  REGEXP_LIKE(familyName, '[a-z]+ish', 'i') ishPeople,
  REGEXP_LIKE(familyName, '[a-z]+lem', 'i') lemPeople,
  count(*)                         AS                        peopleCount,
  round(count(*) / sum(count(*))
  OVER (), 5.0) * 100 || '%'       AS                        peoplePercentage

FROM PlanetUsers
  JOIN groupA USING (userId)
WHERE DayPartition = 409
GROUP BY 1, 2, 3
ORDER BY 4 DESC

Both the tables worldUsers and PlanetUsers got the field Hour and they both partitioned by DayPartition field. Will it improve the performance if we change the where statement of the below query

from

 WHERE DayPartition = 409

To

 WHERE DayPartition = 409 and AND Hour = 12

Or change the using statement

From

USING (userId)

To

USING (userId, Hour)

Assume that except the partition on DayPartition there are no primary keys and no projections

score 0 · Answer 1 · edited Apr 19 '17 at 05:45

The partition will be used only in the initial extract of your CTE. I suggest you eliminate the CTE since you have them and make sure you have both table ordered by the join keys (this dictates if you join will be hash or merge).

The partition will apply partition pruning and limit the amount of ROS used in the query.

So both of your tables should be ordered by (userid,hour).

Run explain on top of the initial sql , make sure you have stas on them , make changes in your query and see the explain again.

Post your plan here we might help you.

How does Vertica or other sql servers perform a join

1 Answers1