0

I have the following query in my system:

SELECT
    DISTINCT ON(consultant_profiles.id)
    consultant_profiles.id,
    (
      array_remove(array_agg(DISTINCT(function_experiences.experience)), NULL) ||
      array_remove(array_agg(DISTINCT(sellable_project_function_experiences.experience)), NULL)
    ) AS function_experiences

FROM
    consultant_profiles

LEFT OUTER JOIN
    consultant_backgrounds ON consultant_backgrounds.consultant_profile_id = consultant_profiles.id

LEFT OUTER JOIN
    function_experiences ON function_experiences.consultant_background_id = consultant_backgrounds.id

LEFT OUTER JOIN
    sellable_projects ON sellable_projects.consultant_profile_id = consultant_profiles.id

LEFT OUTER JOIN
    sellable_project_function_experiences ON sellable_project_function_experiences.sellable_project_id = sellable_projects.id

GROUP BY
    consultant_profiles.id;

The problem is that function_experiences column can have a duplicated values. I tried to use DISTINCT method but it is not working.

Is there any way of doing that in PostgreSQL?

Mateusz Urbański
  • 7,352
  • 15
  • 68
  • 133

1 Answers1

0

Because the original question lacked both sample data and expected results, I've not fully tested the following query, but it should produce the requested results:

SELECT consultant_profiles.id,
       (SELECT array_agg(DISTINCT t.experience ORDER BY t.experience)
         FROM UNNEST(array_agg(DISTINCT function_experiences.experience)
                       FILTER (WHERE function_experiences.experience IS NOT NULL) ||
                    array_agg(DISTINCT sellable_project_function_experiences.experience)
                       FILTER (WHERE sellable_project_function_experiences IS NOT NULL)) t(experience)) AS function_experiences
FROM
  consultant_profiles
  LEFT OUTER JOIN consultant_backgrounds ON consultant_backgrounds.consultant_profile_id = consultant_profiles.id
  LEFT OUTER JOIN function_experiences ON function_experiences.consultant_background_id = consultant_backgrounds.id
  LEFT OUTER JOIN sellable_projects ON sellable_projects.consultant_profile_id = consultant_profiles.id
  LEFT OUTER JOIN sellable_project_function_experiences ON sellable_project_function_experiences.sellable_project_id = sellable_projects.id
GROUP BY
  consultant_profiles.id;

In addition to using a subquery to eliminate duplicate experiences from the arrays, I've also made three other changes:

  1. array_remove is replaced by a FILTER expression. This change ignores NULL elements as the array is being built instead of making an additional pass through the built array.
  2. DISTINCT ON(consultant_profiles.id) is removed because it is redundant when GROUP BY consultant_profiles.id is present.
  3. ORDER BY t.experience is included when building function_experiences. Sorting the experiences isn't required and there is a slight performance impact, but having the experiences in a predictable order makes it easier for humans to compare between rows and can also aid testing.
JohnH
  • 2,001
  • 1
  • 2
  • 13