It is possible to do it in one query by first getting a different id
for each different pdf_sha256
as a subquery, then in the query getting the elements within that set of ids by passing the subquery as follows:
def unique_disclosures_by_pdf_sha256(company)
subquery = company.disclosures.select('MIN(id) as id').group(:pdf_sha256)
company.disclosures.where(id: subquery)
.or(company.disclosures.where(pdf_sha256: nil))
end
The great thing about this is that ActiveRecord is lazy loaded, so the first subquery
will not be run and will be merged to the second main query to create a single query in the database. It will then retrieve all the disclosures
unique by pdf_sha256
plus all the ones that have pdf_sha256
set to nil
.
In case you are curious, given a company, the resulting query will be something like:
SELECT "disclosures".* FROM "disclosures"
WHERE (
"disclosures"."company_id" = $1 AND "disclosures"."id" IN (
SELECT MAX(id) as id FROM "disclosures" WHERE "disclosures"."company_id" = $2 GROUP BY "disclosures"."pdf_sha256"
)
OR "disclosures"."company_id" = $3 AND "disclosures"."pdf_sha256" IS NULL
)
The great thing about this solution is that the returned value is an ActiveRecord query, so it won't be loaded until you actually need. You can also use it to keep chaining queries. Example, you can select only the id
instead of the whole model and limit the number of results returned by the database:
unique_disclosures_by_pdf_sha256(company).select(:id).limit(10).each { |d| puts d }