-1

I have in SSIS three columns from a database.

Price Number Title

I want to GROUP BY by "Price" and "Number". The Problem is, that there are rows with the same Number but different title. So I want to have the MAXIMUM of title.

In other ETL-tools like Pentaho or OWB it works. There are functions where I can GROUP BY by Price and Number and get the MAXIMUM of the Title.

Is there a workaround?

ottobat
  • 1
  • 1
  • 1

1 Answers1

1

Have you looked at the Aggregate transformation?

Alternatively, you could do this operation in source SQL, with the advantage that the database engine would process it (the SSIS Aggregate transform is blocking, so it'll load all rows into memory before it spits out its results).

UPDATED: If pre-aggregating in raw SQL (before it gets into SSIS) isn't an option, you could add a surrogate key for the Title:

SELECT
Price,Number,Title,ROW_NUMBER() OVER
(PARTITION BY Price,Number ORDER BY Title ) AS TitleOrdinal
FROM ...

Then your SSIS Aggregate can use MAX(TitleOrdinal) (which is a numeric column) as a surrogate for MAX(Title).

To get the actual MAX(Title), you'd have to join the original dataset to this aggregated set, on Price=Price,Number=Number,TitleOrdinal=[MAX(TitleOrdinal) from aggregated set].

SebTHU
  • 1,385
  • 2
  • 11
  • 22