0

I have a table like this :

Month_ID  Month_Sales 
1         500.0 
2         250.0 
3         150.5 

I want to add a new column to this table which is "Growth" where :

Growth = (Current Month Sales - Prev Month Sales) / Prev Month Sales

I want to do this using SSIS script component. How can I do this?

Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
Mahmud Mridul
  • 95
  • 1
  • 9

2 Answers2

0

1/Create 2 package variables:

  • CurrentMonthSales

  • PrevMonthSales

Both are Double

2/ Use a Data Flow Task where you need a source component (depends on the data source you are dealing with) and select if it is a table or a query.

Let's consider it a table.

3/Drag Row Count in the Data Flow Transformations list. Double click it and in the Section Variable Names, select the variable User::CurrentMonthSales. Row Count task will save, in runtime, the result of the calculation in that variable.

4/Use a second Data Flow Task in the Control Flow. Inside drag another OLEDB Source with the same table . Use another Row Count Task, but this time use the variable User::PrevMonthSales. After the Row Count Task use either a Script Component or a derived column.

5/If you use Derived Column, write a name for the column, choose the option 'Replace the name of your column' if you want to replace the value of your column, or 'Add column' if you want to add that as a column output.

6/In the expression, apply the formula:

( @[User::CurrentMonthSales] - @[User::PrevMonthSales]) /@[User::PrevMonthSales]

and map to the corresponding column in your destination component.

Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
  • I don't think this is going to solve the problem. #3 - A row count is just going to increment so given the supplied data, the value will be 3. If you mean an Aggregate transformation, type of Sum, then the value in the Variable will be 850.5. It will be a constant for the entire run. – billinkc Oct 14 '21 at 16:35
0

For this to work as expected, you must have sorted data as your input. Either,

  • Add an explicit sort to your data source. Change the data access method from table or view to query and use something like SELECT * FROM dbo.MyTable AS T ORDER BY T.Month_ID and then in the Advanced properties, indicate that the data is sorted by that column
  • Add an explicit Sort component to your data flow between the source and the Script Component. Most people choose this option because they think it's what you're supposed to do - they provide you with a Sort component after all, but don't. You will get better performance by sorting explicitly before the data ever gets into the pipeline.

Script Component

I'll be freehanding the script as my current box doesn't have SSIS installed so apologies if I flub something but hopefully the explanation of the method will be better than any possible code errors.

A Script Component by default is a synchronous component. One row in, one row out. So, there's not native way to say "previous row" as there's only the current row's context available to us. Unless we explicitly store what we need into variables.

So, Synchronous is fine behavior here but we'll need to go to the Input/Ouput tab and indicate we'll have our new column Growth data type is Decimal/Numeric (one of them allows you specify the precision/scale, the other is always zero. You want the non-zero data type)

In the column list, you'll want to check Month_Sales as a ReadOnly column and Growth is going to be ReadWrite.

In the script itself, we need to create a class scoped member variable that will "remember" what the previous Month_Sales value was. For each row that enters the buffer, we'll

  1. Compute the Growth
  2. Push that value into the pipeline
  3. Update our member variable with the current row's Month_Sales's value in preparation for the next row.

Business question - What is the growth for Month_ID 1 in the above? We have no previous month value. If we initialize Prev Month Sales to 0, we'll have a divide by zero error.

public class ScriptMain: UserComponent {

public double previousSales;
public overrides void PreExecute()
{
    base.PreExecute();
    // By init to zero, we need to guard against divide by zero
    this.previousSales = 0;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    double currentSales = 0;
    double growth = 0;
    // Worry about nulls in source data
    if (!Row.MonthSales_IsNull)
    {
        currentSales = Row.MonthSales;
    }

    // Avoid division by zero
    if (this.previousSales != 0)
    {
        growth = (currentSales-this.previousSales) / this.previousSales;
    }


    // Might want to set growth's IsNull if no action performed via the else clause
    Row.Growth = growth;

    // Update the "previous" value with current for next loop
    // Again, be wary of nulls and understand how business expects the calculations to work
    this.previousSales = currentSales;
    
}

}

Query approach

Assuming SQL Server 2012+ or comparable RDBMS, you can compute that value efficiently with a windowing function. Here, I use the LAG as it provides the ability to access the previous row.

You'll note I use a derived table in the query. That's not strictly necessary, but I find it a helpful shorthand to avoid copy/pasting the LAG function twice for the numerator and denominator.

CREATE TABLE dbo.SO_69565661
(
    Month_ID int NOT NULL
,   Month_Sales decimal(9,2) NOT NULL
);

INSERT INTO dbo.SO_69565661
SELECT 1, 500.0
UNION ALL SELECT 2, 250.0
UNION ALL SELECT 3, 150.5;

-- Make use of a derived table to ensure we have the correct window into previous row
-- Use something like this in your OLE DB Source (or even skip the data flow and make this an Execute SQL Command by adding the INSERT/UPDATE)
SELECT D.*

,   (D.Month_Sales - D.PreviousSales) / D.PreviousSales AS Growth
,   FORMAT((D.Month_Sales - D.PreviousSales) / D.PreviousSales, 'P') AS GrowthFormattedText

FROM
(
SELECT
    T.*
    -- Second parameter, 1 here, default and optional, is the number of row(s) to lag by.
    -- Third paraemter NULL here, default and optional, is what is used when no row exists
,   LAG(T.MONTH_SALES, 1, NULL) OVER(ORDER BY T.Month_ID) AS PreviousSales
FROM
    dbo.SO_69565661 AS T
)D;

See the query in action via

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=0246900438b8eb13d70a4332c1bdad6f

billinkc
  • 59,250
  • 9
  • 102
  • 159