I'm trying to migrate some poorly formed data into a database. The data comes from a CSV, and is first loaded into a staging table of all varchar columns (as I cannot enforce type safety at this stage).
The data might look like
COL1 | COL2 | COL3
Name 1 | |
2/11/16 | $350 | $230
2/12/16 | $420 | $387
2/13/16 | $435 | $727
Name 2 | |
2/11/16 | $121 | $144
2/12/16 | $243 | $658
2/13/16 | $453 | $214
The first colum is a mixture of company names as pseudo-headers, and dates for which colum 2 and 3 data is relevant. I'd like to start transforming the data by creating a 'Brand' column - where 'StoreBrand' is the value of Col1 if Col2 is NULL, or the previous row's StoreBrand otherwise. Comething like:
COL1 | COL2 | COL3 | StoreBrand
Name 1 | | | Name 1
2/11/16 | $350 | $230 | Name 1
2/12/16 | $420 | $387 | Name 1
2/13/16 | $435 | $727 | Name 1
Name 2 | | | Name 2
2/11/16 | $121 | $144 | Name 2
2/12/16 | $243 | $658 | Name 2
2/13/16 | $453 | $214 | Name 2
I wrote this:
SELECT
t.*,
CASE
WHEN t.COL2 IS NULL THEN COL1
ELSE LAG(StoreBrand) OVER ()
END AS StoreBrand
FROM
(
SELECT
ROW_NUMBER() OVER () AS i,
*
FROM
Staging_Data
) t;
But the database (postgres in this case, but we're considering alternatives so the most diverse answer is preferred) chokes on LAG(StoreBrand) because that's the derived column I'm creating. Invoking LAG(Col1) only populates the first row's real data:
COL1 | COL2 | COL3 | StoreBrand
Name 1 | | | Name 1
2/11/16 | $350 | $230 | Name 1
2/12/16 | $420 | $387 | 2/11/16
2/13/16 | $435 | $727 | 2/12/16
Name 2 | | | Name 2
2/11/16 | $121 | $144 | Name 2
2/12/16 | $243 | $658 | 2/11/16
2/13/16 | $453 | $214 | 2/12/16
My goal would be a StoreBrand column which is the first value of COL1 for all date values before the next brand name:
COL1 | COL2 | COL3 | StoreBrand
Name 1 | | | Name 1
2/11/16 | $350 | $230 | Name 1
2/12/16 | $420 | $387 | Name 1
2/13/16 | $435 | $727 | Name 1
Name 2 | | | Name 2
2/11/16 | $121 | $144 | Name 2
2/12/16 | $243 | $658 | Name 2
2/13/16 | $453 | $214 | Name 2
The value of StoreBrand when Col2 and Col3 are null is inconsequential - that row will be dropped as part of the conversion process. The important thing is associating the data rows (i.e. those with dates) with their brand.
Is there a way to reference the previous value for the column that I'm missing?