Generate new column based on other columns and previous row value

Question

I have a table for employees where I have running total of sales by every employee in each of the 12 months. Some of the values for sales running total are missing, as below:

e_id	month	sales_run_total
e1	Jan	10
e1	Feb	null
e1	March	null
e1	April	40
...	...	...
e1	Dec	120
e2	Jan	null
e2	Feb	null
e2	March	30
e2	April	40
...	...	...
e2	Dec	120

I want to impute these missing values with the previous month's sales_run_total value of an employee. If the missing value is for January month, I want to impute it with zero, as below:

e_id	month	sales_run_total	sales_run_total_new
e1	Jan	10	10
e1	Feb	null	10
e1	March	null	10
e1	April	40	40
...	...	...	...
e1	Dec	120	120
e2	Jan	null	0
e2	Feb	null	0
e2	March	30	30
e2	April	40	40
...	...	...	...
e2	Dec	120	120

I have tried to do it using lag() function, as below :

select e_id,
       month,
       lag(sales_run_total) over (partition by e_id, month) as sales_run_total_new
from table_name

But this does not consider whether there is a null or non-null value in actual column. Also, it does not consider the case where we have null for January month.

How can this be done in SQL?

Relevant? https://stackoverflow.com/questions/60173864/hive-ignore-nulls — Charlieface, Apr 11 '22 at 10:09

score 1 · Answer 1 · answered Apr 11 '22 at 11:47

1

You need to pick the window between proceedings and current row and find most recent not null value.
Use LAST_VALUE (value, ignore_null boolean) window function to pick last non null value ordering by e id and month.
LAG() isnt suitable because it can not reach beyond previous row and you can have null value in previous row.

can you pls try this ?

SELECT e_id,`month`,sales_run_total,
COALESCE(sales_run_total, LAST_VALUE(sales_run_total, TRUE) OVER(ORDER BY e_id, `month` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))  as sales_run_total_new
FROM table_name

answered Apr 11 '22 at 11:47

Koushik Roy

6,868
2
12
33

Thanks @Koushik. I solved it in 2 steps: First, whenever the month is January and the sales value is NULL, I imputed it with 0. Then, I used the approach suggested by you to get the final table. – karan Apr 11 '22 at 16:14
1

First step to do before the solution in answer: SELECT e_id, `month`, sales_run_total, case when sales_run_total is NULL then ( case when service_month = 'Jan' then 0 else sales_run_total end ) else sales_run_total end as sales_run_total_temp FROM table_name – karan Apr 11 '22 at 16:20

Generate new column based on other columns and previous row value

1 Answers1