
We are working with a customer whose maximum character length for a value field is 10. Therefore, if a value record has more than 10 characters, say 50000000.00 (which is 11), the record is rejected by their system. What I'd like to do is take the value in one record and split it across multiple records. Take, for example, this basic record:

SELECT 'ACCT_A' ACCT_NUM, 50000000.00 TOT_AMT
FROM TABLE_NAME
FETCH FIRST 1 ROWS ONLY;
ACCT_NUM TOT_AMT
ACCT_A 50000000.00

I want to create a dynamic SQL statement that will keep splitting that one record into multiple records that we can then pass to our customer. I was trying to do something with the LAG function, but I couldn't quite get it. Since 9999999.99 is the largest amount they can have (the highest value that fits within the 10-character limit), I was attempting to come up with a query whose final output would look like this:

ACCT_NUM TOT_AMT
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 0.05
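
(As a check: five full records of 9999999.99 account for 49999999.95, leaving a remainder of 0.05.)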

Can anyone help make this happen? The TOT_AMT won't always be 50,000,000, so the query needs to be dynamic enough to take any value as the TOT_AMT and split it accordingly, if needed (which would probably involve some sort of CASE expression, like: CASE WHEN TOT_AMT > 9999999.99 THEN 9999999.99 ELSE TOT_AMT END).

This was as far as I was able to get:

WITH TEMP_SPLIT_TABLE AS (
    SELECT 'ACCT_A' ACCT_NUM, 50000000.00 TOT_AMT FROM TABLE_NAME FETCH FIRST 1 ROWS ONLY
)

SELECT ACCT_NUM, TOT_AMT,
    CASE WHEN TOT_AMT > 9999999.99 THEN 9999999.99 ELSE TOT_AMT END AS ACCT_AMT,
    CASE WHEN TOT_AMT > 9999999.99 THEN LAG(TOT_AMT - 9999999.99) OVER (PARTITION BY ACCT_NUM ORDER BY 1) ELSE 0 END AS LAG_VAL
FROM TEMP_SPLIT_TABLE

I only got one record returned, and I'm not sure how to tune the SQL so that it keeps applying the LAG function until the TOT_AMT value has been split into amounts that are <= 9999999.99.

dehBoy
  • Is this a real problem? This sounds like a horrible restriction. If you cannot modify the current column length, then perhaps the customer could accept a new column. And if you achieve what you're asking, what would prevent someone from thinking that 5 of the 6 rows are duplicates? – Isolated Jul 26 '23 at 20:41
  • This is a value - seriously, you have this stored as character data? Values are decimal/numeric, which have scale and precision, not length. – Stu Jul 26 '23 at 21:21
  • If your database is PySpark, why did you tag it as DB2? – The Impaler Jul 27 '23 at 13:22
  • @TheImpaler - I'm sorry, I tagged it as DB2 because that's the main platform we use. I figured if someone could help me with an example there, I'd be able to Google my way to the syntax for PySpark. In retrospect, I probably should have just started there... – dehBoy Jul 27 '23 at 15:58

1 Answer


You may use a Recursive Common Table Expression for this.

with 
  mytab (ACCT_NUM, TOT_AMT) as
(
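  -- sample data: two accounts whose totals exceed the per-record limit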
  values 
  ('ACCT_A', 50000000.00)
, ('ACCT_B', 40000000.00)
)
, r (ACCT_NUM, TOT_AMT, LIM_AMT) as
(
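  -- anchor member: one row per account, paired with the per-record limit amount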
  select 
    ACCT_NUM
  , TOT_AMT
  , 9999999.99
  from mytab
    union all
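  -- recursive member: subtract the limit until the remainder fits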
  select 
    ACCT_NUM
  , TOT_AMT - LIM_AMT
  , LIM_AMT
  from r
  where TOT_AMT > LIM_AMT
)
select 
  ACCT_NUM
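  -- scalar MIN caps each row at the limit; the final row keeps the remainder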
, MIN (TOT_AMT, LIM_AMT) AS TOT_AMT
from r
order by 1, 2 desc
ACCT_NUM TOT_AMT
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 9999999.99
ACCT_A 0.05
ACCT_B 9999999.99
ACCT_B 9999999.99
ACCT_B 9999999.99
ACCT_B 9999999.99
ACCT_B 0.04

fiddle

Mark Barinstein
  • Mark - Thank you so much for coming up with a solid solution! Here's the last kicker: I'm trying to replicate this recursive CTE in AWS PySpark SQL, and from what I can tell from searching, PySpark doesn't allow it? Are you aware of any solutions that might work in that environment? Thank you again for all your help! – dehBoy Jul 27 '23 at 03:26
  • It's an ordinary SQL statement. Doesn't PySpark support fully user-specified statements? – Mark Barinstein Jul 27 '23 at 06:35
  • Minor bug: The anchor member of the recursive CTE should not hardcode `9999999.99` in case the initial amount is lower than this value. – The Impaler Jul 27 '23 at 13:18
  • @TheImpaler The [result](https://dbfiddle.uk/hfj4goDy) is correct when the initial amount is lower than this value, because of the `MIN` function use. – Mark Barinstein Jul 27 '23 at 13:41
  • @MarkBarinstein Ah, sneaky. You applied it in the main query. Didn't see that one. – The Impaler Jul 27 '23 at 13:58
  • When running in PySpark I get two errors: without the word "recursive", "table or view not found"; with the word "recursive", "no viable alternative at input" (ParseException). Based on some Googling, I found a bunch of old links saying recursive CTEs aren't available in PySpark. I was hoping something new might have been added since then haha – dehBoy Jul 27 '23 at 15:53
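
Since Spark SQL has no recursive CTE support, one non-recursive rewrite is to compute the number of full chunks up front and explode a generated sequence. Below is a minimal Spark SQL sketch, assuming the data lives in a table or view named mytab with the same ACCT_NUM and TOT_AMT columns (the name mytab is a placeholder):

-- Non-recursive alternative for Spark SQL, which lacks recursive CTEs:
-- derive the number of full 9999999.99 records, then explode a sequence
-- to produce one row per full chunk plus one row for the remainder.
WITH chunks AS (
  SELECT
    ACCT_NUM,
    TOT_AMT,
    -- number of full limit-sized records that fit into the total
    CAST(FLOOR(TOT_AMT / 9999999.99) AS INT) AS n_full
  FROM mytab
)
SELECT
  ACCT_NUM,
  -- rows 0 .. n_full-1 carry the full limit; the last row carries the remainder
  CASE WHEN n < n_full
       THEN 9999999.99
       ELSE TOT_AMT - 9999999.99 * n_full
  END AS TOT_AMT
FROM chunks
LATERAL VIEW explode(sequence(0, n_full)) t AS n;

For 50000000.00 this yields five rows of 9999999.99 and one row of 0.05, matching the recursive result. One caveat: a total that is an exact multiple of 9999999.99 produces a trailing 0.00 row, which may need to be filtered out.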