1

On IBM DB2 V10 I have created following function. This function contains a recursive common table expression. It splits a text string, for example 'abc,def,ghi', into a table with three rows 'abc','def' and 'ghi'.

CREATE FUNCTION TOOLS.SPLIT (
    TEXT    VARCHAR(32704),
    SPLIT   VARCHAR(16) )
  RETURNS TABLE (WRD VARCHAR(128))
  LANGUAGE SQL
  CALLED ON NULL INPUT
  READS SQL DATA
  DETERMINISTIC
  NO EXTERNAL ACTION
RETURN 
  WITH SPLIT_TBL(WORD, SENTENCE, CNT) AS (    
  SELECT CAST(NULL AS VARCHAR(128)) WORD
        ,TEXT SENTENCE   
        ,0 CNT
  FROM SYSIBM.SYSDUMMY1      
  UNION ALL      
  SELECT CASE WHEN INSTR(S1.SENTENCE, SPLIT) > 0
           THEN LEFT( S1.SENTENCE,
                      INSTR(S1.SENTENCE, SPLIT) - MIN(1, LENGTH(SPLIT)) )
           ELSE S1.SENTENCE
         END AS WORD
        ,CASE WHEN INSTR(S1.SENTENCE, SPLIT) > 0
               AND LENGTH(S1.SENTENCE) > 1
           THEN SUBSTR( S1.SENTENCE,
                        INSTR(S1.SENTENCE, SPLIT) + MAX(LENGTH(SPLIT), 1) )
           ELSE CAST(NULL AS CHAR(128))          
         END AS SENTENCE
        ,CNT + 1 CNT
  FROM SPLIT_TBL S1   
  WHERE S1.SENTENCE IS NOT NULL
    AND S1.CNT < 32704 )  

SELECT WORD
FROM SPLIT_TBL
WHERE WORD IS NOT NULL;

Basically, this function works as expected on single strings.

SELECT s.*, val.*
FROM TABLE (
  SELECT '123,456' sntnc, ',' splt FROM SYSIBM.SYSDUMMY1
) s
LEFT JOIN TABLE (
  TOOLS.SPLIT(s.sntnc, s.splt)
) val ON 1=1;

Produces following table:

SNTNC   | SPLT | WRD
--------|------|----
123,456 |    , | 123
123,456 |    , | 456

However, as soon as there is a table with more than one row to be joined, DB2 returns ERROR [42997] [IBM][DB2] SQL0270N Function not supported (Reason code = ""). This means, if I executed following query with more than one select statement not commented out, this error is returned. Though, each single row by itself does not produce an error.

SELECT s.*, val.*
FROM TABLE (
--  SELECT CAST(NULL AS CHAR) sntnc, ',' splt FROM SYSIBM.SYSDUMMY1
--  UNION ALL
--  SELECT '' sntnc, ',' splt FROM SYSIBM.SYSDUMMY1
--  UNION ALL
--  SELECT 'AB' sntnc, CAST(NULL AS CHAR) splt FROM SYSIBM.SYSDUMMY1
--  UNION ALL
  SELECT ',' sntnc, ',' splt FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT ',ABC,DEF,/' sntnc, '' splt FROM SYSIBM.SYSDUMMY1
--  UNION ALL
--  SELECT ',ABC,ADEF,/' sntnc, ',A' splt FROM SYSIBM.SYSDUMMY1
--  UNION ALL
--  SELECT '123,456,, ' sntnc, ',' splt FROM SYSIBM.SYSDUMMY1
--  UNION ALL
--  SELECT ',,,' sntnc, ',' splt FROM SYSIBM.SYSDUMMY1
) s
LEFT JOIN TABLE (
  TOOLS.SPLIT(s.sntnc, s.splt)
) val ON 1=1;

On IBM Knowledge Center, I could not find a reason, why such a table function cannot be joined on more than one row. Error SQL0270N (Function not supported), without a reason code, appears not too helpful either.

The statement cannot be processed because it violates a restriction as indicated by the following reason code:

I would like to join this function on a subquery or table with more than one row. However, without a useful reason code, I cannot tell what the problem might be. Has anyone an idea what might go wrong when joining more than one row?

conspicillatus
  • 195
  • 3
  • 11
  • Hmm... you should be able to mark this `DETERMINISTIC`, `NO EXTERNAL ACTION`, and `CONTAINS SQL`. It looks like the `SELECT` [reference states you can't use a reference to a column](http://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_tablereference.dita), so it's probably optimizing it away when you specify the rows manually. – Clockwork-Muse Oct 30 '15 at 00:13
  • @clockwork-muse Thx, I added `DETERMINISTIC` and `NO EXTERNAL ACTION`. DB2 did not accept `CONTAINS SQL`, probably because it contains a select from CTE. So it seems, a table function cannot be joined on another table. Is there any way to achieve the expected result? – conspicillatus Nov 02 '15 at 08:10
  • you'd probably have to nest the rest of your statement as the base row in the CTE, probably in a view. Yuck. Actually, though, I'm wondering... the other table you're interested in is just a regular table, right? Not a view, or having a derived column? Also, I hope this is pretty much a one-off conversion, because the optimizer isn't going to be too happy with this, especially if you start searching with it (can't use an index). – Clockwork-Muse Nov 03 '15 at 01:42
  • @Clockwork-Muse Initially, I wanted to use this recursive CTE for one table only. There I had all rows as base for this split. Soon I realized there are other tables containing CSV, that's why I ended up creating a function. Seems to be best to decide for each use case individually what to do. In most situations I could replace it with a simple `CONTAINS`. Though, I envy PostgresSQL for [`select regexp_split_to_table('ab,cd', ',');`](http://www.postgresql.org/docs/current/static/functions-matching.html). – conspicillatus Nov 04 '15 at 14:30
  • ...You're aware this is against good normalization practices, right? Outside some specific reporting purposes (where you shouldn't be splitting the results), tables with multiple values in a single column are frowned upon. For this and other reasons. It's probably one reason why most RDBMSs _don't_ have a split function like this. You may be better off in the long run splitting out the data into new tables or something. I'd still consider the function useful in it's own right, it just sucks you can't use it here. – Clockwork-Muse Nov 04 '15 at 22:28
  • Yep, this function was not my first choice. =) – conspicillatus Nov 05 '15 at 14:03

0 Answers0