7

Multi-byte characters had caused me a lot of pain.

Any suggestion for this problem?

I have a CLOB field that might contains some multi-byte characters, and I need to select in SQL and convert this field into a string for downstream process, currently I am using:

SELECT DBMS_LOB.SUBSTR( description, 4000, 1 ) FROM table

But the 4000 in above command is in length of characters, rather than bytes. So I had to change to 3000 to handle any multi-byte characters that might have crept into the data else buffer size error will occur.

The problem is for records that do not contain multibyte character, it might unnecessarily truncated more data than it need to. (The 4000 is the string limitation, we can/had to live with that.)

Is there a way to do something in equivalent of:

SELECT DBMS_LOB.SUBSTR( description, 4000bytes, 1 ) FROM table

That way I can get as much data out as possible.

Note: I am not allowed to create temp tables/views, not using PL/SQL, only SQL SELECT...

John Woo
  • 258,903
  • 69
  • 498
  • 492
alchn
  • 327
  • 1
  • 5
  • 16

2 Answers2

9

Jeffrey's thinking process is ok, but alchn is also right. Just ran into this same problem and here is my solution. You'll have to be able to create a function though:

Create Or Replace Function clob_substr(p_clob   In Clob
                                      ,p_offset In Pls_Integer
                                      ,p_length In Pls_Integer) Return Varchar2 Is
Begin
  Return substrb(dbms_lob.substr(p_clob
                                ,p_length
                                ,p_offset)
                ,1
                ,p_length);
End;
/

Here is a demo of it's use:

Select c
      ,clob_substr(c
                  ,1
                  ,4000)
  From (

        Select xmlelement("t", rpad('é', 4000, 'é'), rpad('é', 4000, 'é')).extract('//text()').getclobval() c
          From dual

        );
mlvnd
  • 311
  • 1
  • 4
  • 1
    I have created a comprehensive (I hope!) example based on this answer in https://bitbucket.org/janihur/orasql-ex/src/tip/clob-2.sql – user272735 Jul 09 '14 at 12:17
  • I have tested and confirmed that using this function works better than Jeffrey's solution. However, I don't understand why. Isn't this doing the same exact thing, but in a function instead of inline? Why does that make a difference here? – KStensland Jun 15 '16 at 18:35
  • I suspect the difference is that varchar2 in PL/SQL can go up to 32,767 bytes, whereas in SQL (at least in 11g and earlier) the max is 4,000 bytes. – Jeffrey Kemp Aug 10 '16 at 06:04
5

Maybe truncate the resulting varchar2 with SUBSTR:

SELECT SUBSTRB( DBMS_LOB.SUBSTR( description, 4000, 1 ), 1, 4000) FROM table
Ben
  • 51,770
  • 36
  • 127
  • 149
Jeffrey Kemp
  • 59,135
  • 14
  • 106
  • 158
  • Shouldn't that be "... with SUBSTRB" ? (SUBSTRB instead of SUBSTR) – Frank Schmitt Aug 23 '12 at 10:25
  • 1
    Out of interest: Do you know what happens if the first 3999 characters are one byte long and the 4000th character is a multi-byte one? Won't it return a wrong character at position 4000 (since it interprets the first byte of the multi-byte char as a single-byte char)? – Frank Schmitt Aug 23 '12 at 10:28
  • 2
    No Jeffery it won't work, if 'description' contains multibyte characters, the innermost DBMS_LOB.SUBSTR already hit a string buffer error before it reach the SUBSTRB. – alchn Aug 24 '12 at 00:20
  • 2
    If only we have a DBMS_LOB.SUBSTRB then my problem will be solved! – alchn Aug 24 '12 at 00:22
  • 2
    To answer Frank's question, yes some sort of truncation of the character will happen. Try this SELECT SUBSTRB( 'Ä Ê Í Ó Ø A B C D E', 1, 1) FROM DUAL; and this SELECT SUBSTRB( 'Ä Ê Í Ó Ø A B C D E', 1, 2) FROM DUAL; – alchn Aug 24 '12 at 00:23