3

I need to know whether a string contains only ASCII characters. So far I use this REGEX:

DECLARE
    str VARCHAR2(100) := 'xyz';
BEGIN
    IF REGEXP_LIKE(str, '^[ -~]+$') THEN
        DBMS_OUTPUT.PUT_LINE('Pure ASCII');
    END IF;
END;
/

Pure ASCII

' ' and ~ are the first, resp. last character in ASCII.

Problem is, this REGEXP_LIKE fails on certain NLS-Settings:

ALTER SESSION SET NLS_SORT = 'GERMAN'; 

DECLARE
    str VARCHAR2(100) := 'xyz';
BEGIN
    IF REGEXP_LIKE(str, '^[ -~]+$') THEN
        DBMS_OUTPUT.PUT_LINE('Pure ASCII');
    END IF;
END;
/

ORA-12728: invalid range in regular expression
ORA-06512: at line 4

Do anybody knows a solution which works independently from current user NLS-Settings? Is this behavior on purpose or should it be considered as a bug?

Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110
  • 2
    Would this work: `IF (ASCIISTR(str) = str) THEN`? – vc 74 Jun 18 '18 at 17:57
  • 1
    `[ -~]` is printable ascii, but ascii is really 7-bit `[\x00-\x7F]` –  Jun 18 '18 at 18:05
  • @sln, you are right, I mean "... contains only *printable* ASCII characters" – Wernfried Domscheit Jun 18 '18 at 18:07
  • Then you would want `[\x20-\x7E]` Using byte syntax should never throw a range error. –  Jun 18 '18 at 18:08
  • 2
    Interestingly a backslash seems to fail that test; `asciistr('\')` gives `\005C`, presumably to avoid confusion as that has a special meaning in the output. – Alex Poole Jun 18 '18 at 18:08
  • 1
    @sln, `[\x20-\x7E]` also raises ORA-12728 – Wernfried Domscheit Jun 18 '18 at 18:16
  • @WernfriedDomscheit - what's that exception ? If it's invalid range, then there is something modifying all your text, not just the regex. Or, it could be that this is a new bug Oracle doesn't know about. –  Jun 18 '18 at 18:33
  • 1
    @sln - in Oracle's German linguistic sort order, the character x20 is "greater" than character x7E. x7E is #63 and x20 is #66. The full printable ASCII character range is x21 - x39. – kfinity Jun 18 '18 at 18:52
  • @kfinity - That's the thing. In regex land, `\xDD` is not a character, it is a number, the _code_ of the character. There is no translation from code to character, there is only translation from char to code, in regex landia. Where the range integer arithmetic is done. Hope you understand that. That's why the syntax `\xDD` is _unique_ to regex parsers, so language does not play a factor. –  Jun 18 '18 at 19:01
  • @kfinity - It would seem impossible for a regex engine to throw a range error when the first _number_ is lower than the second _number_, if you catch my drift. There could always be the problem of surrogate pairs inside of classes, which can cause a range error. –  Jun 18 '18 at 19:11
  • Basically the same question >> https://stackoverflow.com/questions/2236475/finding-and-removing-non-ascii-characters-from-an-oracle-varchar2 – Kris Rice Jun 18 '18 at 19:39

3 Answers3

1

You can use TRANSLATE to do this. Basically, translate away all the ASCII printable characters (there aren't that many of them) and see what you have left.

Here is a query that does it:

WITH input ( p_string_to_test) AS ( 
SELECT 'This this string' FROM DUAL UNION ALL
SELECT 'Test this ' || CHR(7) || ' string too!' FROM DUAL UNION ALL
SELECT 'xxx' FROM DUAL)
SELECT p_string_to_test, 
       case when translate(p_string_to_test, 
       chr(0) || q'[ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]', 
       chr(0)) is null then 'Yes' else 'No' END is_ascii
FROM input;
+-------------------------+----------+
|    P_STRING_TO_TEST     | IS_ASCII |
+-------------------------+----------+
| This this string        | Yes      |
| Test this  string too!  | No       |
| xxx                     | Yes      |
+-------------------------+----------+
Matthew McPeak
  • 17,705
  • 2
  • 27
  • 59
1

ASCII function with upper limit of 127 may be used :

declare
    str nvarchar2(100) := '\xyz~*-=)(/&%+$#£>|"éß';
    a   nvarchar2(1);
    b   number := 0;
begin
    for i in 1..length(str)
    loop                 
      a := substrc(str,i,1);
      b := greatest(ascii(a),b);      
    end loop;

    if b < 128 then  
     dbms_output.put_line('String is composed of Pure ASCII characters');
    else
     dbms_output.put_line('String has non-ASCII characters');      
    end if; 
end;
Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
0

I think I will go for one of these two

IF CONVERT(str, 'US7ASCII') = str THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;



IF ASCIISTR(REPLACE(str, '\', '/')) = REPLACE(str, '\', '/') THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;
Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110