1

I have written a function in DB2 - that is calculating ASCII of records in a particular column. I want to some help as I want to check the ASCII of every single character in string return yes if the ASCII of that record is greater than 127.

BEGIN
ATOMIC DECLARE POS,
INT;

IF INSTR IS NULL THEN RETURN NULL;

END IF;

SET
(
    POS,
    LEN
)=(
    1,
    LENGTH(INSTR)
);


WHILE POS <= LEN DO IF ASCII( SUBSTR( INSTR, POS, 1 ))> 128 THEN RETURN 'Y';

END IF;

SET
POS = POS + 1;

END WHILE;


RETURN 'N';
deHaar
  • 17,687
  • 10
  • 38
  • 51
Rubecka
  • 11
  • 1
  • 4
  • I think using regular expressions is a lot easier. See https://stackoverflow.com/questions/4763757/regular-expressions-in-db2-sql. – The Impaler Dec 24 '19 at 20:37
  • Yes , I have written regular expression also. But first i want to take out all the special characters from a particular table in database. That table has million of records. For that i am thinking to write a function that will give me records having special characters that have ascii greater than 127. I want to calculate ascii of every character in that column. – Rubecka Dec 25 '19 at 11:47

2 Answers2

0

Why to calculate ascii of every character in that column, if the goal is just to get such rows?

SELECT STR
FROM 
(
VALUES 
  'Hello, world'
, 'Привет, мир'
) T (STR)
WHERE xmlcast(xmlquery('fn:matches($s, "[^\x00-\x7F]")' passing t.str as "s") as int) = 1;

The fn:matches function uses regular expressions.
The [^\x00-\x7F] regular expression means "a character with hex value not in the 0x00 - 0x7F interval". If a value of passed t.str contains such a character, the function returns 1 and 0 otherwise.

Mark Barinstein
  • 11,456
  • 2
  • 8
  • 16
  • Can you please explain this query ? – Rubecka Dec 26 '19 at 14:19
  • What does this mean a character with hex value not in the 0x00 - 0x7F interval ? Other than these means extra character ? – Rubecka Dec 27 '19 at 09:47
  • S-ASCII is a character set (and an encoding) with some notable features: Values are between 0–127 (x00–x7F) ASCII code-point 32 (decimal) represents a SPACE ASCII code-point 65 represents the uppercase letter A – Rubecka Dec 27 '19 at 10:26
0

A simple way to check if a UTF-8 value in DB2 only contains "plain ASCII" is to compare it's BYTE length with it's STRINGUNITS32 length. E.g.

SELECT
    S
,   LENGTHB(S) AS BYTES
,   LENGTH4(S) AS CHARACTERS 
,   LENGTHB(S) = LENGTH4(S) PLAIN_ASCII
FROM 
    TABLE(VALUES ('123!"$'),('¹²³€½¾')) T(S)

returns

S     |BYTES|CHARACTERS|PLAIN_ASCII
------|-----|----------|-----------
123!"$|    6|         6|true       
¹²³€½¾|   13|         6|false      

The above assumes your database is Unicode, and the columns are VARCHAR. If they are VARGRAPHIC (i.e. UTF-16), then you would need to cast them to VARCHAR in your comparision

Paul Vernon
  • 3,818
  • 1
  • 10
  • 23