7

I have a DB2 database containing millions of records. I found that some char() or varchar() fields contain special characters which shouldn't be stored. I guess application received broken data or some code made it.

Anyway, I want to find records that have these broken data, which are special characters (not alphabetic).

I tried to find the way using query but couldn't. Does someone know the good query or advice?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
nick
  • 229
  • 3
  • 5
  • 10

6 Answers6

18

You can use the DB2 TRANSLATE() function to isolate non-alphanumeric characters. Note that this will not work in the Oracle compatibility mode, because in that case DB2 will treat empty strings as NULLs, as Oracle would do.

SELECT *
FROM yourtable
WHERE LENGTH(TRANSLATE(
  yourcolumn,
  '', -- empty string
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
)) > 0 -- after translating ASCII characters to empty strings 
       -- there's still something left
mustaccio
  • 18,234
  • 16
  • 48
  • 57
  • 1
    It seems you are assuming an LUW environment, eh? ;-) z/OS and IBM i would (most likely) be working in EBCDIC. But that only effects your comment, the query is portable. But actually, I think you mean non-alphabetic characters, or probably should be non-alphanumeric. ASCII and EBCDIC contain many code points which are not alphanumeric, or even displayable characters. CR & LF are simple examples. – WarrenT Jul 04 '13 at 17:58
  • 1
    It works! Thank you mustaccio. Additionally, My field has some blank field, so I use that function LENGTH(TRIM(TRANSLATE())). and the TRANSLATE sample above has one misspell. 'abcdefj' : j->g ;) check it if someone use that code. have a good day! – nick Jul 05 '13 at 00:11
8

I know this is an older thread...but after reading a ton...this was my exact problem and here is the solution I came up with to determine the problem rows...so that I could go in and manually fix them. FYI - the problem for me happens because users are copy/pasting from Word into my app. Yes I know we should fix that before ever saving...but we have bigger fish to fry.

SELECT * FROM TABLE_A where ASCII(TRIM(TRANSLATE( COLUMN_A, ' ', -- empty string '()<>!;%$#*?@+&^=-":/''.,0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' ))) not in (10,64)

Some Notes:

  • We use iSeries DB2 and this works great
  • Make sure to keep all the spaces intact in the translate function...it needs 1 space for ever character you use
  • In the 3rd parameter of the translate function there are 2 single quotes next to each other and the first one simply escapes the other (for those that may not know)
user4342423
  • 81
  • 1
  • 1
  • What represent 'not in 10 and 64', I know that for some reason that code return 64 with empty string, but what is 10?. thank you for your answer. – Juan Ruiz de Castilla May 25 '16 at 20:38
  • 10 is likely a line feed. He's got a carriage return (which would be 13) in the string to search for. This answer totally helped me. – Meower68 Mar 21 '19 at 03:31
2

You can use a regular expression in order to retrieve the invalid characters. However this process is very expensive, because you have to read all the data, and then process it.

In order to use regular expressions in DB2, you have to adapt the environement, because this functionality is not available for SQL in the installation. You have these three options:

Once you have defined a regular expression to ignore the valid characters (something like /[^a-zA-Z0-9]/ ), then you could executed in the database. Remember to retrieve other column where you can detect the row (for example a column ID) and then perfom updates or delete to prune the invalid characters.

If you do not know how to use regular expression, here you have a good source of information: http://www.regular-expressions.info/ Specially http://www.regular-expressions.info/charclass.html

There is a related question about regular expression: Regular Expressions in DB2 SQL

Community
  • 1
  • 1
AngocA
  • 7,655
  • 6
  • 39
  • 55
2

If by special characters you mean non-printable characters you could use this one:

select yourfield, hex(trim(yourfield)),TRANSLATE(
  yourfield,
  ' ', 
  x'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F')
from yourtable
where yourfield <> TRANSLATE(
  yourfield,
  ' ', 
  x'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F')

You'll probably see that your odd characters HEX() is < 40.

Anyway if you know the HEX() of your odd/special characters you could use this approach and then replace them for a space or whatever you want permanently:

Update yourtable
set yourfield= Translate(yourfield, 
                            ' ', 
                            x'000102030405060708090A0B0C0D0E101112131415161718191A1B1C1D1E202122232425262728292A2B2C2D2E303132333435363738393A3B3C3D3E0F1F2F3F')
where yourfield <> Translate(yourfield, 
                            ' ', 
                            x'000102030405060708090A0B0C0D0E101112131415161718191A1B1C1D1E202122232425262728292A2B2C2D2E303132333435363738393A3B3C3D3E0F1F2F3F')
Watchmaker
  • 4,908
  • 1
  • 37
  • 38
  • What would be the right way to retain any numbers using a similar approach. When I run this I get everything removed including digits. – horatio1701d Dec 09 '18 at 14:04
  • Could you tell me why this is supposed to work, and in which DB2 version? Displaying the x'...' part shows `a bunch of squares and rectangles followed by !"#$%&'()*+,-./0123456789:;<=>` that I couldn't copy. And as expected, when `myfields` contains alphabets, it wasn't able to translate. So, I'm guessing the hex is missing a lot of `printable` characters? Impractical, if we have to find all the printable characters to put in that quote. – user3758745 Jan 14 '19 at 21:53
-1

This query has worked for me in the past on iSeries DB2.

select * from db/table where posstr(field, x'3F') > 0 

Trouble is you have to be certain of the hex value you are searching for in the string. I had a similar situation where the I was sure the hex code for the character was x'3F, but when I sub-string the non-viewable character it was actually x'22. You might want to single out the character that is giving you the issue and see what it's value is.

select hex(substr(field, 21,1)) from db/table where posstr(field, 'StringBeforeCharacter') > 0 
jwbradley
  • 11
  • 3
-2

You can use below SQL which will work fine.

select col 1 from table where col1 like '%'|| chr(10) || '%';