I'm trying to do some simple regular expressions against a DB2 v10.5 database. The database contains a lot of records.
The following works:
SELECT
e.EMLNAME,
XMLCAST(XMLQUERY('fn:matches($LNAME, "^[A-Z]*$")' PASSING (e.EMLNAME) as "LNAME") as integer) as foo
FROM MYDB.EMPLOYEE e;
However when I try to compare the regex results above it then goes from running to failing.
Sample query that fails:
-- Fails
SELECT
e.EMLNAME
FROM MYDB.EMPLOYEE e
WHERE XMLCAST(XMLQUERY('fn:matches($LNAME, "^[A-Z]*$")' PASSING (e.EMLNAME) as "LNAME") as integer) = 1;
The error reads:
[SELECT - 0 row(s), 0.000 secs] [Error Code: -20377, SQL State: 0N002] An illegal XML character "#x1A" was found in an SQL/XML expression or function argument that begins with string "O NE".. SQLCODE=-20377, SQLSTATE=0N002, DRIVER=4.18.60
Doing some googling, #x1A look to be the SUBSTITUTE
character escaped (http://www.fileformat.info/info/unicode/char/1a/index.htm). Seeing that the XMLQUERY portion works, I tried wrapping it with the XMLCAST to do the comparison which I found in another stack overflow post (Regular Expressions in DB2 SQL).
I had a DBA try the query on his box and it worked. His machine is a windows box, mine is a Mac. We are both using DBVisualizer. He had an older version and upgraded to 9.5.2 and it still worked for him. After I upgraded, I'm still having the same issue, but with a new driver number:
[SELECT - 0 rows, 0.000 secs] [Code: -20377, SQL State: 0N002] An illegal XML character "#x1A" was found in an SQL/XML expression or function argument that begins with string "O NE".. SQLCODE=-20377, SQLSTATE=0N002, DRIVER=4.19.49
I downloaded and installed IntelliJ's Data Grip, as per a recommendation of one of my coworkers. Nice tool, however when I run the query against the last name field then it blows up too, with a similar error, which makes me think that the error message is coming from DB2 or the java driver?
[0N002][-20377] An illegal XML character "#x1A" was found in an SQL/XML expression or function argument that begins with string "O NE".. SQLCODE=-20377, SQLSTATE=0N002, DRIVER=4.7.85
In the connection setup, my connection string looks something like:
jdbc:db2://serverdns:50001/MYDB
type 4
driver: DB2 (LUW)
Targeting a different field in the db allows for me to run the query without an issue... so looks like a possible data issue, but since windows can run this, it makes me wonder if its a driver, or if I need to do some sort of special escaping before running the query?
I need a developer friendly way to run this query on a mac. Specifically using db visualizer or data grip. Intent is to eventually port over into java code, so if its a driver update, or alter to the query, either will work, as long as it works in one of the query tools mentioned and can be ported into java (with additional escaping, etc.)
Digging in to what was reported, I did a query against last names staring with O
and having NE
in the name. This lead me to one record that looked like it could be a match for the error. I copied the text of the field and pasted it into an online form, which what looked like a space went away. However when I pasted it into sublime, I saw the SUB
character holder.
Doing the query above, but excluding that one record allows for the query to run properly. Based on this, it looks like the query on mac needs to know how to run when characters like SUB
are present in the field/string.
What can I do to make this work on my mac?
Update
I'm still looking to see if there is a way to do what was requested. However since we haven't found a straight answer yet, I'm posting some things that have been helpful for working around the issue. Note that I didn't find a way to do ranges with version 10.5, so I had to explicitly do a case for each value that could be bad in the x0E - x1F
range.
Identifying records with bad data like:
SELECT
e.ID, e.EMLNAME
FROM MYDB.EMPLOYEE e
WHERE
-- Last name checks.
e.EMLNAME like CONCAT(CONCAT('%', x'0E'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'0F'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'10'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'11'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'12'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'13'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'14'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'15'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'16'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'17'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'18'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'19'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'1A'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'1B'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'1C'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'1D'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'1E'), '%')
OR e.EMLNAME like CONCAT(CONCAT('%', x'1F'), '%');
Removing the bad characters like:
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'0E', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'0F', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'10', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'11', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'12', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'13', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'14', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'15', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'16', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'17', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'18', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'19', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'1A', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'1B', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'1C', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'1D', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'1E', '');
UPDATE MYDB.EMPLOYEE e SET e.EMLNAME = REPLACE(e.EMLNAME, x'1F', '');