ASCII control characters are finding their way in to databases I maintain at a semi-frequent basis. While I plan to implement the means to prevent them from being INSERT
ed in to SQL I have to first deal with characters already present in the database first.
For starters I know that it's possible to do a find and replace using SQL...
UPDATE [table_name]
SET [field_name]=REPLACE([field_name],'[string_to_find]','[string_to_replace]');
PHP allows you to use an array to replace multiple items with something else...
$result = str_ireplace(array('1','2','3','4','5'),'0',$my_string);
So instead of lazy SQL queries in loops is there a way we can contruct an SQL query to replace multiple ASCII control characters?
Also very important is how we replace those control characters (e.g. as dec, oct, hex)?
Also I've read that carriage return is part of the control characters 0~31 so obviously we don't want to remove those to make sure line-breaks are retained unless I'm missing something on that?
Here is a reference page I'm going by right now...
I code XHTML as application/xhtml+xml and tested the following as example1.xhtml and open EXPLICITLY in Firefox with NO BYTE ORDER MARK (BOM)...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
</head>
<body>
<div>
<p>�	



</p>
</div>
</body>
</html>
The following are the only characters that can be used without creating malformed XML, save as example2.xhtml and open EXPLICITLY in Firefox with NO BYTE ORDER MARK (BOM)...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
</head>
<body>
<div>
<p>	

</p>
</div>
</body>
</html>
Those working characters that we do not want to strip in SQL are...
09 - Horizontal Tab
10 - Line Feed
13 - Carriage Return