I have a PHP function that generates an XML file from my data, some of which was submitted via textarea fields.

When I create the XML file, the textarea values show up with an unusual carriage return at the end. I've tried removing it with the following methods, none of which have any effect.

trim($value)
str_replace( "\n", "", $value)
str_replace( "\r", "", $value)
str_replace( "\n\r", "", $value)
str_replace( "\r\n", "", $value)
preg_replace('/\s\s+/', ' ', $value)

I even tried strip_tags($value) and html_entity_decode($value) in case it was something weird I could strip out.

One thing that did remove it was stripping everything except alphanumeric characters with a regex, but that's no use since my users will want to use a lot of characters like dashes, brackets, single and double quotes, etc.

Are there any other methods of removing weird characters like this? Or any other strange carriage returns that I can remove via code?

2 Answers

You may use

$value = preg_replace('/\R+/u', ' ', $value);

Here, \R matches any Unicode line break sequence.

Also, see the /u modifier reference:

u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8.
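
As a minimal sketch of how this could be wrapped up for the textarea values, the helper below collapses every run of line breaks into one space and then trims the ends. The name collapse_line_breaks and the sample string are assumptions made for illustration, and the "\u{...}" escapes need PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not from the original answer): collapse every run of
// line breaks into a single space, then trim the ends.
function collapse_line_breaks(string $value): string
{
    // With /u, \R matches CRLF, LF, CR, VT, FF, NEL (U+0085),
    // LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).
    $clean = preg_replace('/\R+/u', ' ', $value);

    // preg_replace() returns null on failure, for example when $value is
    // not valid UTF-8. In that case hand the input back unchanged.
    if ($clean === null) {
        return $value;
    }

    return trim($clean);
}

// Sample input mixing a Windows line break, a Unicode line separator
// and a trailing newline.
$value = "first line\r\nsecond line\u{2028}third line\n";
echo collapse_line_breaks($value), "\n";
```

If the call returns the input unchanged, the value is probably not valid UTF-8, and preg_last_error() can confirm that.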

Wiktor Stribiżew

I've had similar cases. In mine, there was a non-breaking space Unicode character which looked like a space, but wasn't.

What you could do is iterate over all the characters in the string and inspect them one by one to see what's odd. This probably won't solve your problem directly, but I hope it at least helps you get to the solution.

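// Walks the string byte by byte, so a multi-byte UTF-8 character shows up as several entries.
$length = strlen($value);
for ($i = 0; $i < $length; $i++) {
    $chr = $value[$i];
    echo "{$i}: [{$chr}] [" . ord($chr) . "]\n";
}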
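
A possible variant of the loop above, as a sketch only: it splits the string into whole UTF-8 characters rather than bytes and prints each one as hex, so a non-breaking space appears as c2a0 instead of two separate bytes. Once the culprit is known, it can be replaced or dropped explicitly. The function names dump_characters and strip_invisible, and the particular list of characters removed, are assumptions for illustration and not part of the original answer.

```php
<?php
// Hypothetical helper: list each UTF-8 character with the hex of its bytes.
function dump_characters(string $value): array
{
    // An empty pattern with /u splits between characters, not between bytes.
    $chars = preg_split('//u', $value, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // Invalid UTF-8: fall back to single bytes.
        $chars = str_split($value);
    }

    $lines = [];
    foreach ($chars as $i => $chr) {
        $lines[] = sprintf('%d: %s', $i, bin2hex($chr));
    }
    return $lines;
}

// Hypothetical helper: replace or drop a few characters that look like
// whitespace or nothing at all. Extend the list to whatever the dump reveals.
function strip_invisible(string $value): string
{
    // U+00A0 NO-BREAK SPACE becomes an ordinary space.
    $value = str_replace("\u{00A0}", ' ', $value);

    // Zero-width space, non-joiner, joiner, and the byte order mark are removed.
    $clean = preg_replace('/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u', '', $value);

    // Keep the partly cleaned value if the regex fails on invalid UTF-8.
    return $clean ?? $value;
}

$value = "Hello\u{00A0}world\u{200B}";
echo implode("\n", dump_characters($value)), "\n";
echo strip_invisible($value), "\n";
```

In the dump, any entry longer than two hex digits is a multi-byte character and a good first suspect.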
JoerT