I just spent an hour going through all the possible solutions on this page. I took the liberty of collecting them into one function, to make it quicker and easier to try them out and debug.
I hope it can be of use to someone else.
<?php
/**
 * Decontaminate text
 *
 * Cleans up a string so that json_decode() has a better chance of parsing it.
 *
 * Primary sources:
 * - https://stackoverflow.com/questions/17219916/json-decode-returns-json-error-syntax-but-online-formatter-says-the-json-is-ok
 * - https://stackoverflow.com/questions/2348152/detect-bad-json-data-in-php-json-decode
 */
function decontaminate_text(
    $text,
    $remove_tags = true,
    $remove_line_breaks = true,
    $remove_BOM = true,
    $ensure_utf8_encoding = true,
    $ensure_quotes_are_properly_displayed = true,
    $decode_html_entities = true
){
    if ( is_string( $text ) && '' !== $text ) {
        // Remove <script> and <style> elements, including their contents
        $text = preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $text );
        // Escape CDATA closing sequences
        $text = str_replace( ']]>', ']]&gt;', $text );
        if( $remove_tags ){
            // Which tags to allow (none!)
            // $text = strip_tags( $text, '<p><strong><span><a>' );
            $text = strip_tags( $text );
        }
        if( $remove_line_breaks ){
            // Collapse every run of line breaks, tabs and spaces into one space
            // (note: this also affects whitespace inside JSON string values)
            $text = preg_replace( '/[\r\n\t ]+/', ' ', $text );
            $text = trim( $text );
        }
        if( $remove_BOM ){
            // Strip a leading UTF-8 byte order mark (bytes EF BB BF)
            // Source: https://stackoverflow.com/a/31594983/1766219
            if( 0 === strpos( bin2hex( $text ), 'efbbbf' ) ){
                $text = substr( $text, 3 );
            }
        }
        if( $ensure_utf8_encoding ){
            // If the string is not valid UTF-8, replace the invalid byte sequences
            // (utf8_encode()/utf8_decode() are deprecated as of PHP 8.2)
            if( ! mb_check_encoding( $text, 'UTF-8' ) ){
                $text = mb_convert_encoding( $text, 'UTF-8', 'UTF-8' );
            }
        }
        if( $ensure_quotes_are_properly_displayed ){
            $text = str_replace( '&quot;', '"', $text );
        }
        if( $decode_html_entities ){
            $text = html_entity_decode( $text );
        }
        /**
         * Other things to try:
         * - the chr() function: https://stackoverflow.com/a/20845642/1766219
         * - stripslashes(): https://stackoverflow.com/a/28540745/1766219
         *   (this one broke my JSON decoding again after it had started working)
         * - this (improved?) JSON decoder didn't help me, but it sure looks fancy: https://stackoverflow.com/a/43694325/1766219
         */
    }
    return $text;
}
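// Example use
$text = file_get_contents( 'data.json' ); // Hypothetical input; use wherever your JSON comes from
$text = decontaminate_text( $text );
// $text = decontaminate_text( $text, false );               // Debug attempt 1: keep HTML tags
// $text = decontaminate_text( $text, false, false );        // Debug attempt 2: also keep line breaks/whitespace
// $text = decontaminate_text( $text, false, false, false ); // Debug attempt 3: also keep the BOM
$decoded_text = json_decode( $text, true );
echo json_last_error_msg() . ' - ' . json_last_error(); // "No error - 0" on success
?>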
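The two failures that the `$remove_BOM` and `$ensure_utf8_encoding` steps target can be reproduced in isolation, which makes it easier to tell which one is biting you. This is a minimal sketch, assuming PHP 7.3+ (for `JSON_THROW_ON_ERROR`) and the mbstring extension; `json_error_code()` is a hypothetical helper and the sample strings are made up.

```php
<?php
// Minimal sketch, assuming PHP 7.3+ (JSON_THROW_ON_ERROR) and the mbstring extension.
// json_error_code() is a hypothetical helper; the sample strings are made up.

/** Returns the JSON_ERROR_* code for $json (JSON_ERROR_NONE if it decodes). */
function json_error_code(string $json): int
{
    try {
        json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        return JSON_ERROR_NONE;
    } catch (JsonException $e) {
        // JsonException uses the JSON_ERROR_* constant as its exception code
        return $e->getCode();
    }
}

// 1. A leading UTF-8 BOM turns otherwise valid JSON into a syntax error
$with_bom = "\xEF\xBB\xBF" . '{"city":"Berlin"}';
$bom_error = json_error_code($with_bom);

// Strip the three BOM bytes, like the $remove_BOM step does
$without_bom = strncmp($with_bom, "\xEF\xBB\xBF", 3) === 0 ? substr($with_bom, 3) : $with_bom;
$bom_fixed = json_error_code($without_bom);

// 2. A raw ISO-8859-1 byte (0xFC is "ü" in Latin-1) is not valid UTF-8
$latin1 = '{"city":"M' . "\xFC" . 'nchen"}';
$utf8_error = json_error_code($latin1);

// UTF-8 -> UTF-8 conversion replaces invalid bytes with the substitute character ("?" by default)
$repaired = mb_convert_encoding($latin1, 'UTF-8', 'UTF-8');
$utf8_fixed = json_error_code($repaired);

printf("BOM: %d -> %d, Latin-1 byte: %d -> %d\n", $bom_error, $bom_fixed, $utf8_error, $utf8_fixed);
```

It should print `BOM: 4 -> 0, Latin-1 byte: 5 -> 0` (4 is `JSON_ERROR_SYNTAX`, 5 is `JSON_ERROR_UTF8`). Note that the UTF-8 repair is lossy: the invalid byte becomes `?`. If you know the input really is ISO-8859-1, converting from that encoding instead (`mb_convert_encoding( $text, 'UTF-8', 'ISO-8859-1' )`) keeps the `ü`.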
I'll maintain the function here: https://github.com/zethodderskov/decontaminate-text-in-php/blob/master/decontaminate-text-preparing-it-for-json-decode.php
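One caveat about the `$ensure_quotes_are_properly_displayed` and `$decode_html_entities` options: they run on the raw JSON text, so a `&quot;` that sits inside a string value becomes a raw `"` and ends that string early. A minimal sketch with a made-up sample string:

```php
<?php
// Sketch: replacing &quot; (or running html_entity_decode) on raw JSON text can
// break JSON that was valid to begin with. The sample string is made up.

$json = '{"quote":"She said &quot;hi&quot;"}';

// The entity is just text inside a JSON string, so this decodes fine
$before = json_decode($json, true);

// Turning &quot; back into a raw double quote ends the JSON string early
$after = json_decode(str_replace('&quot;', '"', $json), true);
$after_error = json_last_error();

// One alternative: decode the JSON first, then decode entities in the values
// (array_map is enough here because the sample is a flat array of strings)
$values = array_map('html_entity_decode', $before);

var_dump($before['quote'], $after, $values['quote']);
```

So if decoding only fails with those two options switched on, it may be worth trying `decontaminate_text( $text, true, true, true, true, false, false )` and decoding entities in the decoded values instead.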