
I'm using Express and body-parser with the following setup:

const express = require('express');
const bodyParser = require('body-parser');
const app = express();

app.use(bodyParser.json({ limit: '100mb', extended: true, type: 'application/*+json;charset=utf-8' }));
app.use(bodyParser.urlencoded({ limit: '100mb', extended: true }));
app.use(bodyParser.text({ defaultCharset: 'utf-8' }));
app.use(express.json());

When the Node.js server is run with npm start, special characters inside the JSON body come through UTF-8 encoded as expected. Once the app is hosted in IIS, the character encoding fails; the hosting environment is the only difference. The .NET Globalization options for the IIS site hosting the Node.js application are set to UTF-8, which made no difference, and I have double-checked web.config. What could be mangling the incoming request data?
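To see exactly what arrives on the wire, the raw undecoded bytes can be dumped before parsing; a minimal diagnostic sketch using body-parser's verify hook (the hex values in the comments are what correct vs. double-encoded input would look like):

// Log the raw request bytes before any decoding, to tell whether the
// corruption happens upstream (IIS/iisnode) or inside the parser.
app.use(bodyParser.json({
  limit: '100mb',
  verify: (req, res, buf) => {
    // A UTF-8 'ø' arrives as c3 b8; if it shows up as c3 83 c2 b8,
    // the body was already double-encoded before reaching Node.
    console.log('raw body (hex):', buf.toString('hex'));
  }
}));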

JSON body request output when the Node.js server is hosted in IIS:

[screenshot: nodejs output hosted in IIS]

JSON body request output when the server is run directly using npm start:

[screenshot: nodejs output when run with npm start]

A hint on what may be going on: https://www.i18nqa.com/debug/bug-utf-8-latin1.html
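For reference, that Ã-prefix pattern can be reproduced in Node itself; a minimal sketch (the sample string is illustrative):

const original = 'Sjofart æ ø å';

// Encoding as UTF-8 and then misreading those bytes as Latin-1
// prefixes every non-ASCII character with 'Ã', the exact symptom
// described in the comments below.
const mojibake = Buffer.from(original, 'utf8').toString('latin1');
console.log(mojibake); // Sjofart Ã¦ Ã¸ Ã¥

// Reversing the misinterpretation recovers the original text.
const repaired = Buffer.from(mojibake, 'latin1').toString('utf8');
console.log(repaired); // Sjofart æ ø å

(The uppercase Æ Ø Å land on second bytes that are only printable in Windows-1252, which is why the linked table mentions both Windows-1252 and ISO-8859-1.)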

Anyone know where to look?

  • I suspect this may be caused by not specifying the character encoding correctly; try converting them to ISO-8859-1 encoding. – Ding Peng Nov 13 '20 at 02:51
  • Gave it a try by using "iconv" to convert from ISO-8859-1 to UTF-8: Sjofart Ã¦ Ã¸ Ã¥ - Ã† Ã˜ Ã… It looks like "Ã" just got prepended to every character. If I convert from UTF-8 to ISO-8859-1 it gets even stranger: Sjofart � � � - � � � – Rune Solberg Nov 13 '20 at 08:46
  • The table explains the possible reason: https://www.i18nqa.com/debug/bug-double-conversion.html It seems obvious that something is mistakenly happening inside body-parser, in the hosting framework, or in a combination of the two, where the UTF-8 bytes are interpreted as either Windows-1252 or ISO-8859-1. I just don't know where to start looking. – Rune Solberg Nov 13 '20 at 09:02

1 Answer


Found the issue, which is probably as simple as it is stupid. I will post it anyway, since I spent a lot of time looking in the wrong places. The problem was twofold. First, the BareTail app used for reviewing the logged requests on the server was running with an ANSI charset; changing it to UTF-8 displayed the special characters correctly on screen. Second, the API that subsequently receives this request saved the special characters as "unknown characters", which led me to look for the problem in what was coming in on the Node.js server.

Absolutely worth noting: the API that subsequently receives this request is the Facebook Conversions API -> https://developers.facebook.com/docs/marketing-api/conversions-api/using-the-api#verify. It does not handle the request as 'application/json' but as 'application/x-www-form-urlencoded', with form data containing the JSON object along with the Facebook param "format" set to "json". Content-Type 'application/json' technically works for this API call, but special characters are then saved as the "unknown char" question-mark symbol. Setting "Accept-Language" in the header or the Facebook "locale" param in the request does not help for this API call specifically.
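For anyone hitting the same thing, a rough sketch of the working shape of the request from Node (node-fetch is assumed as the HTTP client; the pixel ID, access token, and API version are placeholders, and the event payload is illustrative, so check it against the docs linked above):

const fetch = require('node-fetch');

const PIXEL_ID = '<pixel-id>';         // placeholder
const ACCESS_TOKEN = '<access-token>'; // placeholder

// Send the payload form-urlencoded, with the JSON object in a form
// field and "format" set to "json", instead of as a raw JSON body.
const params = new URLSearchParams({
  data: JSON.stringify([{ event_name: 'Purchase' /* illustrative */ }]),
  format: 'json',
  access_token: ACCESS_TOKEN,
});

fetch(`https://graph.facebook.com/v9.0/${PIXEL_ID}/events`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8' },
  body: params.toString(),
})
  .then((res) => res.json())
  .then(console.log);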