11

I have inserted data from an init file into MongoDB:

db.User.insert({ "_id" : ObjectId("5589929b887dc1fdb501cdba"), "_class" : "com.smartinnotec.aposoft.dao.domain.User", "title" : "DI.", ... "address" : { "_id" : null, ... "country" : "Österreich" }})

And if I query this entry with db.User.find(), then I get the following:

{ "_id" : ObjectId("5589929b887dc1fdb501cdba"), "_class" : "com.smartinnotec.aposoft.dao.domain.User", "title" : "DI.", ... "address" : { "_id" : null, ... "country" : "�sterreich" } }

The word with the special character, "�sterreich", is displayed incorrectly.

Does anybody have any idea what I can do in MongoDB to solve this problem?

Somnath Muluk
quma
  • 1
    What version of MongoDB are you using? – Alex Dec 09 '15 at 09:38
  • 2
    Are you getting this result from the mongo console? – Rabea Dec 09 '15 at 20:30
  • 1
    I use v3.0.7 too. I tried your code and saw nothing unusual. I want to ask the same question as Rabea: "Are you getting this result from the mongo console?" MongoDB stores data as BSON, which is UTF-8 encoded, so the change may have occurred just before the data was sent to MongoDB. Good luck. – efkan Dec 10 '15 at 08:48
  • 1
    Do you have the Linux language packages installed, especially the one for the language your string is in? – Constantin Guay Dec 10 '15 at 20:42
  • 1
    Over console everything is ok and I use Windows 7. – quma Dec 11 '15 at 07:22
  • 1
    So where is this problem happening? Where is the insert query being run from? – Amir Dec 14 '15 at 22:34
  • 2
    The console that you are printing your results to is interpreting the `UTF-8` bytes as `UTF-16` or some other character set. You need to change the console settings to display characters in `UTF-8` format. – BatScream Dec 15 '15 at 06:51
  • 1
    @user3318489: Yes. You should use the UTF-8 version rather than HTML codes. It will be useful when making queries too. When you want to search, convert the search term to UTF-8 and then search. Check my answer. – Somnath Muluk Dec 15 '15 at 17:06

2 Answers

7

JSON and BSON can only encode and decode valid UTF-8 strings. If your data (including user input) is not UTF-8, you need to convert it before passing it to any JSON-dependent system, like this:

$string = iconv('UTF-8', 'UTF-8//IGNORE', $string); // or
$string = iconv('UTF-8', 'UTF-8//TRANSLIT', $string); // or even
$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string); // not sure how this behaves

Personally I prefer the first option; see the iconv() manual page. Other alternatives include:

  • mb_convert_encoding("Österreich", "UTF-8", "ISO-8859-1");
  • utf8_encode(utf8_decode($string))

You should always make sure your strings are UTF-8 encoded, even user-submitted ones.
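
To illustrate why a non-UTF-8 source produces the "�" seen in the question, here is a minimal Node.js sketch. It assumes the init file was saved as ISO-8859-1 (Latin-1), which the question does not confirm; the function names are hypothetical, and only Node's built-in Buffer API is used.

```javascript
// Assumption: the init file was saved as ISO-8859-1 (Latin-1), not UTF-8.
// "\u00D6" is "Ö"; the escape keeps this snippet independent of file encoding.
const latin1Bytes = Buffer.from("\u00D6sterreich", "latin1");

// Decoding Latin-1 bytes as UTF-8: the byte 0xD6 is not a valid UTF-8
// sequence here, so the decoder substitutes U+FFFD (the "�" character).
function decodeAsUtf8(bytes) {
  return bytes.toString("utf8");
}

// Decoding with the encoding the bytes were actually written in
// recovers the original text.
function decodeAsLatin1(bytes) {
  return bytes.toString("latin1");
}

// Convert Latin-1 bytes to UTF-8 bytes before handing the text to
// anything that expects UTF-8 (JSON, BSON, the mongo shell).
function latin1ToUtf8(bytes) {
  return Buffer.from(bytes.toString("latin1"), "utf8");
}

console.log(decodeAsUtf8(latin1Bytes));
console.log(decodeAsLatin1(latin1Bytes));
console.log(latin1ToUtf8(latin1Bytes).toString("hex"));
```

If this assumption holds, re-saving the init file as UTF-8 in the editor would have the same effect as the conversion step.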

Somnath Muluk
4

I guess you can use HTML character codes inside a string.

Code:

You can use &ouml; to save the special character in the database.

db.User.insert({ "_id" : ObjectId("5589929b887dc1fdb501cdba"), "_class" : "com.smartinnotec.aposoft.dao.domain.User", "title" : "DI.", ... "address" : { "_id" : null, ... "country" : "&ouml;sterreich" }})

And on querying this entry with db.User.find(), you will get the following:

{ "_id" : ObjectId("5589929b887dc1fdb501cdba"), "_class" : "com.smartinnotec.aposoft.dao.domain.User", "title" : "DI.", ... "address" : { "_id" : null, ... "country" : "Österreich" } }
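
As a rough sketch of the entity approach described above, the following JavaScript converts a few German characters to HTML entities before storing and back again when reading. The function names and the lookup table are hypothetical and cover only a handful of characters. Note that the stored value is then the literal entity text, so queries must search for that text as well.

```javascript
// Hypothetical lookup table covering only a few German characters.
// Unicode escapes are used so the snippet does not depend on file encoding.
const ENTITY_BY_CHAR = {
  "\u00D6": "&Ouml;", // Ö
  "\u00F6": "&ouml;", // ö
  "\u00C4": "&Auml;", // Ä
  "\u00E4": "&auml;", // ä
  "\u00DC": "&Uuml;", // Ü
  "\u00FC": "&uuml;", // ü
  "\u00DF": "&szlig;" // ß
};

// Reverse table: entity text back to the character it stands for.
const CHAR_BY_ENTITY = Object.fromEntries(
  Object.entries(ENTITY_BY_CHAR).map(([ch, entity]) => [entity, ch])
);

// Replace each listed character with its entity before storing.
function encodeEntities(text) {
  return text.replace(
    /[\u00D6\u00F6\u00C4\u00E4\u00DC\u00FC\u00DF]/g,
    (ch) => ENTITY_BY_CHAR[ch]
  );
}

// Replace each known entity with its character after reading.
function decodeEntities(text) {
  return text.replace(
    /&(?:[OoAaUu]uml|szlig);/g,
    (entity) => CHAR_BY_ENTITY[entity]
  );
}

const stored = encodeEntities("\u00D6sterreich");
console.log(stored);
console.log(decodeEntities(stored));
```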

Reference:

http://www.starr.net/is/type/htmlcodes.html

Replace multiple characters in a string in javascript

Hope this helps.

Community
SUNDARRAJAN K