I inserted 250,000 records into MongoDB with the Java driver but forgot to set the encoding explicitly. After I moved from Linux to my Windows machine, the system encoding changed to cp1251, and now many records contain values like Внедорожник 5 дв.

I have a solution: go through all the items, then find and modify every string field:

...
// Re-encode the garbled string as cp1251 bytes, then decode those bytes as UTF-8.
mc2.findOneAndUpdate(
    new Document("canonical", canonical),
    new Document("$set",
        new Document("regionName",
            new String(doc.getString("regionName").getBytes("cp1251"), "UTF-8"))));
...
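
The string round trip used in the snippet above can be checked in isolation, without a database. This is only a minimal sketch: `MojibakeFix`, `fix`, and `damage` are hypothetical names, and it assumes the damage came from UTF-8 bytes being decoded as windows-1251 (cp1251).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the MongoDB Java driver.
final class MojibakeFix {
    // Assumption: the stored text is UTF-8 bytes that were decoded as cp1251.
    private static final Charset CP1251 = Charset.forName("windows-1251");

    // Undo the damage: recover the original bytes, then decode them as UTF-8.
    static String fix(String broken) {
        return new String(broken.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    // Reproduce the damage (for testing only): decode UTF-8 bytes as cp1251.
    static String damage(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1251);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String original = "\u0412\u043d\u0435\u0434\u043e\u0440\u043e\u0436\u043d\u0438\u043a";
        String broken = damage(original);
        System.out.println("round trip ok: " + fix(broken).equals(original));
    }
}
```

One caveat: byte 0x98 is unassigned in windows-1251, so a value whose UTF-8 form contained that byte (for example the capital letter U+0418) may already have lost information and might not be recoverable this way.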

Is there a way to do that without an external program (with some mongo JS functions, utilities, etc.)?

Ivan Ivanov
  • It "should" be stored as UTF-8. Are you **sure** this is not a "console output" issue? See [BSON Spec](http://bsonspec.org/spec.html) – Neil Lunn May 31 '17 at 07:18
  • @NeilLunn It is UTF-8 in MongoDB, but Java put non-UTF-8 data there. So when I retrieve the data in Java, everything looks fine in the console, but in all other programs (Python, Node, Robomongo) I see garbage. When I change the encoding (see the code snippet), the data looks fine in all programs (except the Java console, of course, unless I also change the encoding in Java). – Ivan Ivanov May 31 '17 at 07:21
  • Well if you are seeing a difference then you are seeing a difference. *"Can you issue a command to the server to update internally?"* I guess is the gist of the question then. No you cannot really, except for `eval()` and you really should tread carefully with that one. And of course that would require being able to do the codepage conversion in JavaScript, so good luck with that! So, *maybe* possible, but certainly safer to continue doing what you are doing. – Neil Lunn May 31 '17 at 07:27

0 Answers