
I'm trying to import a CSV file using mongoimport but it turns out that the import changes the data itself.

If you have a CSV file like this: (file.csv)

"SN","Description","OK"
"123456789012345","I should end up in 123456789012345","true"
"1234567890123456","I should end up in 1234567890123456","true"
"12345678901234567","I should end up in 12345678901234567","false"
"123456789012345678","I should end up in 123456789012345678","false"
"1234567890123456789","I should end up in 1234567890123456789","false"
"12345678901234567891","I should end up in 12345678901234567891","false"
"123456789012345678912","I should end up in 123456789012345678912","false"
"1234567890123456789123","I should end up in 1234567890123456789123","false"

And you execute the following command:

mongoimport -h XXXXXXX --port=XXXXXXX -u XXXXXXX -p XXXXXXX -vvv --db XXXXXXX --collection XXXXXXX --headerline --type csv --file /path/to/file.csv

You'll end up with strange things in your MongoDB database. For instance (a few results shown here):

...
{
    _id: ObjectId("56ad7a292e47ad18eb25a405"),
    SN: 12345678901234568,
    Description: "I should end up in 12345678901234567",
    OK: "false"
}, {
    _id: ObjectId("56ad7a292e47ad18eb25a406"),
    SN: 123456789012345680,
    Description: "I should end up in 123456789012345678",
    OK: "false"
}, {
    _id: ObjectId("56ad7a292e47ad18eb25a407"),
    SN: 123456789012345680000,
    Description: "I should end up in 123456789012345678912",
    OK: "false"
}, {
    _id: ObjectId("56ad7a292e47ad18eb25a40b"),
    SN: 1.2345678901234568e+21,
    Description: "I should end up in 1234567890123456789123",
    OK: "false"
}
...

The last number to come through correctly was 1234567890123456.

So it seems that strings containing big numbers are transformed into numbers, and the way this is handled causes unexpected results.
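
For reference, the mangled values above are exactly what you would expect if the numbers are parsed as 64-bit IEEE 754 doubles, which can only represent integers exactly up to 2^53 (9007199254740992). A quick sketch you can paste into the mongo shell (or any JavaScript runtime, since JS numbers are doubles too); the expected output in the comments is an assumption based on standard double rounding and matches the documents shown above:

// doubles are exact only up to 2^53 = 9007199254740992
print(1234567890123456);        // 1234567890123456      -> still exact
print(12345678901234567);       // 12345678901234568     -> rounded to the nearest double
print(123456789012345678912);   // 123456789012345680000
print(1234567890123456789123);  // 1.2345678901234568e+21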

I tried using the following versions of mongoimport:

  1. mongoimport version: 3.0.5 git version: 9da01528ee677e1790bb0b506c816ca9fbe0a6a8

  2. version 2.6.12-pre- (commit b9894192b989d40acdb49aebcb9e64ddf67db1e1)

  3. mongoimport version: 3.2.0-rc5 git version: 6186100ad0500c122a56f0a0e28ce1227ca4fc88

I don't usually use mongoimport, but this was supposed to be the easiest way to import some MySQL CSV dumps, and it definitely wasn't.

nico

1 Answer


Yes, the problem is that long numbers are getting trimmed at the maximum integer value.

However, you cannot convert the data type of a number while using mongoexport. One solution is to write a small backend script (using Python/PHP etc.) or a JavaScript file that reads the data from Mongo and processes it into the desired format, converting the affected numbers into strings.

Example:

// mongoscript.js: read the documents back and print them as CSV,
// emitting the long field as a string
var mongoData = db.collection_name.find(mongoQuery);
mongoData.forEach(function (row) {
    // the "n" prefix keeps spreadsheet tools from trimming the number again
    print("n" + row["long_field"].toString() + "," + row["other_field1"] /* + "," + ... remaining fields */);
});

I have intentionally added "n" at the beginning of the long number because if you open the file in MS Excel or any other such tool, the number will otherwise get trimmed again.

Then you can run this from the console as mongo mongoscript.js > /tmp/a.csv

Other related sources:

mongoimport choosing field type

Mongoimport csv files with string _id and upsert
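
A side note on the first related source: newer mongoimport releases (3.4 and later, so newer than any of the versions tested in the question) have a --columnsHaveTypes option that lets you pin each column to a type, so SN can be imported as a string in the first place. A hedged sketch, assuming the sample file above and that its header line is edited to carry the type specifiers:

"SN.string()","Description.string()","OK.string()"
"123456789012345","I should end up in 123456789012345","true"
...

mongoimport -h XXXXXXX --port=XXXXXXX -u XXXXXXX -p XXXXXXX --db XXXXXXX --collection XXXXXXX --headerline --type csv --columnsHaveTypes --file /path/to/file.csv

Check the behaviour on your own mongoimport version first; the versions listed in the question predate this option.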

abhinsit
  • Ok, but still, the way the number is being trimmed is very strange. I have just realised it might be doing some sort of rounding, otherwise why would xxx45678 end up being xxx45680? Probably it becomes xxx456.78? – nico Jan 31 '16 at 20:29
  • But if you try to insert `123456789012345678912` the database will store a bigger number, `12345678901234568000`, and that is not exactly truncating, nor is it keeping the 32/64 least significant bits. It looks like it is doing some sort of rounding combined with truncating. – nico Feb 03 '16 at 06:40
  • It's not happening at the database level. It is happening when the DB sends the data to output for printing, and it gets trimmed there. – abhinsit Feb 05 '16 at 12:52