0

I have a collection of a similar look:

_id:5d0fe0dcfd8ea94eb4633222
Category:"Stripveiling (Nederlands)"
Category url:"https://www.catawiki.nl/a/11-stripveiling-nederlands"
Lot title:"Erwin Sels (Ersel) - Originele pagina"
Seller name:"Stripwereld"
Seller country:"Nederland"
Bids count:21
Winning bid:"€ 135"
Bid amount:"Closed"
Lot image:"https://assets.catawiki.nl/assets/2011/11/17/7/4/c/74c53540-f390-012e-..."

I need to change the "Winning bid" field to a int. That is, remove the currency sign and convert from string to int for the entire collection.

Nowhere in the documentation I could not find how to do it, do I really have to take every value with Python, remove the currency symbol and use the method update to do it? I have almost 8,000,000 records, it will be long.

How can I do this with the collection method? Or what is the quickest option to do this with Python?

kshnkvn
  • 876
  • 2
  • 18
  • 31
  • are all the winning bids of the form "currency number"? – Abdeslem SMAHI Jun 28 '19 at 21:00
  • @AbdeslemSMAHI yes – kshnkvn Jun 28 '19 at 21:11
  • int(currency_string[1:].strip()) - this is assuming that there is a string in the first two characters are the currency symbol and the ' ' space character. – MFK34 Jun 29 '19 at 01:20
  • @MFK34 I have no problems with converting the string to a int with a python, especially since the strip() is not needed, you can just int(). I do not know how to carry out all the updates in mongodb – kshnkvn Jun 29 '19 at 06:44

1 Answers1

1

If you want to convert the entire collection, you can do it with Aggregation pipeline.

You need to convert the currency to string using $substr and $toInt( or $toDouble, or $convert whatever suits your case) in the $project stage and $out as your last stage of aggregation. $out writes the result of the aggregtion pipeline to the given collection name.

But be careful while using $out. According to official mongodb documentation :

Create New Collection

The $out operation creates a new collection in the current database if one does not already exist. The collection is not visible until the aggregation completes. If the aggregation fails, MongoDB does not create the collection.

Replace Existing Collection

If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. Specifically, the $out operation:

  1. Creates a temp collection.
  2. Copies the indexes from the existing collection to the temp collection.
  3. Inserts the documents into the temp collection.
  4. Calls db.collection.renameCollection with dropTarget: true to rename the temp collection to the destination collection.

The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection.

Try this :

db.collection_name.aggregate([
    {
        $project: {
            category : "$category",
            category_name : "$category_name",
            lot_title : "$lot_title",
            seller_name : "$seller_name",
            seller_country : "$seller_country",
            bid_count : "$bid_count",
            winning_bid : { $toInt : {$substr : ["$winning_bid",2,-1]}},
            bid_amount : "$bid_amount",
            lot_image : "$lot_image"
        }
    },{
        $out : "collection_name"
    }
])

you might need to use allowDiskUse : true as an option to aggregation pipeline, as you have a lots of documents, and it may surpass 16MB mongodb limit.

Don't forget to replace collection_name with actual collection name , and include all the required field in the $project stage which you need in the collection. And please double check the value first either with a different temporary_collection or just by removing the $out stage and checking the result of aggregation pipeline.

For detailed information read official mongodb documentation $out, $toInt, $toDouble, $convert, $substr and allowDiskUse.

Ravi Shankar Bharti
  • 8,922
  • 5
  • 28
  • 52
  • Hello. It does not work for me, because my version of mongodb is 3.6.4. This feature appeared in version 4.0. – kshnkvn Jul 03 '19 at 08:52
  • I think it was available from MongoDB version 2.6, later changed in version 3.2(with additional features), so it should work for you. You can confirm from official $out documentation as well : https://docs.mongodb.com/manual/reference/operator/aggregation/out/ – Ravi Shankar Bharti Jul 03 '19 at 09:26
  • i mean $toInt and $convert is new in version 4.0. u can check it. – kshnkvn Jul 03 '19 at 09:47
  • for now im backing up my db and after restore into new verion, after that im try u code. – kshnkvn Jul 03 '19 at 09:48
  • oh okay, you were talking about `$toInt` and `$convert`. I see – Ravi Shankar Bharti Jul 03 '19 at 09:53
  • Oh, I completely forgot. I have some cells where instead of a number there is "No bids" and now I have the following error:Failed to parse number ' bids' in $convert with no onError value: Bad digit " " while parsing bids – kshnkvn Jul 03 '19 at 10:09
  • can i get around this somehow? – kshnkvn Jul 03 '19 at 10:10
  • maybe I should replace the line "No bids" with "€ 0" and after that use ur code? – kshnkvn Jul 03 '19 at 10:11
  • yes do that, replace code is pretty easy, `.update({winning_bid : "No bins"},{$set : {winning_bid : "€ 0"}},{multi:true})`. Dont forget `multi:true`, to update all the docs, otherwise it will only update the first matched doc. – Ravi Shankar Bharti Jul 03 '19 at 10:12
  • can i somehow use this command in aggregation? I easily redid your past commands: I simply inserted the “body” of the function into the aggregation $project and aggregation $out, but with the update I can not find the desired aggregation – kshnkvn Jul 03 '19 at 10:41
  • why dont you just run this update command before performing the aggregation query? – Ravi Shankar Bharti Jul 03 '19 at 10:42
  • I completely screwed up with the database. I called her "catawiki lots" with a space and now i cant connect to it anyway – kshnkvn Jul 03 '19 at 10:47
  • only what i can its use Compass and aggregations – kshnkvn Jul 03 '19 at 10:48
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/195921/discussion-between-ravi-shankar-bharti-and-kshnkvn). – Ravi Shankar Bharti Jul 03 '19 at 10:49
  • oh wait i found solution: db["catawiki lots"].renameCollection("catawiki_lots"); – kshnkvn Jul 03 '19 at 10:50