I run a bulk insert cron job every day. Some values get missed, and when I rerun the job, the rows are inserted again on top of the existing data instead of updating it. Is there a way to insert only the documents that have not yet been inserted?

My code:

query = bigQuery.get_data(query)
bulk = col.initialize_unordered_bulk_op()

for i, row in enumerate(query):
    bulk.insert({
        'date': str(row['day_dt']),
        'dt': datetime.strptime(str(row['day_dt']), '%Y-%m-%d'),
        'site': row['site_nm'],
        'val_counts': row[8]
    })

bulk_result = bulk.execute()

Right now, it re-inserts all the values every time the query runs. Is there a way to add only the values that have not been added yet?

nb_nb_nb
  • You should first check if the record exists, and if not, insert it. – securisec Sep 27 '19 at 18:20
  • @securisec, I am very new to this. How do I do that? – nb_nb_nb Sep 27 '19 at 18:21
  • You can use [find_one](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find_one): call `col.find_one(query)`, where `query` is a filter built from something you know about the data and/or its `ObjectId`. – securisec Sep 27 '19 at 18:26
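To make the suggestion in the comments concrete, here is a minimal sketch of the existence check. It assumes that the `date` and `site` fields together identify a document uniquely, which the question does not confirm. The helper names `existence_filter` and `already_inserted` are made up for illustration.

```python
def existence_filter(row):
    """Build the filter passed to find_one for one source row.

    Assumption: 'date' plus 'site' identifies a document uniquely.
    Adjust the fields to whatever is unique in your data.
    """
    return {'date': str(row['day_dt']), 'site': row['site_nm']}


def already_inserted(col, row):
    """Return True if a document matching the row's filter exists in col."""
    # find_one returns the matching document, or None when nothing matches
    return col.find_one(existence_filter(row)) is not None
```

Here `col` is the same pymongo collection used in the question, so `already_inserted(col, row)` can guard each insert inside the loop.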

1 Answer

I obviously don't fully know your data structure, and I'm not entirely clear on what you are trying to do, but I think this should work.

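from datetime import datetime

query = bigQuery.get_data(query)

new_things = []
for row in query:
    # filter that identifies an existing document
    # (assumption: date + site is unique; adjust this to your data)
    your_query = {'date': str(row['day_dt']), 'site': row['site_nm']}
    if not col.find_one(your_query):  # make sure that the document does not exist already
        # collect only the documents that are missing
        new_things.append({
            'date': str(row['day_dt']),
            'dt': datetime.strptime(str(row['day_dt']), '%Y-%m-%d'),
            'site': row['site_nm'],
            'val_counts': row[8]
        })

# use insert_many to insert all the new documents in one call;
# it rejects an empty list, so skip it when nothing is missing
if new_things:
    bulk_result = col.insert_many(new_things)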

The comments in the code explain each step. Since you mentioned you are new to this, I would stick to the simpler way of doing things and scale your code as your experience grows.
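Once the simple version works, a common alternative is to let MongoDB do the check itself with upserts, which avoids one `find_one` round trip per row. The following is only a sketch under the same assumption that `date` plus `site` identifies a document; the helper names `build_upsert_specs` and `upsert_rows` are hypothetical, while `UpdateOne` and `bulk_write` are part of pymongo.

```python
from datetime import datetime


def build_upsert_specs(rows):
    """Return one (filter, update) pair per source row.

    Assumption: 'date' plus 'site' identifies a document uniquely.
    """
    specs = []
    for row in rows:
        day = str(row['day_dt'])
        key = {'date': day, 'site': row['site_nm']}
        fields = {
            'dt': datetime.strptime(day, '%Y-%m-%d'),
            'val_counts': row[8],  # same positional access as in the question
        }
        # $set overwrites these fields on a match; on an upsert insert,
        # the equality fields from the filter are stored as well
        specs.append((key, {'$set': fields}))
    return specs


def upsert_rows(col, rows):
    """Insert missing documents and update existing ones in one bulk call."""
    from pymongo import UpdateOne  # imported here so build_upsert_specs needs no driver

    ops = [UpdateOne(flt, update, upsert=True)
           for flt, update in build_upsert_specs(rows)]
    # bulk_write rejects an empty operation list, so skip the call in that case
    return col.bulk_write(ops, ordered=False) if ops else None
```

Calling `upsert_rows(col, bigQuery.get_data(query))` on a rerun should then update the documents that already exist and insert only the missing ones. A unique index on the key fields would be a reasonable extra safeguard if two runs can overlap.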

securisec