I run a bulk insert cron job every day. Some values get missed, and when I rerun the job, the rows are inserted again alongside the existing documents instead of updating them. Is there a way to insert only the documents that have not yet been inserted?
My code:
query = bigQuery.get_data(query)
bulk = col.initialize_unordered_bulk_op()
for i, row in enumerate(query):
    bulk.insert({
        'date': str(row['day_dt']),
        'dt': datetime.strptime(str(row['day_dt']), '%Y-%m-%d'),
        'site': row['site_nm'],
        'val_counts': row[8]
    })
bulk_result = bulk.execute()
Right now, it re-inserts every value each time the query runs. Is there a way to add only the values that have not yet been added?
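For reference, one direction that might fit is to replace the plain inserts with upserts. The sketch below is only a rough idea under a big assumption: that the pair (`date`, `site`) uniquely identifies a document, which the code above does not confirm. The helper names `build_upsert_specs` and `upsert_rows` are made up for illustration; `UpdateOne` and `bulk_write` are PyMongo calls.

```python
from datetime import datetime


def build_upsert_specs(rows):
    """Return (filter, update) pairs, one per row.

    ASSUMPTION: 'date' + 'site' together identify a document uniquely.
    """
    specs = []
    for row in rows:
        day = str(row['day_dt'])
        # Fields used to find an already-inserted document.
        key = {'date': day, 'site': row['site_nm']}
        # Fields written whether the document is new or already exists.
        update = {'$set': {
            'dt': datetime.strptime(day, '%Y-%m-%d'),
            'val_counts': row[8],
        }}
        specs.append((key, update))
    return specs


def upsert_rows(col, rows):
    """Send all rows as one unordered bulk of upserts (hypothetical helper)."""
    from pymongo import UpdateOne  # imported here so the builder above works without PyMongo

    ops = [UpdateOne(key, update, upsert=True)
           for key, update in build_upsert_specs(rows)]
    if not ops:
        return None  # bulk_write rejects an empty request list
    return col.bulk_write(ops, ordered=False)
```

With this shape, `upsert_rows(col, query)` would update a matching document and insert one only when no match exists, so a rerun should not create duplicates. If existing documents should be left untouched instead, `$setOnInsert` could replace `$set`. A unique index such as `col.create_index([('date', 1), ('site', 1)], unique=True)` would enforce the same assumption on the database side.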