
I am building a custom import tool and am wondering what the best practice is for importing a large amount of data. I have the following JSON data structure, with a minimum of 500 products across 30 days per import.

"rows": [{
        "product_uid": "k110",
        "sale_date": "2018-06-06",
        "amount": 15
    }, {
        "product_uid": "k111",
        "sale_date": "2018-06-06",
        "amount": 22
    }, {
        "product_uid": "k110",
        "sale_date": "2018-06-07",
        "amount": 30
    }
]

The schema for the table is as follows:

daily_sales_id - product_uid - sale_date - amount

I am using the Node.js mysql package to execute multiple SQL statements in a single connection. It works well for inserting the rows the first time, but on subsequent runs it inserts duplicate rows. I could truncate the table before inserting, but that will fail if the user decides to import a delta snapshot instead of the entire set of records.

While I could use a for-loop to check whether each record exists and do an update instead of an insert, looping through 15,000+ records and issuing 15,000+ SELECT queries doesn't seem like a good idea.

Are there any alternatives where I can keep the data structure and perform an update/insert without looping through 15,000+ records? The import CSV file doesn't know the daily_sales_id.

Jun
  • You could try putting a unique key on the columns which define a record as being a "duplicate." Then, inserts which would add a duplicate would fail at the database level. Another option would be to have a periodic job which removes duplicates manually from your table. I like the first option more. – Tim Biegeleisen Jun 06 '18 at 06:37
  • Hi Tim, the "unique-ness" of each record is the combination of the product_uid and sale_date field. How should I put a unique key on 2 columns in mysql? – Jun Jun 06 '18 at 06:40

1 Answer


One option here would be to add a unique index on the columns which define a record in your table as a duplicate, something like this:

CREATE UNIQUE INDEX your_idx ON yourTable(product_uid, sale_date);

The net result of this is that an insert which attempted to add a new record with a product_uid/sale_date combination which already existed in the table would fail at the database level. You of course would need some Node.js code to handle this, but that should not be very difficult.

Tim Biegeleisen