
I need to import about 150,000 products, with categories, manufacturers, images, and attributes, into nopCommerce. I wrote a plugin based on Nop.Services.ExportImport.ImportManager. The more products that are imported, the slower the import runs: the first 100 products import in about 1.5 minutes, but by roughly 2,500 products it takes 15 to 20 minutes to import the next 100. Importing more than 2,500 products in one run causes OutOfMemory exceptions, and even stopping at 2,500 and starting a second import runs slower than the first.

Here is the process the plugin uses:

  1. Create a new product entity, populate its details, then insert the product using Nop.Services.Catalog.ProductService.InsertProduct. Using the ProductService is necessary because we need the value of Product.ID for adding child objects to the product.

  2. Add the product's manufacturer by first checking whether the manufacturer exists. If it is not found, create it and add it using ManufacturerService. Then create a ProductManufacturer entity to map the product to the manufacturer; this entity is added to a list.

  3. Do the same routine for Categories, Specification Attributes, URL Records, and Product Tags.

  4. Download the first image and add it using PictureService, which inserts a Picture entity into the database and then saves the image to the images directory. Then create a ProductPicture entity and add it to a list.

  5. Every 100 products, use SqlBulkCopy/EFBulkInsert to insert the lists of ProductManufacturers, ProductCategories, ProductPictures, etc. into the database, then clear the lists and start again.
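The accumulate-and-flush batching in step 5 can be sketched like this. `BatchBuffer` is a hypothetical helper, not a nopCommerce type; in the real plugin the flush delegate would hand the batch to SqlBulkCopy/EFBulkInsert:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the "collect mapping records, flush every N" pattern from step 5.
// BatchBuffer is illustrative only; the flush action stands in for a
// SqlBulkCopy/EFBulkInsert call against the mapping table.
public class BatchBuffer<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly int _batchSize;
    private readonly Action<IReadOnlyList<T>> _flush;

    public int FlushCount { get; private set; }

    public BatchBuffer(int batchSize, Action<IReadOnlyList<T>> flush)
    {
        _batchSize = batchSize;
        _flush = flush;
    }

    public void Add(T item)
    {
        _items.Add(item);
        if (_items.Count >= _batchSize)
            Flush();
    }

    public void Flush()
    {
        if (_items.Count == 0) return;
        _flush(_items);   // e.g. bulk insert the mapping records
        _items.Clear();   // drop references so the batch can be collected
        FlushCount++;
    }
}

public static class Program
{
    public static void Main()
    {
        int flushed = 0;
        var buffer = new BatchBuffer<int>(100, batch => flushed += batch.Count);
        for (int i = 0; i < 250; i++)
            buffer.Add(i);
        buffer.Flush();   // flush the final partial batch
        Console.WriteLine(flushed);           // 250
        Console.WriteLine(buffer.FlushCount); // 3
    }
}
```

The important detail is clearing the list after each flush; holding on to all the mapping entities for the whole run is one source of the growing memory use.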

So throughout the whole process we use Entity Framework to add products, categories, manufacturers, product tags, and specification attributes to the database one by one. This is necessary because we need the IDs of those items so we can assign them to a product by inserting records into the mapping tables. The mapping-table records can be collected in a list and then bulk inserted using SqlBulkCopy/EFBulkInsert.

Every time a product and its properties are added, the whole process gets slower, and it also seems to use more and more memory as time goes on. I'm not sure what to do next.

I'm not sure whether Entity Framework is causing the problem. I've read that Entity Framework tends to use more and more memory over time and that it is a good idea to reset your DbContext every so often. The problem with that is that nopCommerce manages the DbContext through its EfRepository layer in Nop.Data.
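The slowdown is the classic symptom of EF's change tracker: every tracked entity makes each subsequent save walk a larger object graph, so per-batch cost grows with total inserts. The usual mitigation is to process in batches and discard the context between batches. The sketch below shows the shape of that pattern with a stand-in class (`TrackingContext` is not an EF or nopCommerce type; it just holds references the way a real change tracker does):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for an EF DbContext: like the real change tracker, it keeps a
// reference to everything added until it is disposed, which is why a
// long-lived context grows in memory and slows down.
class TrackingContext : IDisposable
{
    private readonly List<string> _tracked = new List<string>();
    public int TrackedCount => _tracked.Count;
    public void Add(string entity) => _tracked.Add(entity);
    public void SaveChanges() { /* a real context would scan _tracked here */ }
    public void Dispose() => _tracked.Clear();
}

static class Program
{
    static void Main()
    {
        var products = Enumerable.Range(0, 350).Select(i => "product" + i);
        int maxTracked = 0;

        // Fresh context per batch of 100: the tracker never holds more
        // than one batch, so per-save cost stays flat.
        foreach (var batch in products
            .Select((p, i) => new { p, i })
            .GroupBy(x => x.i / 100, x => x.p))
        {
            using (var ctx = new TrackingContext())
            {
                foreach (var p in batch) ctx.Add(p);
                ctx.SaveChanges();
                maxTracked = Math.Max(maxTracked, ctx.TrackedCount);
            }
        }
        Console.WriteLine(maxTracked); // never exceeds the batch size
    }
}
```

With a real EF6 context you would also set `Configuration.AutoDetectChangesEnabled = false` during bulk work; in stock nopCommerce the context is hidden behind IDbContext/EfRepository, so applying this may require resolving or subclassing the context rather than using the repository layer directly.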

Using EFBulkInsert/SqlBulkCopy to add the product property mapping records to the database speeds things up a bit. I've thought about using it to insert all the products first, then making a second pass to import the properties. Even then, I can't bulk import the Picture entities, because I need both the PictureID and the ProductID to associate a picture with its product.
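One way the two-pass idea can work is to bulk insert the products without IDs, then read the generated (ID, natural key) pairs back in a single query and resolve the mappings in memory. The sketch below assumes SKU as the natural key; the `Product`/`ProductCategory` classes are simplified stand-ins for the Nop tables:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Pass one: bulk insert products (IDs generated by the database).
// Pass two (shown here): fetch (Id, Sku) pairs, build mapping rows in
// memory, then bulk insert those. Types are illustrative, not Nop's.
class Product { public int Id; public string Sku; }
class ProductCategory { public int ProductId; public int CategoryId; }

static class Program
{
    static void Main()
    {
        // Pretend this came back from "SELECT Id, Sku FROM Product"
        // after the pass-one bulk insert.
        var fromDb = new[]
        {
            new Product { Id = 11, Sku = "SKU-A" },
            new Product { Id = 12, Sku = "SKU-B" },
        };
        var idBySku = fromDb.ToDictionary(p => p.Sku, p => p.Id);

        // Resolve each imported row's SKU to its generated ID and build
        // the mapping records for a second bulk insert.
        var importRows = new[]
        {
            new { Sku = "SKU-A", CategoryId = 5 },
            new { Sku = "SKU-B", CategoryId = 7 },
        };
        var mappings = importRows
            .Select(r => new ProductCategory
            {
                ProductId = idBySku[r.Sku],
                CategoryId = r.CategoryId,
            })
            .ToList();

        Console.WriteLine(mappings[0].ProductId); // resolved from SKU-A
        Console.WriteLine(mappings[1].ProductId); // resolved from SKU-B
    }
}
```

The same lookup trick covers the Picture problem: bulk insert Pictures, read back their IDs keyed by filename (or another stable key), then build the ProductPicture mapping rows before a final bulk insert.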

I've also thought about using ADO.NET and stored procedures for the one-by-one inserts, but I'm not sure whether that would improve things. Does anyone have any suggestions?

David

2 Answers


I'm the author of EFUtilities (https://github.com/MikaelEliasson/EntityFramework.Utilities), which does bulk inserts among other things. The next version will be able to return store-generated IDs from the bulk insert.

Right now this is working but not yet released; I expect a release this weekend or next. Meanwhile you can find the code for it at https://github.com/MikaelEliasson/EntityFramework.Utilities/blob/release20/EntityFramework.Utilities/EntityFramework.Utilities/SqlQueryProvider.cs#L56, and if you download and build the release20 branch you should be able to use it.

Here is the test that shows how to enable ID return (it's a bit slower, so it's disabled by default): https://github.com/MikaelEliasson/EntityFramework.Utilities/blob/release20/EntityFramework.Utilities/Tests/InsertTests.cs#L125

using (var db = Context.Sql())
{
    EFBatchOperation.For(db, db.BlogPosts).InsertAll(list, new BulkSettings
    {
        ReturnIdsOnInsert = true,
    });
}
Mikael Eliasson

I have been through what you are facing now, though with about 30,000 products and images. I tried EF, SSIS, and plain SQL scripts.

My order of preference would be:

  1. Plain SQL Scripts.
  2. SSIS.
  3. Entity Framework import.

I wrote the plain SQL script as a single file that restores a copy of a blank nopCommerce database, reads data from the source database, and populates the tables.

Using plain SQL:

  1. It's fast, but harder to develop at first; you need to build the data import table by table.
  2. It's easy to debug using PRINT commands.
  3. If you find any bugs after import, you can fix them and rerun quickly.

The downside is that you need to know all the table relationships in nopCommerce. Based on your question, I can see you have a good understanding of them.

Using SSIS:

  1. It's fast.
  2. It's quick to develop.
  3. If you find any bugs after import, you can fix them and rerun quickly.

Using EF:

Your question says it all.

Now, if you need to import data periodically, write stored procedures and call them from ADO.NET; that would be the fastest way to import and update. Once each import is done, don't forget to clear the cache in the nop admin area.
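A minimal sketch of that ADO.NET call, assuming a hypothetical `dbo.usp_ImportProduct` procedure (the name and parameters are made up for illustration; inside the procedure, `SCOPE_IDENTITY()` would set the output parameter to the new product's ID):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Sketch of "stored procedure per row over ADO.NET". The procedure name
// and parameter list are hypothetical, not part of nopCommerce.
static class ProductImport
{
    public static SqlCommand BuildInsertCommand(
        SqlConnection conn, string name, string sku)
    {
        var cmd = new SqlCommand("dbo.usp_ImportProduct", conn)
        {
            CommandType = CommandType.StoredProcedure
        };
        cmd.Parameters.AddWithValue("@Name", name);
        cmd.Parameters.AddWithValue("@Sku", sku);
        // Output parameter carries back the generated product ID.
        cmd.Parameters.Add("@ProductId", SqlDbType.Int).Direction =
            ParameterDirection.Output;
        return cmd;
    }

    static void Main()
    {
        // No database here: just show the command is wired up correctly.
        var cmd = BuildInsertCommand(null, "Widget", "SKU-1");
        Console.WriteLine(cmd.Parameters.Count);
        Console.WriteLine(cmd.CommandType);
        // Against a live connection you would open it, call
        // cmd.ExecuteNonQuery(), then read cmd.Parameters["@ProductId"].Value.
    }
}
```

Because there is no per-entity change tracking, throughput stays flat as the import grows, which is the main advantage over the EF one-by-one approach.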

Source: I have been working on a nopCommerce-based eCommerce platform for over two years.

Jeyara