0

I read an Excel file and insert that data into a database table, but every time I do this, it adds the existing rows plus the new data, I only want to insert the rows that aren't already in the table, my unique ID is the current time stamp.

For example, this is what happens currently when I do the first insert:

ExcelFile                           Database Table

a | b | date                        a | b | date
-----------                        ---------------
1 | 1 | 2018/02/12                  1 | 1 | 2018/02/12  
2 | 2 | 2018 /03/12                 2 | 2 | 2018 /03/12 

This happens when I do the second insert:

ExcelFile                           Database Table

a | b | date                        a | b | date
-----------                        ---------------
1 | 1 | 2018/02/12                  1 | 1 | 2018/02/12  
2 | 2 | 2018 /03/12                 2 | 2 | 2018 /03/12 
3 | 3 | 2018 /04/12                 1 | 1 | 2018/02/12
                                    2 | 2 | 2018 /03/12 
                                    3 | 3 | 2018 /04/12

I use Entity Framework to perform this and the ExcelDataReader package:

var result = reader.AsDataSet();

DataTable dt = new DataTable();
dt = result.Tables[0];      // here I store the data from the Excel file

foreach (DataRow row in dt.Rows)
{
    using (AppContext context = new AppContext())
    {
        Data data = new Data();
        string date = row.ItemArray[4].ToString();
        DateTime parseDate = DateTime.Parse(date);
        Data datos = new Data
                            {
                                a = row.ItemArray[0].ToString(),
                                b = row.ItemArray[1].ToString(),
                                c = row.ItemArray[2].ToString(),
                                d = row.ItemArray[3].ToString(),
                                e = parseDate
                            };
        context.Data.Add(datos);
        context.SaveChanges();
    }
}

Is there a way to filter the excel file or compare them?

I'm all ears.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
HiramL.
  • 3
  • 5
  • 2
    Is there a reason why you are using the `Date` as the unique Id? – Jimenemex Nov 13 '18 at 21:54
  • Do you have an option to update the Excel workbook and mark values that have been processed by adding a value in another column? That would be ideal. Otherwise you would need to check every single value against the database every time. Even better than marking them, perhaps move them to a different sheet. Or even delete them. – Scott Hannen Nov 14 '18 at 01:38
  • @Jimenemex yes, the excel file is autogenerated and the only value that is unique for every row is the date, and also I can't touch that original excel file, I had to create a method to perform a copy every few minutes and work with that copy. – HiramL. Nov 14 '18 at 02:59
  • @mjwills on the excel file the unique PK could be the Date and Hour, that is the only cell different for every row, on the other hand, my DB table do has a PK :) – HiramL. Nov 14 '18 at 03:02
  • @ScottHannen that sound like a lot of extra steps, we had this Idea but was rejected because of the app performance, anyway this was simply resolved by MikeH storing the values of the date and checking for an existing row with the same Datetime – HiramL. Nov 14 '18 at 03:06
  • @mjwills On the DB Table I created another column named IdData(Identity Increment) so I dont have to insert nothing to that column from the app. – HiramL. Nov 14 '18 at 16:13

3 Answers3

0

Check for an existing row before adding it. The below should be inserted below where you calculate parseDate.

var existingRow = context.Data.FirstOrDefault(d=>d.e == parseDate); //Note that the ".e" should refer to your "date" field
if (existingRow != null)
{
  //This row already exists
}
else
{
  //It doesn't exist, go ahead and add it
}
MikeH
  • 4,242
  • 1
  • 17
  • 32
  • thanks man, this did the thing correctly and its very good in performance. I was trying to create a Stored procedure that could merge the excel datatable and the table in my Database but seemed kind of complicated since I am new to EF. Greetings Mike! – HiramL. Nov 14 '18 at 03:38
0

If "a" is the PK on the table and unique across rows then I would check the existence of existing rows by ID before inserting. Similar to Mike's response, though one consideration is if the table has a number of columns I would avoid returning the entity, but rather just an exists check using .Any()

if (!context.Data.Any(x => x.a == row.a)
  // insert the row as a new entity

The caveat here is if the excel file contains edits, existing rows where the data changes, this will not accommodate that.

For bulk import processes, I would typically approach these by first staging the excel data into a staging table first. (purging the staging table prior to each import) From there I would have entities mapped to staging tables, vs entities mapped to the "real" tables. If there is a "modified date" that can be extracted from the file for each record then I would also store the import date/time as a part of the application, so that when selecting the rows to import from the staging table, only get records where that modified date/time > the last import date/time. From there you can query data from the staging table in batches, and look for new records vs. existing modifications. I find querying entities on both side of the migration is more flexible than dealing with an in-memory block for the import. With small imports it may not be worthwhile, but for larger files where you will want to work with smaller sub-sets and filtering, it can make things easier.

Steve Py
  • 26,149
  • 3
  • 25
  • 43
0

I could perform exactly what I needed with the help of @MikeH With this only the rows with different DateTime were added (the DateTime its always an ascending value in my case.)

foreach (DataRow row in dt.Rows) // dt = my dataTable loaded with ExcelDataReader
                    {
                        using (AppContext context = new AppContext())
                        {
                            string date = row.ItemArray[4].ToString(); 
                            DateTime parseDate = DateTime.Parse(date); // I did a parse because the column "e" only accepted DateTime and not String types.

                            var existingRow = context.Data.FirstOrDefault(d => d.e == parseDate);
                            if (existingRow != null)
                            {
                                Console.WriteLine("Do Nothing");
                            }
                            else
                            {
                                Data datos = new Data
                                {
                                    a = row.ItemArray[0].ToString(),
                                    b = row.ItemArray[1].ToString(),
                                    c = row.ItemArray[2].ToString(),
                                    d = row.ItemArray[3].ToString(),
                                    e = parseDate
                                };
                                context.Data.Add(datos);
                                context.SaveChanges();
                            }
                        }
                    }
HiramL.
  • 3
  • 5