2

I have been going through various sites and codes, but nothing seems to end my misery. Either they help to find and remove duplicates for a specific column or they remove only from the datatable, not the underlying database itself. I want to delete duplicate rows from table "table1" of my mdb file.

To make my requirements clearer:

  1. there's no primary key set for any column in the table (and I cant afford to)
  2. I want to delete all duplicate rows but one! (order has no significance)
  3. I prefer deleting the duplicates from database more than first checking if such a row exist or not before updating database itself (if that's the last resort, which cant be, then that's welcome)
  4. by duplicate rows I mean rows that are not distinct. for eg, in the following example,only 3rd and 5th row are duplicates. And I want to delete any of them.

          Name1  Name2    Name3
          tom    dick   harry
          tom    dick   mike
          ann    sara   mike
          sara   ann    mike
          ann    sara   mike
    

The duplicate rows should be deleted from database with a button click as follows

     private void button1_Click(object sender, EventArgs e)
     {
         deletedupes();
     }

    private void deletedupes()
    {
        OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");

        DataSet ds = new DataSet();
        OleDbDataAdapter da = new OleDbDataAdapter("select * from table1", con);
        con.Open();
        da.Fill(ds, "table1");

        // what could be rest of the code??
    }

Thanks in advance. Yes I'm a novice..

nawfal
  • 70,104
  • 56
  • 326
  • 368
  • Some database engines allows you to do something similar to "delete top 1 from ...". Perhaps Access supports that? Oh, and **add a primary key**. – Lasse V. Karlsen Apr 08 '11 at 19:57
  • Why can't you have a primary key? How do you get duplicate rows in the first place? – Lasse V. Karlsen Apr 08 '11 at 20:00
  • @Lasse, sorry I dunno whats top1 from. I cant have primary key for my columns like Name1,Name2 or Name3.Since that means only unique value for those columns. I can have a unique ID as primary key, but what purpose that would serve here? If I set my column0 as primary key, then which way can I detect duplicate rows, rows which has same value for column1,column2 and column3? – nawfal Apr 08 '11 at 20:11
  • can anyone provide the code to do what i want?I cant get this done reading online.. – nawfal Apr 08 '11 at 21:07
  • @nawfal: I can't quite get why you can't have a primary key. You say that it would not allow duplicates, but you seem to not want duplicates either. You realize that you can have a primary key made of multiple columns (so-called "composite" primary key), right? – Alek Davis Apr 08 '11 at 23:32
  • @Alek, Yes I want duplicates in columns Name1,Name2 and Name3. You can see them in my example provided. I meant I cant set Name1 column as primary column since I want duplicates there. My requirement is that I shouldnt have a row with same value for Name1,2 and 3!. Otherwise there is no problem having a column of unique ID as column 0. I think I clarified that in my comment above yours. Anyway I got it working with the answer I provided. Yes I have used a primary column there, but can be done without primary key as well using the same logic. Just a matter of choice. – nawfal Apr 09 '11 at 01:40
  • If your "requirement is that [you] shouldnt have a row with same value for Name1,2 and 3", then just define your primary key as a composite key that is made of all three columns (as I said, the key can be made of multiple columns). See Composite Keys section at http://office.microsoft.com/en-us/access-help/create-or-remove-a-primary-key-HA010014099.aspx – Alek Davis Apr 09 '11 at 05:35
  • @Alek, a big thanks for that tip. I never knew anything like that. Surely gonna help me in future. thankss.. – nawfal Apr 12 '11 at 07:42

5 Answers5

3

If you haven't realized it already, database engines tends to think in absolutes. If you want it to delete a row, you have to tell it how to identify that row. Thus, primary keys.

Having said that, there is generally, but not always, two (2) ways you can do this:

  1. Find out if Access supports syntax to tell DELETE to only consider the "first N rows", similar to DELETE TOP 1 FROM ...
  2. Grab a distinct dataset from your table, delete all the rows in it, and insert the distinct rows back into it

The first might be possible, but it depends on whether Access supports any syntax that makes it possible. For instance. Microsoft SQL Server supports executing a statement SET ROWCOUNT 1 before a DELETE, and then DELETE will delete only 1 row, and then stop. I don't know if Access will do that.

The second will be a pain if you have foreign keys, but I'm going to go out on a limb here and assume that since you don't have primary keys, you don't have foreign keys, so data integrity is not a real problem here.

Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
2

Here is an article discussing several approaches for deleting duplicate rows in SQL Server, but I suspect it would apply to MS Access, as well: Removing Duplicates from a Table in SQL Server

Alek Davis
  • 10,628
  • 2
  • 41
  • 53
2

Ok, this is a complete hack, but it sounds like that's your only option...

Do a SELECT DISTINCTROW from your table. Delete all records from your table. Insert the distinct rows back in.

DISTINCTROW Syntax.

BrandonZeider
  • 8,014
  • 2
  • 23
  • 20
1

I have a similar problem were the values on rows are indentical but should keep only 1 row per combination of 2 columns. I was thinking about COUNT() and GROUP BY with HAVING COUNT() > 1 to get the combinations of these columns that have more than one occurence in a table. Then using recodset object from DAO to get rows, skip first and delete the rest. This is slow and cumbersome but works without adding a primary key.

0

As none of the answers were satsifactory for me (I'm just a tad too novice to understand the succint and slightly technicalized way spoken here by more knowledgable and experienced people), i tried my own variant to get this done. I could not follow what to be done with commands like distinct or set rowcount or delete from etc. Nowhere I could find a fully deployed code in an example. So I tried this. From scratch.

    int id, k;
    private void button2_Click(object sender, EventArgs e)
    {
        OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");

        DataSet ds = new DataSet();

        OleDbDataAdapter da = new OleDbDataAdapter("select * from table2", con);
        con.Open();
        da.Fill(ds, "table2");


        for (int i = 0; i < ds.Tables["table2"].Rows.Count; i++)
        {
            DataRow row = ds.Tables["table2"].Rows[i];
            k++;
            for (int j = k; j < ds.Tables["table2"].Rows.Count; j++)
            {
                DataRow row2 = ds.Tables["table2"].Rows[j];
                if (row.ItemArray.GetValue(1).ToString() == row2.ItemArray.GetValue(1).ToString())
                {
                    if (row.ItemArray.GetValue(3).ToString() == row2.ItemArray.GetValue(3).ToString())
                    {
                        id = int.Parse(row2.ItemArray.GetValue(0).ToString());
                        deletedupes(id);
                    }
                }
            }
        }

        con.Close();
    }


    private void deletedupes(int num)
    {
        OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");

        con.Open();

        OleDbCommand c = new OleDbCommand("Delete from table2 where id =?", con);
        c.Parameters.AddWithValue("id", num);
        c.ExecuteNonQuery();

        con.Close();
    }

Edit: Sorry, I missed to say that I did use a unique column having a primary key to get this done. Nevertheless, this can be done without that as well. Just a matter of choice. And for unknown reasons, this method seems so fast too..

nawfal
  • 70,104
  • 56
  • 326
  • 368
  • This sample may indeed do what you need to do, but this is not something you would want to show to a potential employer or actually release in real life. – Alek Davis Apr 09 '11 at 05:38