0

I have one table with almost 2 million records on my SQLite database, some records have null birthday and I would like to fix with data from antoher database since the ids are the same on both I can reference it for this update.

However I am having an issue for reading and I believe I will have antoher for updating it.

Reading issue, is there a way to get all the ids from the result without having to read 1 by 1 ?

Query being executed is (birthday is indexed):

SELECT id FROM personal WHERE birthday IS NULL OR birthday = ''

Currently I do this to retrieve all ids (which take something like 3 minutes for the first 300k records and then increase to something like 9 minutes for 250k records and keep growing the delay):

while (reader.Read())
{
    ids.Add(reader.GetInt32(0));
}

After I will reuse the ids list to execute an update to each record with the correct birthday which I assume I will run into the same issue, it will take a long time to get it done by inserting 1 by 1, which bring us to what I believe that will be the second issue.

Is there a fast way to update all the records with the needed data ?

NOTE: the birthdays I will read for updating the SQLITE will come from a remote MySQL server and as such I cannot link the 2 in 1 query like update select etc.

Table format is:

id
first_name
last_name
birthday
email
status

When I run the same query on SQLite Admin it takes 2371ms to spit all the data, so I assume I must be reading it wrong or reading 1 by 1 must be hurting it really bad.

Prix
  • 19,417
  • 15
  • 73
  • 132
  • This can be related to http://stackoverflow.com/questions/1437260/is-there-a-fast-way-to-update-many-records-in-sql – Sandeep Oct 18 '12 at 14:44
  • sqlite - Converter Tools might be helpful http://www.sqlite.org/cvstrac/wiki?p=ConverterTools – Amitd Oct 18 '12 at 14:57
  • This doesn't matter because of lboshuizen+Bader's answer, but the reason that reading the IDs becomes slower is that `ids` must be resized repeatedly. You could speed this up be doing `SELECT COUNT(*) ...` first and preallocting `ids` to that size. – CL. Oct 18 '12 at 15:56

4 Answers4

2

Execute this SQL on your database. No need to iterate over all the records. It does all you need in one action:

UPDATE
    Persons
SET
    Persons.birthday = OtherTable.birthday
FROM
    Persons
INNER JOIN
    OtherTable
ON 
    Persons.ID = OtherTable.ID
WHERE 
    Persons.birthday IS NULL OR Persons.birthday = ''
lboshuizen
  • 2,746
  • 17
  • 20
  • 1
    I cannot read both databases to do that, 1 is MySQL other is SQLite, so I need to read them separated in order to do update the SQLite one. – Prix Oct 18 '12 at 14:44
  • 3
    why you do not import data from MySQL table to SQLite, you can drop the table after updating. – PyQL Oct 18 '12 at 14:48
  • They are not the same in terms of table design, however they do have the id and birthday in comum. The 2 databases have different means of use and I was lucky that this column in particular is right for what I need. – Prix Oct 18 '12 at 14:57
  • They do not need to be the same. You will just need two columns: ID and birthday. Export these two into any format for example CSV, then import it into your SQLite DB. – PyQL Oct 18 '12 at 15:00
1

You could try using the DataAdapter.Fill method to read all data in a DataTable or a DataSet.

I'm not sure if this is going to be faster, but it's probably worth trying.

Assuming you have an DbCommand instance called cmd, the code could be:

var table = new DataTable();
var adapter = new SqlDataAdapter(cmd);
adapter.Fill(table);

(I think you have to use SqliteDataAdapter - I don't know exactly how it's called for SQLite)

After this, table should have one column and contain all the data you need.

Other things I would try, if this is still slow:

  • process the data in batches (for example 100 or 1000 items at a time, based on PK id)
  • try to do it without resorting to C#, if possible (for example, do a data import in MySQL, run an UPDATE with JOIN there and then bring the data back to SQLite)
Cristian Lupascu
  • 39,078
  • 16
  • 100
  • 137
  • For the update I don't know if it's possible to do it all at once. If you use ADO.NET, at low level I think it's still going to update record by record. – Cristian Lupascu Oct 18 '12 at 14:51
  • @Prix I recommend a solution that does not involve C# in this case if it's possible (see the last line in my answer) – Cristian Lupascu Oct 18 '12 at 14:52
  • I see, but I guess exporting the table out of sqlite into mysql doesnt have such a overhead or so I hope, I will try some things around including what you recommended and get back with my results later thanks a lot. – Prix Oct 18 '12 at 14:55
  • @Prix also - I think Bader's idea is great. See his comment on lboshuizen's answer. – Cristian Lupascu Oct 18 '12 at 14:57
  • Yeah I was just reading the other responses, however I will have to export from one of the 2 just the specific needed data in order to do that. – Prix Oct 18 '12 at 14:59
0

You can retrieve all the rows in a dataset in one call..

     using (SqlConnection cnn = new SqlConnection("connection_string_here"))  
    {
         SqlDataAdapter da = new SqlDataAdapter("SELECT id FROM personal 
WHERE birthday IS NULL OR   birthday = ''", cnn); 
         DataSet ds = new DataSet(); 
         da.Fill(ds, "personal"); 

         List<string> pids = new List<string>();
         foreach(DataRow row in ds.Tables["personal"].Rows)
         {
           pids.Add(row["id"].ToString());
           // similarly you can update row objects here.
         }
        }

Perform your updates in the dataset and then save the changes with dataset object.

http://support.microsoft.com/kb/301248

http://www.datadirect.com/resources/ado-net/sqlserver-code-examples/updating-dataset.html

Amitd
  • 4,769
  • 8
  • 56
  • 82
0

You could retrieve the list of IDs as a comma delimited string, using GROUP_CONCAT, then split it into an array of ints. Something like:

SELECT GROUP_CONCAT(id) FROM personal WHERE birthday IS NULL OR birthday = ''

Then in your code do:

var ids = myStringOfIDs.Split(',').Select(val => int.Parse(val));

For updating, one approach would be to do batch updates.

Garett
  • 16,632
  • 5
  • 55
  • 63