3

I have datatable "users" and column "is_published" in it. I have about 100k rows. What is the fastest way to update value in the column, so the whole rows in column have same value = 1.

I try with classic foreach loop and it't slow, also I try with LINQ :

dsData.Tables["users"].Select().ToList().ForEach(x => x["is_published"] = 1;);

and it still isn't fast enough.

Also variant wit Expression doesn't work for me, because after that fields is ReadOnly and I can't change value again.

This is C#.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Broken
  • 31
  • 1
  • 4
  • 4
    why not just run a sql statement? update users set is_published = 1 – Gavin Apr 27 '14 at 12:37
  • Linq.ForEach() is much slower ;) – sihirbazzz Apr 27 '14 at 12:38
  • I can't use update users set is_published = 1, because this is C# – Broken Apr 27 '14 at 12:40
  • 5
    just use ado.net - sqlconnection + sqlcommand object to run sql direct against your db. see http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-2 – Gavin Apr 27 '14 at 12:42
  • 2
    you want to do the same operation on 100k+ rows and you somehow don't want to use a loop? Any solution you get from anyone is, at some point, going to boil down to a fancy way to write a loop... – Michael Edenfield Apr 27 '14 at 12:56
  • Well, problem is that this table is temporary, it exists only in my dataset, I don't have it in my database. – Broken Apr 27 '14 at 12:57
  • Try Parallel.ForEach http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.foreach(v=vs.110).aspx – cheedep Apr 27 '14 at 13:34
  • How would you measure `fast enough`? How frequently do you need to do this? – ClickRick Apr 27 '14 at 13:50
  • I use this update for next thing: when you press the button SelectAll all rows in column "is_published" should be set to value 1, when you click button DeselectAll all rows should be set to value 0. It takes me about 40sec, and it is to slow for me. – Broken Apr 27 '14 at 14:02
  • 1
    Separate the *select* and *update* into to operations. Skip the `ToList()` operation and instead iterate afterwards over the `IEnumerable` collection using `forEach´ and update the value. *40* seconds is an awful lot for 100K items. IMO if your `DataTable` is bound to a DataGridView or some other control, the update of the GUI is taking so long and not the update of the values itself. Try suspending the UI bevor updating and refreshing the values and then enable it again (example in [this SO post](http://stackoverflow.com/questions/5817632/beginupdate-endupdate-for-datagridview-request)). – keenthinker Apr 27 '14 at 16:43
  • Update you database column "is_published" of table "user" with value 1 using any database Manager software. It will update your column very fast rather than using any loop or other codes. – Ravi Patel Apr 29 '14 at 07:09

3 Answers3

1

when you create your table you can simply push a default value to your column..

 DataTable dt = new DataTable();
dt.Columns["is_published"].DataType = System.Int32;
dt.Columns["is_Published"].DefaultValue = 1;

then when you need to change the rows to default value ( or will you need? )

// Say your user selects the row which its index is 2..
// The ItemArray gives the selectedRow's cells as object..
// And say your columns index no is 5..
dt.Rows[2].ItemArray[5] = default ;

or

dt.Rows[2].ItemArray[5] = dt.Columns["is_published"].DefaultValue;
sihirbazzz
  • 708
  • 4
  • 20
0

Separate the select and the update into two operations. Skip the ToList() operation and instead iterate afterwards over the IEnumerable collection using forEach and update the value:

var rows = dsData.Tables["users"].Select();
forEach(var row in rows)
{
    row["is_published"] = 1;
}

The ToList forces an immediate query evaluation which in this case acts as a copy of all items from the IEnumerable collection, so you can gain some speed here. I ran some tests and the result in this case is (using your code and the modification): ToList is 3 times slower than iterating over IEnumerable and single update!

IMO 40 seconds is an awful lot for 100K items. If your DataTable is bound to a DataGridView or some other UI control, i believe that the update of the GUI is taking so long and not the update of the values itself. In my tests the update using ToList took fractions of a second (on my simple Lenovo netbook with AMD E-450 processor, and i assume you are not using a 386 machine). Try suspending the UI bevor updating and refreshing the values and then enable it again - example in this SO post.


My original post (as i can see you gained some speed using the code - interesting):

More an experiment for my part, but it is possible to:

  • convert the table to XML
  • fetch all elements that should be changed
  • change them
  • write the changed XML back to the table

The code:

// temp table
var dataTable = new DataTable("Table 1");
dataTable.Columns.Add("title", typeof(string));
dataTable.Columns.Add("number", typeof(int));
dataTable.Columns.Add("subnum1", typeof(int));
dataTable.Columns.Add("subnum2", typeof(int));
// add temp data
Enumerable.Range(1, 100000).ToList().ForEach(e =>
{
    dataTable.Rows.Add(new object[] { "A", 1, 2, 3 });
});
// "bulk update"!
var sb = new StringBuilder();
var xmlWriter = XmlWriter.Create(sb);
dataTable.WriteXml(xmlWriter);
var xml = XDocument.Parse(sb.ToString());
// take column to change
var elementsToChange = xml.Descendants("title").ToList();
// the list is referenced to the XML, so the XML is changed too!
elementsToChange.ForEach(e => e.Value = "Z");
// clear current table
dataTable.Clear();
// write changed data back to table
dataTable.ReadXml(xml.CreateReader());

The table is updated. IMO the parts that make this solution slow are the

  • convertion from and to XML
  • and the fill of the StringBuilder

The other way around the pure update of the list is probably faster than the table update.

Community
  • 1
  • 1
keenthinker
  • 7,645
  • 2
  • 35
  • 45
0

Finaly! I speed up update so it takes 2-3 sec. I added BeginLoadData() and EndLoadData()

    DataTable dt = ToDataSet().Tables["users"];
    var sb = new StringBuilder();
    var xmlWriter = XmlWriter.Create(sb);
    dt.WriteXml(xmlWriter);
    var xml = XDocument.Parse(sb.ToString());
    xml.Descendants("is_published").ToList().ForEach(e => e.Value = "1");
    dt.Clear();
    dt.BeginLoadData();
    dt.ReadXml(xml.CreateReader());
    dt.EndLoadData();
Broken
  • 31
  • 1
  • 4
  • It is alright to [answer your own question](http://stackoverflow.com/help/self-answer), but if you do so, do it only once, do not create an answer for every variation you try or discover. Since your own answer contains a question, i think this should be a new question or at least you should have updated your initial question with the solution. – keenthinker May 01 '14 at 10:28