1

I am currently calculating several columns of data using a function similar to this one.

DataTable data = RetrieveDataTable();
var value = data.Compute(
    "AVG(Important_Variable)", 
    "(Distance > 1230 and Distance < 1760) OR (Distance > 4710 and Distance < 5400)"
);

This is working but it is slow. Is there a faster way to achieve the same thing? Different data types are perfectly ok, I am not married to DataTables.

PlTaylor
  • 7,345
  • 11
  • 52
  • 94
  • 1
    Why not run this query on the DB? – System Down Oct 03 '13 at 19:21
  • This is part of a routine that reads in a large number of text files (think 100's to 1000's) reads the numbers in the text files, parses them into a DataTable per file that usually has about 1.5k rows and then calculates metrics from those tables. – PlTaylor Oct 03 '13 at 19:25

2 Answers2

2

I don't know how you're populating your DataTable but one way that just occurred to me is to accumulate the sum and count of the elements (which you need to calculate an average) as you populate the DataTable. So something like:

int sum;
int count;

for(something) //Loop that populates the data table
{
   .
   .
   var Important_Variable = Read something
   Populate(Important_Variable)
   count++;
   if((Distance > 1230 and Distance < 1760) || (Distance > 4710 and Distance < 5400))
      sum += Important_Variable;
   .
   .
}
var avg = sum/count;

This way you don't have to run the filter after the fact(which is I'm pretty sure is the time consuming part).

System Down
  • 6,192
  • 1
  • 30
  • 34
1
  1. Map everything to a POCO object.
  2. Write a (readonly) get property......that does the calculation.

Here is a ~basic~ ORM mapper (Aka, create your POCO's)

Why is DataTable faster than DataReader

Tack on the readonly (get; only) property that uses the other properties of the object.

If you calculation is "costly", and you read it more than one time, you can use this "nullable" trick.

public class Employee
{

    public DateTime BirthDate { get; set; }
    public DateTime HireDate { get; set; }

    TimeSpan? _difference = null;
    public TimeSpan Difference
    {
        get
        {
            TimeSpan returnValue;
            if (this._difference.HasValue)
            {
                returnValue = this._difference.Value;
            }
            else
            {
                   /* COSTLY CALCULATION HERE , ONE TIME */
                this._difference = this.HireDate.Subtract(this.BirthDate);
                   /* End COSTLY Calculation */

                returnValue = this._difference.Value;
            }
            return returnValue;
        }
    }

}
Community
  • 1
  • 1
granadaCoder
  • 26,328
  • 10
  • 113
  • 146
  • If I do a POCO object and create a List and run a LINQ query against them will that be any faster than a DataTable? – PlTaylor Oct 03 '13 at 19:29
  • @PlTaylor depending on how you write your Linq query it may be able to do things like `AsParallel()` and use the parallel [`Average(...)`](http://msdn.microsoft.com/en-us/library/dd384172.aspx) function to potentially perform better. You would need to benchmark it. – Scott Chamberlain Oct 03 '13 at 19:30
  • Each file that is read is already being done in a parallel fashion, so AsParallel() would likely not help. – PlTaylor Oct 03 '13 at 19:33
  • Looking at the operation he's executing (an average) I don't think he'll be able to parallelize it. – System Down Oct 03 '13 at 19:33
  • Are you sure `Compute(` is doing it in parallel? Also @SystemDown I updated my old comment to the built in parallel Average function. Just join that with a `Where(` clause to filter the range. – Scott Chamberlain Oct 03 '13 at 19:33
  • Compute( isn't parrallel, but I am reading 100's of files and calculating the metrics in parallel. While one file is being worked on, another one is being worked on in parallel using parallel.foreach(file) – PlTaylor Oct 03 '13 at 19:35
  • Edit made to my original response. – granadaCoder Oct 03 '13 at 19:49