I'm implementing a "reconciliation" library that performs diffs between data objects. As part of the implementation, I convert the objects to compare (mostly CSV files) to DataTables and run a series of comparison steps, the last of which compares the actual values in the rows.

For the row comparison I'm using the code below:

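  // Rows found in the target but not in the source are the ones missing in the source, and vice versa.
  var rowsMissingInSrc = rowsInTrgt.Except(rowsInSrc, DataRowComparer.Default);
  var rowsMissingInTrgt = rowsInSrc.Except(rowsInTrgt, DataRowComparer.Default);
  // The two sets of rows match only when neither side has unmatched rows.
  return !rowsMissingInSrc.Any() && !rowsMissingInTrgt.Any();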

Instead of using the default DataRowComparer, I would like to implement a custom one. I would also like all the comparisons to run in parallel, since they are independent of each other, and at the end have the option of combining the results of the comparison tasks with either a logical_AND or a logical_OR.

Questions:

  1. Is implementing "IEqualityComparer<TRow> where TRow : DataRow" sufficient to get a parallel comparison of the rows?

  2. For the logical_AND, I think it would make sense to abort the remaining comparisons on the first "false". Can this be done?

  3. For the logical_OR, I would need something similar to wait_All on the threads. How can this be implemented?

svick
Codex
  • What exactly do you want to run in parallel? Should each `Except()` be parallelized (as your suggestion of using a custom `DataRowComparer` would indicate)? Or do you want to run multiple invocations of this method in parallel (as your mentions of using *or* and *and* indicate)? – svick Jun 20 '12 at 12:10
  • I want the comparison of the individual rows to be parallelized. The type of invocation of this method (logical_AND or logical_OR) will dictate the behaviour of the parallel comparisons. – Codex Jun 20 '12 at 13:01

1 Answer

Call AsParallel() on the source and target row collections (it is an extension method available on any IEnumerable<T>):

// Both tables are assumed to be loaded elsewhere (e.g. from the CSV files).
DataTable sourceTable;
DataTable targetTable;

// Wrap each row collection in a PLINQ query so that Except runs in parallel.
var sourceRows = sourceTable.Rows.Cast<DataRow>().AsParallel();
var targetRows = targetTable.Rows.Cast<DataRow>().AsParallel();

var rowsMissingInTarget = sourceRows.Except(targetRows, DataRowComparer.Default);
var rowsMissingInSource = targetRows.Except(sourceRows, DataRowComparer.Default);
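
Any IEqualityComparer<DataRow> can be passed in place of DataRowComparer.Default in the query above. Implementing the interface does not make anything parallel by itself; the parallelism still comes from AsParallel(). Here is a minimal sketch of such a comparer, assuming you only want to compare a chosen subset of columns. The class name ColumnSubsetComparer and the column names are hypothetical, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are equal when the listed columns hold equal values.
// It keeps no mutable state, so PLINQ can call it from several threads at once.
public sealed class ColumnSubsetComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columns;

    public ColumnSubsetComparer(params string[] columns)
    {
        _columns = columns ?? throw new ArgumentNullException(nameof(columns));
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // object.Equals copes with DBNull and with boxed value types.
        return _columns.All(c => object.Equals(x[c], y[c]));
    }

    public int GetHashCode(DataRow row)
    {
        // Hash only the compared columns so that equal rows always share a hash code.
        unchecked
        {
            int hash = 17;
            foreach (var c in _columns)
                hash = hash * 31 + (row[c]?.GetHashCode() ?? 0);
            return hash;
        }
    }
}
```

With this in place, the call becomes something like sourceRows.Except(targetRows, new ColumnSubsetComparer("Id")), where "Id" stands for whatever key column your tables actually have. Except relies on hashing, so GetHashCode has to agree with Equals or matching rows will be reported as missing.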

The parallelism applies to the row collections, not to the individual comparison. For example, on a table with 100k records the work can be split across 2 (or more) threads, each doing roughly 50k comparisons. I recommend running some performance tests: since each row comparison is very fast, I suspect that processing the rows in parallel will actually be slower in this case.
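
For the logical_AND and logical_OR variants from the question, one possible approach is PLINQ's All and Any operators. This is a rough sketch under an assumption of mine: AND means "every source row has a match in the target" and OR means "at least one source row has a match". The RowReconciler class and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowReconciler
{
    // Logical AND (assumed meaning): true only if every source row has a match in the target.
    public static bool AllRowsMatch(DataTable source, DataTable target,
                                    IEqualityComparer<DataRow> comparer)
    {
        // Build the lookup once; afterwards it is only read, never modified.
        var targetSet = new HashSet<DataRow>(target.Rows.Cast<DataRow>(), comparer);
        return source.Rows.Cast<DataRow>()
                     .AsParallel()
                     .All(row => targetSet.Contains(row));
    }

    // Logical OR (assumed meaning): true if at least one source row has a match in the target.
    public static bool AnyRowMatches(DataTable source, DataTable target,
                                     IEqualityComparer<DataRow> comparer)
    {
        var targetSet = new HashSet<DataRow>(target.Rows.Cast<DataRow>(), comparer);
        return source.Rows.Cast<DataRow>()
                     .AsParallel()
                     .Any(row => targetSet.Contains(row));
    }
}
```

Both calls block until the result is known, so no explicit wait on threads is needed: Any can only return false after every partition has finished. All does not need to look at further rows once one comparison fails, and likewise Any once one succeeds, although rows already being processed on other threads may still be compared. The same caveat about measuring applies here, since the per-row work is small.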

Marcelo De Zen