Given a pair of DataTable
s with an arbitrary number of columns, and given a function that can create a grouping value of a reasonable type from each of the two DataTable
s, you can use Linq to do most of the work for you.
Let's start with a function to extract the join key from the DataTable
s. It'd be nice to just return an object[]
, but they don't compare well. We can do it with a Tuple<object, object>
however - those work nicely for this purpose. And if you need more columns you can just add more columns :P
// Produces a JoinKey as Tuple containing columns 'A' and 'C' (0 and 2)
public Tuple<object, object> GetJoinKey(DataRow row)
{
return Tuple.Create(row[0], row[2]);
}
Now the join. We can't do a full outer join directly, but we can do an outer join both ways and Union
the results:
// given DataTables table1 & table2:
var outerjoin =
(
from row1 in table1.AsEnumerable()
join row2 in table2.AsEnumerable()
on GetJoinKey(row1) equals GetJoinKey(row2)
into matches
from row2 in matches.DefaultIfEmpty()
select new { key = GetJoinKey(row1), row1, row2 }
).Union(
from row2 in table2.AsEnumerable()
join row1 in table1.AsEnumerable()
on GetJoinKey(row2) equals GetJoinKey(row1)
into matches
from row1 in matches.DefaultIfEmpty()
select new { key = GetJoinKey(row2), row1, row2 }
);
Next up you have to create a suitable output format - a DataTable
that has all the rows from both sources, plus a field to hold some info about the key:
DataTable result = new DataTable();
// add column for string value of key:
result.Columns.Add("__key", typeof(string));
// add columns from table1:
foreach (var col in table1.Columns.OfType<DataColumn>())
result.Columns.Add("T1_" + col.ColumnName, col.DataType);
// add columns from table2:
foreach (var col in table2.Columns.OfType<DataColumn>())
result.Columns.Add("T2_" + col.ColumnName, col.DataType);
And finally, fill the table from the query:
var row1def = new object[table1.Columns.Count];
var row2def = new object[table2.Columns.Count];
foreach (var src in outerjoin)
{
// extract values from each row where present
var data1 = (src.row1 == null ? row1def : src.row1.ItemArray);
var data2 = (src.row2 == null ? row2def : src.row2.ItemArray);
// create row with key string and row values
result.Rows.Add(new object[] { src.key.ToString() }.Concat(data1).Concat(data2).ToArray());
}
Of course we could short out a couple of those operations to get a single Linq query that does 99% of the work for us. I'll leave that to you to play with if it sounds like fun.
Here's the full method, done as an extension with a generic function for the join key generator, making it reasonably generic:
public static DataTable FullOuterJoin<T>(this DataTable table1, DataTable table2, Func<DataRow, T> keygen)
{
// perform inital outer join operation
var outerjoin =
(
from row1 in table1.AsEnumerable()
join row2 in table2.AsEnumerable()
on keygen(row1) equals keygen(row2)
into matches
from row2 in matches.DefaultIfEmpty()
select new { key = keygen(row1), row1, row2 }
).Union(
from row2 in table2.AsEnumerable()
join row1 in table1.AsEnumerable()
on keygen(row2) equals keygen(row1)
into matches
from row1 in matches.DefaultIfEmpty()
select new { key = keygen(row2), row1, row2 }
);
// Create result table
DataTable result = new DataTable();
result.Columns.Add("__key", typeof(string));
foreach (var col in table1.Columns.OfType<DataColumn>())
result.Columns.Add("T1_" + col.ColumnName, col.DataType);
foreach (var col in table2.Columns.OfType<DataColumn>())
result.Columns.Add("T2_" + col.ColumnName, col.DataType);
// fill table from join query
var row1def = new object[table1.Columns.Count];
var row2def = new object[table2.Columns.Count];
foreach (var src in outerjoin)
{
// extract values from each row where present
var data1 = (src.row1 == null ? row1def : src.row1.ItemArray);
var data2 = (src.row2 == null ? row2def : src.row2.ItemArray);
// create row with key string and row values
result.Rows.Add(new object[] { src.key.ToString() }.Concat(data1).Concat(data2).ToArray());
}
return result;
}
Now, IF the tables have the same schema (which the above doesn't care about), you can do almost exactly the same thing - modify the result table generation to just clone one of the tables, then add some merge logic in the load loop.
Here's a Gist of the above with testing and verification that it's doing what it says. Drop that in your compiler and see what you get out.