Yes, you can order the data by primary key and then write it to a JSON or XML file.
Then you can run diff over the two files.
You can also do this chunked by primary-key range, so you don't have to work with one huge file.
Log any chunk whose diff shows a difference.
If it doesn't matter what the difference is, you can also just run MD5/SHA1 over the two file chunks: if the hashes match, there is no difference; if they don't, there is.
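As a minimal sketch of that hash comparison (the chunk file paths are hypothetical, and it assumes both chunks were exported with identical ordering and formatting):

using System.IO;
using System.Linq;
using System.Security.Cryptography;

static bool ChunksMatch(string mssqlChunkPath, string pgsqlChunkPath)
{
    using (var md5ForFirst = MD5.Create())
    using (var md5ForSecond = MD5.Create())
    using (var first = File.OpenRead(mssqlChunkPath))
    using (var second = File.OpenRead(pgsqlChunkPath))
    {
        // Hash both exported chunk files; equal digests mean the chunks are byte-for-byte identical.
        byte[] firstHash = md5ForFirst.ComputeHash(first);
        byte[] secondHash = md5ForSecond.ComputeHash(second);
        return firstHash.SequenceEqual(secondHash);
    }
}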
Speaking from experience with NHibernate, the things you need to watch out for are:
- bit fields
- text, ntext, varchar(MAX), nvarchar(MAX) fields
(they map to varchar with no length limit, by the way, with UTF-8 encoding)
- varbinary, varbinary(MAX), image (bytea vs. LOB)
- xml
- that every primary key's serial id generator is reset after you have inserted all the data into PostgreSQL (see the sketch below)
Another thing to watch out for is which time zone CURRENT_TIMESTAMP uses.
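As a rough sketch of those last two points, assuming Npgsql and a hypothetical orders table with a serial id column:

using System;
using Npgsql;

static void FixupAfterImport(string pgConnectionString)
{
    using (var conn = new NpgsqlConnection(pgConnectionString))
    {
        conn.Open();

        // Reset the serial generator to the highest imported id (table and column names are placeholders).
        using (var resetCmd = new NpgsqlCommand(
            "SELECT setval(pg_get_serial_sequence('orders', 'id'), COALESCE((SELECT MAX(id) FROM orders), 1));",
            conn))
        {
            resetCmd.ExecuteScalar();
        }

        // Check which time zone CURRENT_TIMESTAMP will use on this connection.
        using (var tzCmd = new NpgsqlCommand("SELECT current_setting('TimeZone');", conn))
        {
            Console.WriteLine("PostgreSQL TimeZone: " + tzCmd.ExecuteScalar());
        }
    }
}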
Note:
I'd actually run System.Data.DataRowComparer directly, without writing data to a file:
using System.Collections.Generic;
using System.Data;
using System.Linq;

static void Main(string[] args)
{
    // LoadSqlServerTable/LoadPostgresTable are placeholders for however you fill the two DataTables.
    DataTable dt1 = LoadSqlServerTable();
    DataTable dt2 = LoadPostgresTable();

    IEnumerable<DataRow> idr1 = dt1.Select();
    IEnumerable<DataRow> idr2 = dt2.Select();

    // DataRowComparer.Default compares rows by value; a plain Except() would only compare references.
    IEnumerable<DataRow> results = idr1.Except(idr2, DataRowComparer.Default);
}
Then you write all non-matching DataRows into a log file, with one directory per table (only where there are differences).
I don't know what Python uses in place of System.Data.DataRowComparer, though.
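Staying with C#, that logging step could look roughly like this (the directory layout and separator are just assumptions):

using System.Collections.Generic;
using System.Data;
using System.IO;

static void LogDifferences(string tableName, IEnumerable<DataRow> differingRows)
{
    // One directory per table, one line per non-matching row.
    string dir = Path.Combine("diff-log", tableName);
    Directory.CreateDirectory(dir);

    using (var writer = new StreamWriter(Path.Combine(dir, "mismatches.txt")))
    {
        foreach (DataRow row in differingRows)
        {
            writer.WriteLine(string.Join("|", row.ItemArray));
        }
    }
}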
Since this would be a one-time task, you could also opt not to do it in Python and use C# instead (see the code samples above).
Also, if you have large tables, you could use a DataReader with sequential access to do the comparison. But if the DataTable approach above cuts it, it reduces the required work considerably.
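As a rough sketch of the DataReader variant (System.Data.SqlClient and Npgsql are assumed as the two providers, and the query, table, and ORDER BY column are hypothetical), you walk both result sets in primary-key order and compare field by field:

using System;
using System.Data;
using System.Data.SqlClient;
using Npgsql;

static void CompareWithReaders(string mssqlConnString, string pgConnString)
{
    // Both queries must return the same columns, in the same order, sorted by primary key.
    const string query = "SELECT id, name, created_at FROM orders ORDER BY id";

    using (var msConn = new SqlConnection(mssqlConnString))
    using (var pgConn = new NpgsqlConnection(pgConnString))
    using (var msCmd = new SqlCommand(query, msConn))
    using (var pgCmd = new NpgsqlCommand(query, pgConn))
    {
        msConn.Open();
        pgConn.Open();

        // SequentialAccess keeps memory usage flat even for rows with large blob/text columns,
        // but it requires reading columns strictly in ascending order, as the loop below does.
        using (var msReader = msCmd.ExecuteReader(CommandBehavior.SequentialAccess))
        using (var pgReader = pgCmd.ExecuteReader(CommandBehavior.SequentialAccess))
        {
            long rowIndex = 0;
            while (true)
            {
                bool msHasRow = msReader.Read();
                bool pgHasRow = pgReader.Read();
                if (!msHasRow || !pgHasRow)
                {
                    if (msHasRow != pgHasRow)
                        Console.WriteLine("Row count differs at row " + rowIndex);
                    break;
                }

                // Compare column by column; string conversion keeps the sketch simple,
                // real code would compare typed values (and normalize time zones, encodings, etc.).
                for (int i = 0; i < msReader.FieldCount; i++)
                {
                    string a = Convert.ToString(msReader.GetValue(i));
                    string b = Convert.ToString(pgReader.GetValue(i));
                    if (a != b)
                        Console.WriteLine("Row " + rowIndex + ", column " + msReader.GetName(i) + ": '" + a + "' vs '" + b + "'");
                }
                rowIndex++;
            }
        }
    }
}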