I'm auditing financial data between two transaction systems. The two datasets are lists of transactions, and each transaction has:
- a timestamp,
- an idiosyncratic transaction ID (not shared between systems),
- transaction values (item names, item values, totals, adjustments).
The data is imperfect: roughly 10-20% of the entries don't align cleanly between the two systems.
I've cobbled together a solution to the problem using Pandas: "What's a more efficient way to merge rows from DataFrames row-by-row with conditions?"
A simple pandas.merge() is too greedy and produces many erroneous matches; a good match has to satisfy several criteria at once, not just a single shared key.
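Here's a minimal made-up example of what I mean by "greedy" (the `sys_a`/`sys_b` frames and column names are placeholders, not my real schema):

```python
import pandas as pd

# Toy data (made-up): note the duplicated total of 10.00 in each system.
sys_a = pd.DataFrame({
    "a_id": ["A1", "A2", "A3"],
    "timestamp": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:05",
                                 "2024-01-01 17:30"]),
    "total": [10.00, 10.00, 42.50],
})
sys_b = pd.DataFrame({
    "b_id": ["B7", "B8", "B9"],
    "timestamp": pd.to_datetime(["2024-01-01 09:01", "2024-01-01 09:06",
                                 "2024-01-01 17:31"]),
    "total": [10.00, 10.00, 42.50],
})

# Merging on total alone is many-to-many: both 10.00 rows in sys_a match
# both 10.00 rows in sys_b, producing 4 candidate pairs where only 2 are real.
matched = sys_a.merge(sys_b, on="total", suffixes=("_a", "_b"))
print(matched[["a_id", "b_id", "total"]])
```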
What are some alternatives to using Pandas? While researching the problem, I found Pandas-based answers suggesting operations that make me worry about silently dropping or duplicating rows.
To put it in the form of a better Stack Overflow question: what are some popular algorithms or data structures useful for matching activity between two separate systems? Ideally something that (a stripped-down sketch of what I'm after follows this list):
- doesn't mutate the original datasets,
- won't drop items from the original datasets,
- works under the assumption that each entry has a match which is either present in, or missing from, the other dataset.
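For concreteness, here's a stripped-down sketch of the kind of matching I'm after, reusing the toy columns from the example above (`a_id`, `b_id`, `timestamp`, `total` are placeholders, and the 5-minute tolerance is arbitrary). It blocks on the exact total, breaks ties by nearest timestamp, consumes each candidate at most once, and reports unmatched rows instead of dropping them:

```python
import pandas as pd

def match_transactions(sys_a: pd.DataFrame, sys_b: pd.DataFrame,
                       tolerance: pd.Timedelta = pd.Timedelta("5min")) -> pd.DataFrame:
    """Pair rows of sys_a with rows of sys_b without mutating either input.

    Returns one row per sys_a entry: (a_id, b_id), with b_id = None when
    no acceptable match exists, so nothing is silently dropped.
    """
    used = set()   # b_ids already claimed, so each candidate matches at most once
    pairs = []
    for row in sys_a.sort_values("timestamp").itertuples():
        # Block on the exact total, excluding already-claimed candidates.
        block = sys_b[(sys_b["total"] == row.total) & ~sys_b["b_id"].isin(used)]
        if block.empty:
            pairs.append((row.a_id, None))
            continue
        # Within the block, prefer the nearest timestamp...
        deltas = (block["timestamp"] - row.timestamp).abs()
        best = deltas.idxmin()
        # ...but only accept it if it falls inside the tolerance window.
        if deltas.loc[best] <= tolerance:
            b_id = block.loc[best, "b_id"]
            used.add(b_id)
            pairs.append((row.a_id, b_id))
        else:
            pairs.append((row.a_id, None))
    return pd.DataFrame(pairs, columns=["a_id", "b_id"])
```

With the toy frames above, `match_transactions(sys_a, sys_b)` pairs A1↔B7 and A2↔B8 instead of the four-way fan-out the plain merge produced. But it's O(n·m) row-by-row Python, which is what I'm hoping a better algorithm or data structure can replace.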