Basically, I'm looking for an efficient way (in terms of coding effort) to present a list of pairs of dicts in a human-readable form, in Python 2.7.
I have two lists of OrderedDicts. Each dict is a record of book data (title, author, etc.). One list has messy data (typos, etc.), the other has tidy data. I'm using difflib.SequenceMatcher to find the closest tidy match for each untidy title. That works nicely.
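A minimal sketch of that kind of matching, assuming only the title field is compared and each untidy record is paired with the single tidy record that has the highest SequenceMatcher.ratio(). The helper names (title_ratio, pair_records) and the sample records are hypothetical, made up for illustration:

```python
from collections import OrderedDict
from difflib import SequenceMatcher


def title_ratio(a, b):
    """Case-insensitive similarity between two titles, in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_records(untidy, tidy, key="title"):
    """Pair each untidy record with the tidy record whose `key` is most similar."""
    pairs = []
    for messy in untidy:
        # max() is evaluated immediately, so the lambda sees the current `messy`
        best = max(tidy, key=lambda clean: title_ratio(messy[key], clean[key]))
        pairs.append((messy, best))
    return pairs


# Hypothetical sample records
untidy = [
    OrderedDict([("title", "The Hobit"), ("author", "Tolkein")]),
    OrderedDict([("title", "Emma."), ("author", "J. Austen")]),
]
tidy = [
    OrderedDict([("title", "Emma"), ("author", "Jane Austen")]),
    OrderedDict([("title", "The Hobbit"), ("author", "J. R. R. Tolkien"),
                 ("publisher", "Allen & Unwin")]),
]

pairs = pair_records(untidy, tidy)
for messy, clean in pairs:
    print(messy["title"] + " -> " + clean["title"])
```

This is a brute-force comparison (every untidy title against every tidy one), which is fine for a one-off job on modest lists.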
This gives me a list of pairs of dicts, namely each untidy dict paired with its closest matching tidy one. Those pairs need to be reviewed by humans, pair by pair. So I want to output each pair to the screen, showing the untidy and the tidy dict side by side, each in its own panel. Each dict may have a varying number of additional fields, e.g. co-author, publisher, date, etc.
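A minimal plain-text sketch of the two-panel layout described above, assuming a fixed-width terminal and using only string formatting. The function name side_by_side and the sample records are hypothetical. Fields are listed in the untidy dict's order, followed by any fields that only the tidy dict has, so records with different numbers of fields still line up row by row:

```python
from collections import OrderedDict


def side_by_side(left, right, width=30):
    """Render two dicts as UNTIDY | TIDY columns, one row per field.

    A field missing from one dict gets a blank cell on that side.
    Values in the left column are truncated to `width` characters.
    """
    # Union of keys: left's order first, then keys only the right dict has
    keys = list(left) + [k for k in right if k not in left]
    label_w = max([len("%s" % (k,)) for k in keys] + [0])
    row = "%s | %s | %s"
    out = [row % ("".ljust(label_w), "UNTIDY".ljust(width), "TIDY")]
    out.append("-" * len(out[0]))
    for k in keys:
        l_val = ("%s" % (left.get(k, ""),))[:width]
        r_val = "%s" % (right.get(k, ""),)
        out.append(row % (("%s" % (k,)).ljust(label_w), l_val.ljust(width), r_val))
    return "\n".join(out)


# Hypothetical sample pair
messy = OrderedDict([("title", "The Hobit"), ("author", "Tolkein")])
clean = OrderedDict([("title", "The Hobbit"), ("author", "J. R. R. Tolkien"),
                     ("publisher", "Allen & Unwin")])
print(side_by_side(messy, clean))
```

Looping over the list of pairs and printing each rendering with a blank line in between would give a pair-by-pair review on screen.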
difflib.HtmlDiff doesn't really do what I want.
Exporting to Excel (via CSV) is not ideal, because the data isn't flat (one row in Excel would have a different number of fields than another). The same goes for Google Refine; I think that's more oriented towards tabular data.
Call me lazy, but Tkinter or XML/HTML seem to be overkill. It's just a once-off exercise.
I'm not at all familiar with JSON or YAML; maybe I should look there?
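For JSON at least, the standard library's json module can pretty-print an OrderedDict while keeping its field order. A minimal sketch, assuming it is acceptable to show the two records stacked vertically under "untidy" and "tidy" labels rather than in true side-by-side panels; pair_as_json and the sample records are hypothetical. YAML would need a third-party library (e.g. PyYAML), which is not assumed here:

```python
import json
from collections import OrderedDict


def pair_as_json(messy, clean):
    """Pretty-print one untidy/tidy pair as a single indented JSON object."""
    wrapper = OrderedDict([("untidy", messy), ("tidy", clean)])
    # Explicit separators avoid trailing spaces after commas on Python 2.7
    return json.dumps(wrapper, indent=2, separators=(",", ": "))


# Hypothetical sample pair
messy = OrderedDict([("title", "The Hobit"), ("author", "Tolkein")])
clean = OrderedDict([("title", "The Hobbit"), ("author", "J. R. R. Tolkien"),
                     ("publisher", "Allen & Unwin")])
print(pair_as_json(messy, clean))
```

Because json.dumps walks an OrderedDict in insertion order, each record's fields appear in the same order as in the source data, and a field present in only one record simply shows up under that record alone.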
Any better suggestions?
I have this hunch that I haven't found the right search terms yet.