I hope this question is appropriate here. If not, let me know and I will remove it immediately.
Question:
How can I use Python to inspect (visually, if possible) a large dataset for errors introduced when several source datasets are combined?
Background:
I am working with several large (but not, you know, "Big") datasets that I combine into one larger dataset. The combined set is about 2.5 GB, so it does not fit in most spreadsheet programs, or at least not in the ones I've tried (MS Excel, OpenOffice).
The process that creates the final dataset uses fuzzy matching (via fuzzywuzzy), and I want to inspect the results of the matching to see whether it introduced any errors.
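One way to narrow down where to look is to keep the similarity score for every matched pair and flag the low-scoring ones. The sketch below uses the standard library's difflib.SequenceMatcher as a stand-in for fuzzywuzzy's scorer, so it runs without extra packages; the function names, the lowercasing step, and the threshold of 90 are illustrative assumptions, not part of my actual pipeline.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> int:
    """Similarity on a 0-100 scale (stand-in for a fuzzywuzzy score)."""
    # Lowercasing is an assumption here; adjust to match the real preprocessing.
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def flag_suspect_matches(pairs, threshold=90):
    """Return (left, right, score) for every pair scoring below threshold."""
    suspects = []
    for left, right in pairs:
        score = match_score(left, right)
        if score < threshold:
            suspects.append((left, right, score))
    return suspects


# Hypothetical matched pairs, for illustration only.
pairs = [
    ("Acme Corporation", "ACME Corporation"),
    ("Acme Corporation", "Apex Corporation"),
]
for left, right, score in flag_suspect_matches(pairs):
    print(f"{score:3d}  {left!r} -> {right!r}")
```

Storing this score as a column in the combined dataset would make the suspect rows easy to filter later.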
So far, I have tried importing the entire set into a pandas dataframe. This DF has 64 columns, so when I simply call something like df.head(), the output does not show all of the columns; I therefore ruled out just iterating through multiple .head() calls.
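The column truncation comes from pandas' display settings rather than from head() itself. A minimal sketch, assuming the data is already in a DataFrame (the 5 x 64 frame below is a hypothetical stand-in), that lifts the column limit temporarily and also shows a single row transposed so each column gets its own line:

```python
import pandas as pd

# Hypothetical stand-in for the combined dataset: 5 rows x 64 columns.
df = pd.DataFrame({f"col_{i}": list(range(i, i + 5)) for i in range(64)})


def show_all_columns(frame, n=5):
    """Render the first n rows without truncating any columns."""
    # option_context restores the previous setting when the block exits.
    with pd.option_context("display.max_columns", None):
        return repr(frame.head(n))


def show_row(frame, row):
    """Render one row as a column-per-line listing."""
    # Series.to_string prints every entry, so all 64 columns appear.
    return frame.iloc[row].to_string()


print(show_all_columns(df))
print(show_row(df, 0))
```

The transposed single-row view is often easier to read than a very wide table when checking individual matches by eye.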
There is a similar question about visualizing specific aspects of a dataframe here. My question is different, I think, because I don't need to visualize anything about the underlying structure or data types; I just want to visually inspect the areas where I suspect errors might have been introduced.
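Since only the suspect areas need inspecting, another option is to never load the full 2.5 GB at once: stream the file in chunks, keep only the rows worth checking, and open that much smaller subset in a spreadsheet. This sketch assumes the combined data is a CSV with a hypothetical match_score column; the column name, threshold, chunk size, and output path are all placeholders.

```python
import io

import pandas as pd


def extract_suspect_rows(source, score_col="match_score", threshold=90,
                         chunksize=100_000):
    """Stream a CSV and keep only rows whose score is below threshold."""
    pieces = []
    # read_csv with chunksize yields DataFrames of at most chunksize rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        pieces.append(chunk[chunk[score_col] < threshold])
    return pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame()


# Tiny in-memory stand-in for the real file (illustrative data only).
csv_text = """id,left_name,right_name,match_score
1,Acme Corporation,ACME Corporation,100
2,Acme Corporation,Apex Corporation,88
3,Globex,Globex Inc,75
"""
suspects = extract_suspect_rows(io.StringIO(csv_text), chunksize=2)
print(suspects)
# suspects.to_csv("suspect_matches.csv", index=False)  # hypothetical output path
```

Peak memory stays around one chunk plus the kept rows, so the subset can be much smaller than the full file.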