I have a relatively large csv file containing a list of companies, products, and prices. The ordering of the data is not guaranteed (i.e. not sorted):
csv#1 (big file)
...
CompanyA productB 0
CompanyA productA 0
CompanyA productC 0
CompanyB productA 0
CompanyB productB 0
CompanyB productC 0
...
Some of the entries in "csv#1" have bad data (zeroes). I have a second csv containing only the rows from csv#1 that had bad data, along with their corrected prices. This second csv is ordered by descending price:
csv#2 (small file - subset of csv#1)
CompanyA productC 15
CompanyA productB 10
CompanyA productA 5
CompanyB productA 3
CompanyB productB 2
CompanyB productC 1
I want to iterate through csv#1 and, if the combination of Company + product is in csv#2, overwrite that row's price with the one from csv#2.
I know I can do this by brute force, iterating over csv#2 for every row in csv#1. I could even optimize by loading csv#2 into an array and removing entries once they are found (each combination will show up exactly once in csv#1). But I am certain there must be a better way.
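To make that concrete, here is roughly what my brute-force version looks like (sketched over in-memory rows rather than the real files, with made-up sample data):

```python
# Brute force: for every row in csv#1, scan csv#2 for a matching
# Company + product combination and overwrite the bad price.
csv1 = [
    ["CompanyA", "productB", "0"],
    ["CompanyA", "productA", "0"],
    ["CompanyB", "productB", "7"],  # good row, not present in csv#2
]
csv2 = [
    ["CompanyA", "productB", "10"],
    ["CompanyA", "productA", "5"],
]

for row in csv1:
    for company, product, price in csv2:
        if row[0] == company and row[1] == product:
            row[2] = price  # overwrite the bad price
            break  # each combination appears exactly once
```

This works, but it is O(n*m) in the sizes of the two files, which is what I'm trying to avoid.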
I found some references indicating that sets are a more efficient way to do these kinds of lookup searches:
Most efficient way for a lookup/search in a huge list (python)
Fastest way to search a list in python
But I am not sure how to apply sets to my example. How do I structure a set here, given the multiple search columns and the need to return a value if there is a match? Or is there a better approach than sets?
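My best guess so far (and I'm not sure this is the right approach) is that I actually want a dict rather than a set, keyed on the (Company, product) tuple, since a set can only tell me whether a match exists, while a dict can also return the corrected price. A sketch with made-up in-memory rows:

```python
# Build a dict keyed on the (company, product) tuple -> corrected price.
# Dict lookups are O(1) on average, so csv#1 is scanned only once.
corrections = {
    ("CompanyA", "productB"): "10",
    ("CompanyA", "productA"): "5",
}

csv1 = [
    ["CompanyA", "productB", "0"],
    ["CompanyA", "productA", "0"],
    ["CompanyB", "productB", "7"],  # good row, no correction available
]

for row in csv1:
    key = (row[0], row[1])
    if key in corrections:
        row[2] = corrections[key]  # overwrite with the corrected price
```

Is this the idiomatic way to do it, or is there something better?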