I have a CSV (comma-separated values) file that contains student information. The column headers look like StudentId, StudentFirstName, StudentLastName, StudentZipCode, StudentHeight, StudentCommuteMethod, etc., and the subsequent rows contain the information for individual students. I would like to write a Python 2.5 script that takes a filter condition as a command line parameter and returns the set of students (rows) that match it. For example, the filter condition could look like this (in pseudocode):
"StudentCommuteMethod = Bus AND StudentZipCode = 12345"
and the Python script could be invoked as:
MyPythonScript.py -filter "<above string>" -i input.csv
This should return the list of all students (rows) who live in an area with zip code 12345 and who commute by bus. The filter could also be arbitrarily complex and may include any number of AND, OR operators.
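To make the goal concrete, here is a rough sketch of the hard-coded equivalent of the example filter. The function name filter_students and the sample rows are made up for illustration. The real script would have to build the condition from the command line string instead of hard-coding it.

```python
import csv

def filter_students(lines):
    """Return the rows (as dicts) for students who commute by bus and live in zip code 12345."""
    # DictReader takes the first line as the column headers.
    reader = csv.DictReader(lines)
    matches = []
    for row in reader:
        # Hard-coded version of: StudentCommuteMethod = Bus AND StudentZipCode = 12345
        # All CSV values are read as strings, so the zip code is compared as a string.
        if row["StudentCommuteMethod"] == "Bus" and row["StudentZipCode"] == "12345":
            matches.append(row)
    return matches

# Made-up sample data using the column names from the question.
sample = [
    "StudentId,StudentFirstName,StudentLastName,StudentZipCode,StudentHeight,StudentCommuteMethod",
    "1,Ann,Lee,12345,160,Bus",
    "2,Bob,Ray,12345,175,Car",
    "3,Cy,Doe,54321,170,Bus",
]
matches = filter_students(sample)
```

With the sample rows above, only the first student satisfies both conditions.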
QUESTIONS:
What is the best format in which the user could specify the filter condition (as a command line parameter)? The format should be simple for simple expressions and must be powerful enough to express all types of conditions.
- The formats I thought of were (1) SQL, and (2) the Python language itself. In either case, I don't know how to have Python apply these filters at runtime. That is, how do I take the expression entered on the command line and apply it to a row to get true or false?
I would like to have a UI for expressing the filter condition visually. Perhaps something that allows entering a simple two-operand condition per row, with some intuitive way to combine the rows using ANDs and ORs. It should be able to emit a filter expression in the format chosen in the first question above. Is there an open source project I could reuse for this?
If you think that there is a better way to solve this problem than passing a command line expression + UI, feel free to mention it. In the end, the user (an electrical engineer who doesn't know a lot about programming) should be able to enter the filter expression easily.
Thanks!
NOTE: I don't have control over the input or output format (both are CSV files).