I would suggest the following general steps:
- Get the raw data
You can read the Excel file into a pandas dataframe in Python. Ideally you will end up with a raw dataframe that looks something like this:
Filename Keep
0 X:\4. Economics ...\filexyz.pdf 0
1 X:\4. Economics ...\fileabc.pdf 1
2 X:\3. Finance ...\filetef.pdf 1
3 X:\3. Finance ...\file123.pdf 0
4 G:\2. Philosophy ..\file285.pdf 0
....
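A minimal sketch of this step, assuming the sheet has the two columns shown above. The file name `files.xlsx` and the helper names `check_raw` and `load_raw` are placeholders I made up, and reading `.xlsx` files with pandas needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def check_raw(df):
    """Keep only the two expected columns and make the label an integer."""
    missing = {"Filename", "Keep"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Drop rows where either the path or the label is empty.
    out = df[["Filename", "Keep"]].dropna().copy()
    out["Filename"] = out["Filename"].astype(str)
    out["Keep"] = out["Keep"].astype(int)
    return out.reset_index(drop=True)


def load_raw(path="files.xlsx"):
    # "files.xlsx" is a placeholder; point this at your own spreadsheet.
    return check_raw(pd.read_excel(path))
```

Calling `load_raw()` on your own file should then give you a dataframe in the shape shown above, or a clear error if a column is missing.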
- Preprocess/clean
This part is more up to you. For example, you could remove all special characters and numbers, which would leave only letters, as follows:
Filename Keep
0 "X Economics filexyz pdf" 0
1 "X Economics fileabc pdf" 1
2 "X Finance filetef pdf" 1
3 "X Finance file pdf" 0
4 "G Philosophy file pdf" 0
....
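One possible way to do this kind of cleaning is a single regular expression. This is only a sketch: the function name `clean_path` is my own, and the pattern keeps ASCII letters only, so accented characters would be dropped as well.

```python
import re


def clean_path(path):
    """Replace every run of non-letter characters with a single space."""
    return re.sub(r"[^A-Za-z]+", " ", path).strip()


# Hypothetical path, only for illustration.
print(clean_path(r"X:\3. Finance\file123.pdf"))  # X Finance file pdf
```

With a pandas dataframe you could apply it to the whole column with `df["Filename"].map(clean_path)`.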
- Vectorize your strings
For an algorithm to work with your text data, you typically vectorize it, meaning you turn each string into a vector of numbers that the algorithm can process. An easy way to do this is tf-idf with scikit-learn. After this, your dataframe might look something like this:
Filename Keep
0 [0.6461, 0.3816 ... 0.01, 0.38] 0
1 [0., 0.4816 ... 0.25, 0.31] 1
2 [0.61, 0.1663 ... 0.11, 0.35] 1
....
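As a rough sketch, tf-idf vectorization with scikit-learn's `TfidfVectorizer` could look like this. The list `cleaned` is a stand-in for your cleaned filename column, and the default settings are used, which may not be the best choice for your data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for the cleaned "Filename" column.
cleaned = [
    "X Economics filexyz pdf",
    "X Economics fileabc pdf",
    "X Finance filetef pdf",
    "G Philosophy file pdf",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)  # sparse matrix, one row per filename

print(X.shape)                        # (number of filenames, vocabulary size)
print(sorted(vectorizer.vocabulary_)) # the words that became columns
```

Note that with the default settings the text is lowercased and single-character tokens are ignored, so the drive letters `X` and `G` do not become features. If the drive letter matters for your decision, you would have to change the `token_pattern` argument.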
- Train a classifier
Now that you have nice numbers for the algorithms to work with, you can train a classifier with scikit-learn. Simply search for "scikit learn classification example" and you will find plenty.
Once you have a trained classifier, you can compare its predictions against the true labels on test data that it has not seen during training. That way you get an estimate of how accurate it will be on new files.
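To make the last two steps concrete, here is a small sketch that chains tf-idf and a classifier and scores it on held-out data. The texts and labels are made up purely so the example runs, and logistic regression is just one reasonable choice among many scikit-learn classifiers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up cleaned paths and labels, only to make the sketch runnable.
texts = [f"X Finance report{c} pdf" for c in "abcdefgh"] + [
    f"G Philosophy essay{c} pdf" for c in "abcdefgh"
]
labels = [1] * 8 + [0] * 8

# Hold back a quarter of the rows that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# The pipeline fits tf-idf on the training rows only, then the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"held-out accuracy: {accuracy:.2f}")
```

Putting the vectorizer inside the pipeline means it is fitted on the training rows only, so nothing from the test rows leaks into training. On real data you would pass your cleaned `Filename` column and the `Keep` column in place of `texts` and `labels`.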
Hopefully that is enough to get you started!