I am trying to classify documents based on their bag-of-words representation (1000 features). For the classification I am using an SVM, but sometimes the training does not seem to terminate and runs endlessly. (I am running scikit-learn: SVC(C=1.0, kernel='linear', cache_size=5000, verbose=True).) While searching for a solution, I was thinking about applying a MinMaxScaler to get a more computationally efficient document representation. But would feature normalization screw up my bag-of-words representation?
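To make the question concrete, here is a plain-Python sketch of what I understand min-max scaling would do to the counts: each feature column is rescaled to [0, 1] independently. The toy matrix and the helper `min_max_scale` are made up for illustration; this is neither my real data nor scikit-learn's actual implementation.

```python
# Toy bag-of-words counts: rows are documents, columns are vocabulary terms.
# The numbers are invented for illustration only.
counts = [
    [0, 3, 1],
    [2, 0, 1],
    [4, 6, 1],
]


def min_max_scale(rows):
    """Rescale every column (feature) to [0, 1] independently."""
    n_cols = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no range; this sketch maps it to 0.0.
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled


scaled = min_max_scale(counts)
for row in scaled:
    print(row)
```

In this toy case the ordering of documents within each term is kept, but a term that occurs up to 6 times and a term that occurs up to 4 times both end up with a maximum of 1.0, so the raw count magnitudes are no longer comparable across terms. That is the part I am unsure about.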
Thanks in advance!