Yes, you are right in both your idea and your concerns regarding the first issue.
What you are trying to do is Parameter Optimization. The term is usually used when you optimize the parameters of your classifier, e.g. the number of trees for a Random Forest or the C parameter for an SVM, but you can apply it to pre-processing steps and filters as well.
What you have to do in this case is nested cross-validation. (You should check https://stats.stackexchange.com/ for more information, for example here or here.) It is important that the final classifier, including all pre-processing steps like binning, never sees the test set, only the training set. This is the outer cross-validation.
For each fold of the outer cross-validation, you need to do an inner cross-validation on the training set to determine the optimal parameters for your model.
I'll try to "visualize" it with a simple 2-fold cross-validation:
```
Data set
########################################

Split for outer cross-validation (2-fold)
####################  ####################
    training set           test set

Split for inner cross-validation
    ##########  ##########
     training      test

Evaluate parameters
    ##########  ##########
      build        with
    bin size 5    acc 70%
    bin size 10   acc 80%
    bin size 20   acc 75%
        ...

    => optimal bin size: 10

Outer cross-validation (2-fold)
####################  ####################
    training set           test set
  apply bin size 10
    train model           evaluate model
```
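For completeness, here is a minimal sketch of how such a nested setup can look. It assumes you work in Python with scikit-learn; the binning step (`KBinsDiscretizer`), the Random Forest, and all parameter values are placeholders for whatever your actual pipeline uses.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Placeholder data; replace with your own X, y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The pipeline bundles the pre-processing (binning) with the classifier,
# so the binning is re-fitted on the training data of every fold and
# never sees the corresponding test set.
pipe = Pipeline([
    ("bin", KBinsDiscretizer(encode="ordinal", strategy="uniform")),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Inner cross-validation: search over the bin size on the training set
# of each outer fold.
param_grid = {"bin__n_bins": [5, 10, 20]}
inner_cv = GridSearchCV(pipe, param_grid, cv=5)

# Outer cross-validation: 2-fold, as in the diagram above. Each outer
# training set is used for the inner search, each outer test set only
# for the final evaluation.
outer_scores = cross_val_score(inner_cv, X, y, cv=2)
print(outer_scores.mean())
```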
Parameter optimization can be computationally very expensive. If you have 3 parameters with 10 possible values each, that makes 10 x 10 x 10 = 1000 parameter combinations you need to evaluate for each outer fold.
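You can count these combinations explicitly; a quick illustration using scikit-learn's `ParameterGrid` (the three parameter names and their value lists are made up):

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 3 parameters with 10 candidate values each.
grid = {
    "bin_size": [2, 4, 6, 8, 10, 12, 14, 16, 20, 25],
    "n_estimators": [10, 20, 50, 100, 150, 200, 300, 400, 500, 1000],
    "max_depth": list(range(1, 11)),
}
print(len(ParameterGrid(grid)))  # 10 * 10 * 10 = 1000 combinations per outer fold
```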
This is a topic of machine learning in itself, because you can use everything from a naive grid search to evolutionary search here. Sometimes heuristics can help. But you need to do some kind of parameter optimization every time.
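One common middle ground between an exhaustive grid search and more elaborate search strategies is random search, which evaluates only a fixed number of sampled combinations. A sketch, again under the scikit-learn assumption and with the same illustrative pipeline as above:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Same placeholder pipeline as in the earlier sketch.
pipe = Pipeline([
    ("bin", KBinsDiscretizer(encode="ordinal", strategy="uniform")),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Instead of trying all 4 x 4 x 4 = 64 combinations, sample only 20 of them
# in the inner search of each outer fold.
param_distributions = {
    "bin__n_bins": [5, 10, 20, 30],
    "rf__n_estimators": [50, 100, 200, 500],
    "rf__max_depth": [None, 3, 5, 10],
}
inner_cv = RandomizedSearchCV(pipe, param_distributions, n_iter=20, cv=5,
                              random_state=0)
outer_scores = cross_val_score(inner_cv, X, y, cv=2)
print(outer_scores.mean())
```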
As for your second question: This is really hard to tell without seeing your data. But you should post that as a separate question anyway.