
Many machine learning competitions are held on Kaggle, where you are given a training set with features and labels, and a test set whose labels you must predict using a model learned from the training data.

It is pretty clear that supervised learning algorithms such as decision trees, SVMs, etc. are applicable here. My question is: how should I approach such problems? Should I start with a decision tree, an SVM, or some other algorithm? In other words, how do I decide?

Joy
    While an interesting question, this is both subjective and likely off-topic :-( – NPE Jul 18 '13 at 05:39
  • 1
    Is there any forum where I can post this question because I am very interested in such competitions. – Joy Jul 18 '13 at 05:44
  • 2
    Well, you could pose the question on the kaggle forums. No doubt many people will be more than willing to give you some advice and it is the more appropriate venue. (it is an interesting question though) – mvherweg Jul 18 '13 at 06:08
  • Well, in order to compete in a machine learning competition, you should know some machine learning. One option is to go to university and study those topics; another is to use coursera.org courses to get a theoretical grounding. Check the site, there are a bunch of courses related to ML. Some are also on youtube.com. Look up "machine learning Stanford". Good luck – marbel Jul 18 '13 at 20:36

3 Answers


So, I had never heard of Kaggle until reading your post; thank you so much, it looks awesome. Upon exploring their site, I found a section that will guide you well. On the competitions page (click "all competitions"), you'll see Digit Recognizer and Facial Keypoints Detection, both of which are competitions held for educational purposes, with tutorials provided (the tutorial isn't available for Facial Keypoints Detection yet, as the competition is in its infancy). In addition to the general forums, each competition has its own forum, which I imagine is very helpful.

If you're interested in the mathematical foundations of machine learning, and are relatively new to it, may I suggest Bayesian Reasoning and Machine Learning. It's no cakewalk, but it's much friendlier than its counterparts, without a loss of rigor.

EDIT: I found the tutorials page on Kaggle, which seems to be a summary of all of their tutorials. Additionally, scikit-learn, a Python library, offers a ton of descriptions/explanations of machine learning algorithms.
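To give a feel for how little code is needed to get started, here is a minimal sketch (assuming scikit-learn is installed) that trains one of the algorithms mentioned in the question, a decision tree, on the bundled digits dataset, which is similar in spirit to Kaggle's Digit Recognizer competition:

```python
# Minimal sketch: train and evaluate a decision tree with scikit-learn.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# The digits dataset: 8x8 grayscale images flattened into 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Held-out accuracy is the simplest sanity check before trying fancier models.
accuracy = clf.score(X_test, y_test)
print(f"decision tree accuracy: {accuracy:.2f}")
```

Swapping `DecisionTreeClassifier` for `sklearn.svm.SVC` or another estimator is a one-line change, which makes it cheap to compare the algorithms the question asks about.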

Steve P.

This cheat sheet http://peekaboo-vision.blogspot.pt/2013/01/machine-learning-cheat-sheet-for-scikit.html is a good starting point. In my experience, using several algorithms at the same time can often give better results, e.g. logistic regression and SVM, where the result of each one is given a predefined weight. And test, test, test ;)
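The combination described above can be sketched with scikit-learn's soft-voting ensemble. The dataset and the `[2, 1]` weighting are illustrative assumptions, not values from the answer:

```python
# Sketch: combine logistic regression and an SVM with predefined weights
# via a soft-voting ensemble in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

log_reg = LogisticRegression(max_iter=5000)
svm = SVC(probability=True, random_state=0)  # probability=True enables soft voting

# weights=[2, 1]: an arbitrary example weighting that trusts
# logistic regression twice as much as the SVM.
ensemble = VotingClassifier(
    estimators=[("lr", log_reg), ("svm", svm)],
    voting="soft",
    weights=[2, 1],
)

# "Test, test, test": cross-validation gives a more reliable estimate
# than a single train/test split.
scores = cross_val_score(ensemble, X, y, cv=5)
print(f"ensemble CV accuracy: {scores.mean():.3f}")
```

In practice the weights would themselves be tuned on validation data rather than fixed in advance.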

scc
  • +1 Nice link! I'm a big fan of ensemble learning--Random forests are awesome--or are you referring to using different algorithms for different *parts* of the data, or something else entirely? – Steve P. Jul 19 '13 at 00:47
  1. There is No Free Lunch in data mining. You won't know which methods work best until you try lots of them.

  2. That being said, there is also a trade-off between understandability and accuracy in data mining. Decision Trees and KNN tend to be understandable, but less accurate than SVM or Random Forests. Kaggle looks for high accuracy over understandability.

  3. It also depends on the number of attributes. Some learners can handle many attributes, like SVM, whereas others are slow with many attributes, like neural nets.

  4. You can shrink the number of attributes by using PCA, which has helped in several Kaggle competitions.
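Point 4 can be sketched in a few lines with scikit-learn's PCA; the digits dataset here is an illustrative choice, not one from the answer:

```python
# Sketch: shrinking the number of attributes with PCA.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel attributes per digit

# A float n_components asks PCA to keep the smallest number of
# components that explain at least 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # far fewer columns, most variance kept
```

The reduced matrix can then be fed to a slow learner (e.g. a neural net) that would struggle with the full attribute set.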

Neil McGuigan