I have a data in which each sample has feature vector consisting of x and about 9000 other features and also corresponding y(target value). in which x and y are both continuous values(between 0 to 20). x a noisy data but we can not recognize the source of the noise. The goal is to predict y from x and other features(features are not noisy). number of samples are about 900,000. what are the machine learning approaches I can use in this problem. also famous networks in neural network or deep learning.
Asked
Active
Viewed 567 times
0
-
I'm a little confused what you are looking for? Would you like suggestions on algorithms, configurations of algorithms, both? – Andnp Jun 17 '16 at 18:12
-
I'm looking for algorithms or common machine learning approaches can be use to solve this problem. For example this problem can be seen as a regression problem but input data which is x is noisy that we don't have this property in regression. I want to know if there is a common method or algorithm in machine learning is like to solve this kind of problem or problems near this problem. if there is any I would appreciate to just name it. – sandra Jun 17 '16 at 18:23
-
1In this case, I think you answered the question yourself. The first approach that comes to mind for me would be a neural network. The interactions between features of your dataset would determine the architecture of that network (so we can't comment there), and you would have a single output node that would handle the regression. – Andnp Jun 17 '16 at 18:32
-
Do I read this correctly? You're trying to predict Y from X alone, without using the other 9000 features? – Prune Jun 17 '16 at 18:57
-
1Take a look at "How to approach machine learning problems with high dimensional input space?" http://stackoverflow.com/questions/2255833/how-to-approach-machine-learning-problems-with-high-dimensional-input-space – WestCoastProjects Jun 17 '16 at 19:11
-
@Prune No I will consider all features in this problem, but I separate x from the other in my explanation because just this feature is noisy – sandra Jun 17 '16 at 19:54
-
Excellent! In that case, my answer below should be of some use to you. – Prune Jun 17 '16 at 20:56
1 Answers
1
This sounds to me like a standard regression problem, although your prediction correlation is going to suck (technical term :-) ) in direct proportion to the noisiness of x. Look up all the educational examples for predicting housing prices (often used to illustrate gradient descent). You have 9000 features instead of 3 or 4, but that's just a matter of training time.
You might also consider some "factor analysis", so that you can eliminate the features that don't contribute enough to y (correlation coefficient near 0.0). This is called "dimensionality reduction"; look for PCA (Principal Component Analysis).

Prune
- 76,765
- 14
- 60
- 81
-
1Please provide wikipedia algorithms reference to the technical term `suck` . It would help my progress in ML. – WestCoastProjects Jun 17 '16 at 19:06
-
Just a joke; I should have marked it as such. :-) It translates roughly as "exhibit unacceptable results and response". – Prune Jun 17 '16 at 20:53
-