2

My question basically is : in a learning problem, are there data sets that neural networks are not advised to be used ? What are some popular characteristics of such data sets?

The reason why I am asking is : In some articles it is proven that neural networks can learn any function. But are all the data sets represent a function? If they are not qualified to do so; what are the properties of unqualified data sets?

In my research I have difficulty in finding a good architecture and parameter combination. I am suspicious about the data set itself. Because I see the following pattern

    
Input1  Input2 Target
0.8     0.6    0.3
0.8     0.6    0.3
0.8     0.6    0.0
0.8     0.6    0.1

As a human I can not make a prediction about the target by looking at the inputs, and I expect that a neural network will not predict accurately as well. So may be some other approach is advised for such a case.

mcvkr
  • 3,209
  • 6
  • 38
  • 63
Already
  • 21
  • 2
  • What do you mean by learning a data set? A neural network learns the relationship between inputs and outputs. – Vaughn Cato Jan 09 '16 at 21:32
  • @VaughnCato I mean learning the relationship (function) between the inputs and outputs of a data set? – Already Jan 09 '16 at 21:42
  • 1
    You're right. A neural network cannot (correctly) learn the output to give for an input of (0.8,0.6) because there is no single answer. It may end up learning to give the most common output or some type of weighted average though. – Vaughn Cato Jan 09 '16 at 21:46
  • @VaughnCato I changed the way I asked. Thanks for guiding me. – Already Jan 09 '16 at 21:58

2 Answers2

1

There is no definite answer as long as you can not say what is the true value. Or more specifically that there is a true value.

However, there are two situations which are quite common, which can result in such data.

1.) Noisy output let say the data you observe comes from a function

 f(x,y) = g(x,y) + N(0,0.1)

Where g(x,y) gives a unique value however there is normal distributed noise added to your function. If you have enough training date your NN will slowly converge to the right value. Even if the noise is not normal distributed training can be adapted

2.) No unique true value There is a another situation which might be imaginable. There is not a unique true value. Given the training data above me as a human would learn. in 0.5 cases the result of f(0.8,0.6)=0.3 and so on. Neural Networks are also capable to learn these functions.

What are neural networks not able to learn. There are some assumptions in machine learning which might not be able to learn. For example if your data is not independent it would be a big problem. So if you have as target in your training data independent of the inputs the pattern 0.3,0.3,0.0,0.1,0.3,0.3,0.0,0.1,.... learning would be tough.

In general you need to be able to formulate what you want to learn. This is typically done in terms of an objective functions otherwise, you can never be sure what the networks learns (c.f., no free lunch theorem)

CAFEBABE
  • 3,983
  • 1
  • 19
  • 38
1

Before the algorithm implementation and tuning, maybe the first thing should be to look at data quality. There is a very good reference paper(one of many) for this , I wish it helps

Goodchild, Michael F., and Keith C. Clarke. "Data quality in massive data sets." Handbook of massive data sets. Springer US, 2002. 643-659.

mcvkr
  • 3,209
  • 6
  • 38
  • 63