
Let's say my inputs are A and B. The inputs look roughly like this: A = [10, 5, 30, 2], which can have arbitrary values in the range [1, 100], and B = [0, 1, 0, 0], which is a one-hot vector. The expected output, C = [5], is the dot product of the two input vectors, C = A.B.

Similarly, for A = [10, 5, 30, 2] and B = [0, 0, 1, 0], the output will be C = [30].

Essentially, I want the neural network to act as a 4-way multiplexer (https://en.wikipedia.org/wiki/Multiplexer).
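
For reference, here is the target mapping written out in plain NumPy (a minimal sketch; the function name is just illustrative):

import numpy as np

def multiplexer(A, B):
    # B is one-hot, so the dot product simply selects one element of A.
    return np.dot(A, B)

print(multiplexer(np.array([10, 5, 30, 2]), np.array([0, 1, 0, 0])))  # 5
print(multiplexer(np.array([10, 5, 30, 2]), np.array([0, 0, 1, 0])))  # 30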

I have implemented a neural network with two hidden layers. While it works on training data, it fails to generalize beyond that.

Is there an underlying reason why this problem would be difficult for a neural network?

2 Answers


According to the universal approximation theorem, a fully connected neural network with a single hidden layer can be a "practical" universal approximator (given a number of conditions and considerations).

More of that here: https://en.wikipedia.org/wiki/Universal_approximation_theorem

So, yes, the network can indeed approximate a multiplexer. You have to take a few factors into account, though. You can try standardizing or normalizing your input data (input features on different scales can disrupt the network's learning process); you can find some information here:

https://stats.stackexchange.com/questions/10289/whats-the-difference-between-normalization-and-standardization

Also, take a look at your input space: you have 100^4 times 4 possible inputs (around 4 x 10^8). Based on that, you have to consider the size of your training data, because a few thousand examples won't make the cut; the data is very sparse, so the examples in the training sample can be very different from the ones in the validation data. A sketch of the input scaling idea follows below.
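
As a concrete illustration of the scaling suggestion, a minimal sketch of the data preparation (this assumes you simply rescale A from [1, 100] down to (0, 1]; the variable names and sample size are illustrative, not the original poster's setup):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic training pairs: A has values in [1, 100], B is a one-hot selector.
n = 10000
A = rng.integers(1, 101, size=(n, 4)).astype(np.float32)
B = np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=n)]

A_scaled = A / 100.0                             # rescale A to (0, 1]
X = np.concatenate([A_scaled, B], axis=1)        # 8-dimensional network input
y = np.sum(A_scaled * B, axis=1, keepdims=True)  # target, on the same scale as the input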

Juan David

Basically, you're looking for the dot product of A and B (np.dot(A, B) in NumPy terms). Juan's answer is right and wrong at the same time. Standard fully connected layers don't compute multiplicative interactions between inputs (just check the math: each layer is a weighted sum followed by a nonlinearity), so there is no "natural" representation of this formula. It can be approximated, though, if you have a sufficiently complex architecture.
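
One way to see this concretely: if you give the architecture an explicit multiplicative interaction, the formula becomes representable exactly. This is a sketch using Keras' Multiply layer (an assumption about how one might wire it, not the original poster's model):

import numpy as np
import tensorflow as tf

# Two 4-dimensional inputs: the values A and the one-hot selector B.
a_in = tf.keras.Input(shape=(4,))
b_in = tf.keras.Input(shape=(4,))

# Elementwise product followed by a fixed sum reproduces the dot product A.B exactly;
# a stack of plain Dense layers can only approximate this interaction.
prod = tf.keras.layers.Multiply()([a_in, b_in])
out = tf.keras.layers.Dense(1, use_bias=False,
                            kernel_initializer="ones", trainable=False)(prod)

model = tf.keras.Model([a_in, b_in], out)

print(model.predict([np.array([[10., 5., 30., 2.]]),
                     np.array([[0., 1., 0., 0.]])]))  # [[5.]]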

Without a neural network, in bare TensorFlow it is just tf.math.reduce_sum(A * B).

Example:

>>> import tensorflow as tf  # TensorFlow 1.x (graph mode)
>>> A = tf.constant([10, 5, 30, 2])
>>> B = tf.constant([0, 1, 0, 0])
>>> with tf.Session() as sess: print(tf.math.reduce_sum(A * B).eval())
5
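
In TensorFlow 2.x (eager execution) the same computation works without a session; a quick check:

>>> import tensorflow as tf
>>> A = tf.constant([10, 5, 30, 2])
>>> B = tf.constant([0, 1, 0, 0])
>>> print(tf.math.reduce_sum(A * B).numpy())
5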
Marat