TL;DR
A1: There is no real benefit in just pouring ill-prepared data into an ill-prepared machine ( ref. below for why >>> )
A2: Yes & No: you have built [a Model], but not [The Model] adequate for the above-stated task.
A3: dataUnlabeled
will only start to make sense after you have built an adequately constructed ANN-Model ( ref. (1) below ) and have succeeded in (2) in getting it rigorously trained for the intended task. Only then does it become reasonable to step into (3).
Let's demystify the Neural Network first, ok?
If the ANN engine is expected to work for your trading, get acquainted with its capabilities before just pouring in heaps of data and praying to be able to decode the results.
Never overestimate a Model. Understanding a Model's capabilities ( and, better still, the limits of those capabilities ) is the first best thing one may do before implementing or just re-using a Model ( instead of relying on a missing 5th Element ). Strange? Not so. Many times I have seen "practitioners" using a linear Model to "learn" a quadratic problem: no one will ever succeed in drawing a smooth parabolic curve with a straight-as-a-beam-of-light linear ruler, sorry, never ... Using a wrong Model yields but wrong results ... independently of any amount of data.
Neural Networks are functionally nothing more complex than a passive infrastructure consisting of a few serial layers that are fully inter-connected with wires, bearing a variable resistor on each wire in between the layers - soldered 1:1 onto the pins in the middle window -
( yes, as dirty as radio-hacking with a soldering iron and many, indeed many, wires may get ).
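The wires-and-resistors picture maps directly onto code: each layer is a weight matrix, each "resistor" a single weight. A minimal sketch in plain numpy ( all sizes and random values here are illustrative placeholders, not taken from any real trading setup ):

```python
import numpy as np

def forward(x, weights):
    """One pass through a fully-connected net: each layer is just a
    weight matrix ( the 'resistors' ), followed by a tanh transfer unit."""
    a = x
    for W in weights:
        a = np.tanh(W @ a)          # wire-sums into the next layer, then the non-linear unit
    return a

# toy shape: 9 INPUTs -> 5 hidden -> 3 outputs ( the [ A, B, C ] variant )
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 9)) * 0.1,
           rng.standard_normal((3, 5)) * 0.1]

x = rng.standard_normal(9)           # one "state" example
y = forward(x, weights)              # three OUTPUT-LAYER values
```

Nothing here is trained yet — this is only the passive infrastructure of concept (1).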

Sure, I have decided to simplify this a bit, skipping the non-linear transfer units ( which have their specific, but not cardinal here, operational characteristics { sigmoid | tanh | ... } ), as these would just distract one's attention from the most important goal - to understand the two key and distinct concepts:
-(1) the ANN-passive infrastructure ( wires + resistors )
and
-(2) the ANN-active tuning process - a work to be done before (1) becomes any useful for (3) - the manual tuning of the knobs, adjusting each of the variable resistors ... ( ouch! ) ... to the best achievable state ( the one having a minimum overall amount of penalised errors, as measured on the ANN output(s) ).
So, designing step (1) is called the ANN design, and the result is aka the NN-ARCHITECTURE.
The enormous, and indeed IMMENSE, efforts go into (2) - the tuning, aka NN-TRAINING - which finally produces one such quasi-static set of resistor settings that provides better results on the outputs ( checked against all the supervised training examples ) than any other mechanical setup experimented with so far. I intentionally avoid mentioning the strategy for how to touch the cohort of variable resistors, and the metrics of the "better-ness" ( the less so of the "best-ness" ) of the assessments on the outputs, as that would at this moment hurt you even more than the quite practical picture that you need to manually touch and adjust each and every variable-resistor knob many and many and many times in a loop, as the trial-and-error process continues - sometimes providing a bit better result ( one more compliant with your supervised data labels ), sometimes the very contrary. C'est La Vie.
Is it as brute as it sounds? Oh yes, Sir, indeed it is...
Yes, even a computerised version, using the powers of almost-5-GHz silicon, may spend - and does spend - tens of days processing a relatively trivially sized ANN to solve (2) to some acceptable state ( which, for the Algorithmic Trading domain, is MUCH HARDER than for any academia toy-problems ), so the manual approach is surely outside the range of practical use - but it is important, as it is mechanically exactly the same thing that the ( computerised ) ANN training does.
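The blind knob-twiddling described above can be sketched as a naive random-perturbation search - purely an illustration of the manual process on a deliberately trivial one-"neuron" model, not a practical training algorithm ( real frameworks use gradient-based methods ):

```python
import numpy as np

def loss(w, xs, ys):
    """The penalised error on the outputs: mean squared error of a
    single linear 'neuron' -- deliberately trivial, for illustration."""
    return float(np.mean((xs @ w - ys) ** 2))

def tune_by_hand(w, xs, ys, steps=2000, knob_turn=0.05, seed=0):
    """Twiddle one randomly chosen 'resistor' at a time; keep the
    change only if the overall penalty got smaller."""
    rng = np.random.default_rng(seed)
    best = loss(w, xs, ys)
    for _ in range(steps):
        i = rng.integers(len(w))                  # pick a knob
        delta = rng.choice([-knob_turn, knob_turn])
        w[i] += delta                             # turn it a bit
        trial = loss(w, xs, ys)
        if trial < best:                          # better? keep it
            best = trial
        else:                                     # worse? turn it back
            w[i] -= delta
    return w, best

# toy supervised data: the true relationship is y = 2*x0 - 1*x1
rng = np.random.default_rng(1)
xs = rng.standard_normal((200, 2))
ys = xs @ np.array([2.0, -1.0])

w0 = np.zeros(2)                                  # all knobs at zero
w, final_loss = tune_by_hand(w0.copy(), xs, ys)
```

Even on this two-knob toy the loop needs thousands of trials; now scale the picture up to thousands of resistors and you see where the tens of days go.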
[ToDo] (1) DESIGN SUMMARY:
- You plan to build a Classifier. You may opt to have a one-neuron OUTPUT-LAYER and force the network to learn how to discriminate, so that the wished-to-have output becomes a tri-state output { -1 | 0 | 1 }, or you may opt to have a three-neuron OUTPUT-LAYER [ A, B, C ] and pick the one that has the highest output value.
- You plan to have one or more HIDDEN LAYER(s), which will help [The Model] gain the flexibility to respond in a very non-linear manner to the whole variety of INPUTs, so as to be principally able to yield the proper OUTPUT value(s). Thus the NN-depth H and each respective width[h], h = { 1, .., H }, are your next set of design choices here. This is where the magic grows. Using a single hidden layer, translated into human language, means expecting the results to be just a linear combination of the inputs ( well, a linear combination of slightly non-linearly transformed inputs ), which frankly seems quite an unsupported expectation for building a trading strategy.
- Your supervised ( known and manually pre-labeled ) input data examples contain both the 9 INPUTs - the "state" values [ x1_Date, x2_Mean, x3_, x4_, x5_, x6_, x7_, x8_, x9_ ] - and, for each such example, also a known ( as we go supervised ) value that we want to get on the OUTPUT(s) of this Classifier - the "Supervised_LABEL(s)" values
[ y_LABEL == { -1 | 0 | +1 } ]
or
[ y_LABEL_A == { 0 | 1 }, y_LABEL_B == { 0 | 1 }, y_LABEL_C == { 0 | 1 } ]; .sum() == 1
.
So bingo, you also have the task of preparing your data to "match" the NN-ARCHITECTURE - transforming the y_LABEL-s into [ A, B, C ]-s, using the ._convertToOneOfMany() method or similar.
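If your framework lacks a helper like PyBrain's `._convertToOneOfMany()`, the y_LABEL to [ A, B, C ] transformation is only a few lines anyway - a plain-numpy sketch ( the label ordering { -1, 0, +1 } -> [ A, B, C ] is an illustrative convention of mine, fix whichever order you prefer and keep it consistent ):

```python
import numpy as np

def labels_to_one_of_many(y_labels):
    """Map tri-state y_LABEL values { -1 | 0 | +1 } onto three-neuron
    targets [ A, B, C ], exactly one of which is 1 ( so .sum() == 1 )."""
    classes = (-1, 0, 1)                       # fixed label order: A, B, C
    out = np.zeros((len(y_labels), len(classes)))
    for row, y in enumerate(y_labels):
        out[row, classes.index(y)] = 1.0       # switch on exactly one neuron target
    return out

Y = labels_to_one_of_many([-1, 0, 1, 1])       # four supervised examples
```

Each row now satisfies the stated constraint `.sum() == 1`, one target neuron per class.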
[ToDo] (2) TRAINING SUMMARY:
So far so good. Given your (1) NN-INFRASTRUCTURE is ready, the ride starts to get more thrilling here:
- One ought to split the available dataset. Your choice was 75% for Training [The Model] ( here ) and 25% for testing ( later ) how well [The Model]-tuned-in-(2) actually works on unseen data - reviewing a cardinal property, aka the ABILITY TO GENERALISE. Fine, this goes in the correct direction, but we need one more, also Out-of-Sample, sub-set, so as to become principally able to compare different composite sets consisting of [ [The Model]-from-(1) + [The Model]-tuning-parameters-from-(2) ]. If there were no separate set, unseen in both (1)+(2), one could hardly compare different [(1)+(2)] compositions in a quantitatively fair & un-biased manner. So the SIZE of the available supervised-learning dataset ( with known & correct labels ) MATTERS ( A LOT ).
- Next, a training strategy ( not the mechanical step of moving the knobs, but the principles / ideas behind it - how to decide / calculate which knob, and by how much, ought to be turned up or down for each particular resistor, so as to improve, not spoil, the intended behaviour ) comprises some additional factors, related to the desire to "shape" the NN: how to calculate a penalty on errors on the output(s), and how much ( if at all ) to add a "superpositioned" penalisation from regularisation factors ( L1-, L2-based et al ), so as to "form" the network-response-function on the INPUTs ( the ANN behaviour ). Maybe most of this can be hidden under the hood of the NN-framework of one's choice, but it plays an important role, as time is money and a poor strategy may converge slowly, or need not converge at all - all that at the immense cost of time spoilt in (2).
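The three-way split and the "superpositioned" L2 penalty can be sketched like this ( the 60 / 20 / 20 proportions are an illustrative choice of mine, not a prescription, and `penalised_loss` is a generic stand-in for whatever error metric your framework uses ):

```python
import numpy as np

def three_way_split(n_examples, f_train=0.60, f_validate=0.20, seed=0):
    """Shuffle the example indices once, then carve out TRAIN / VALIDATION / TEST.
    VALIDATION serves to compare different [(1)+(2)] compositions; TEST stays
    untouched until the final ABILITY-TO-GENERALISE check."""
    idx = np.random.default_rng(seed).permutation(n_examples)
    n_tr = int(f_train * n_examples)
    n_va = int(f_validate * n_examples)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def penalised_loss(errors, weights, l2=0.001):
    """Penalty on output errors plus a 'superpositioned' L2 regularisation
    term, which discourages extreme resistor settings."""
    return float(np.mean(errors ** 2) + l2 * np.sum(weights ** 2))

train, validate, test = three_way_split(1000)
```

The split is disjoint by construction, so no example can leak from TEST back into either (1) or (2).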
[ToDo] (3) USING THE BEST CANDIDATE SELECTED FROM [(1)+(2)]:
The sweet part comes here.
Given we have done our tasks in (1) + (2) thoroughly, we can now just deploy the one candidate that did best on the ability to generalise [ validated on the last part of the Out-of-Sample examples, not yet seen in either (1) or (2) ].
This means that such an ANN will provide estimates in response to the unlabeled examples that you send to the ANN's INPUT-LAYER neurons.
(a) Given your modelling efforts were fair and thorough
and
(b) Given your unlabeled examples still belong to a system-state that is coherent with the state in which the training/labeled data were collected,
Then
you can believe in the prepared ANN-mechanics, as the ANN-provided OUTPUT-LAYER value(s) reflect, to their best trained "experience", meaningful predictions ( compatible with the training-rewarded behaviour ).
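Step (3) then reduces to a forward pass over each unlabeled example, plus an argmax over the three [ A, B, C ] output neurons to recover a { -1 | 0 | +1 } decision. A sketch with dummy stand-in weights - substitute the predict / activate call of whatever framework you trained with:

```python
import numpy as np

# stand-in weights for the trained network -- in reality these come out of (2);
# the 9 -> 5 -> 3 shape matches the [ A, B, C ] design variant above
rng = np.random.default_rng(2)
W_hidden = rng.standard_normal((5, 9)) * 0.1
W_out = rng.standard_normal((3, 5)) * 0.1

def predict_label(x_state):
    """Feed one unlabeled 9-value 'state' example through the net and map
    the winning OUTPUT-LAYER neuron [ A, B, C ] back to { -1 | 0 | +1 }."""
    hidden = np.tanh(W_hidden @ x_state)       # hidden-layer activations
    outputs = W_out @ hidden                   # three output-neuron values
    return (-1, 0, 1)[int(np.argmax(outputs))] # pick the highest output

label = predict_label(rng.standard_normal(9))  # one unlabeled example
```

Note that this mechanical part is the cheap one - condition (b) above, the coherence of the unlabeled data with the trained regime, is what actually decides whether the emitted label means anything.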