I have to analyse completely unknown numerical data (I don't know what it represents).
Here are some samples from the training data:
'yout': array([[ 0.00000000e+00, -7.87464718e-08, -7.31121013e-08, ...,
-4.20583628e-07, -3.62647412e-07, -2.17680232e-07],
[ -1.13230235e-13, -9.38223846e-05, 8.30087034e-05, ...,
-1.66600921e-07, -2.18490921e-07, 3.85091720e-07],
[ 3.32348250e-06, -1.93950410e-04, 1.54892852e-04, ...,
-7.36868568e-08, -1.41946370e-07, 2.15633282e-07],
...,
[ 9.72858182e-04, 7.22416022e-05, -1.68044656e-05, ...,
-2.90709866e-06, 2.59359588e-06, 3.13502801e-07],
[ 9.71197632e-04, 7.19938095e-05, -1.67844712e-05, ...,
-2.91106565e-06, 2.58013028e-06, 3.30935374e-07],
[ 9.80158036e-04, 7.25326131e-05, -1.69481316e-05, ...,
-2.94693184e-06, 2.59483672e-06, 3.52095128e-07]]),
'uin': array([[ -9.01855411e-03, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, -7.99360578e-14, 0.00000000e+00],
[ -9.01855411e-03, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, -6.21724894e-14, 0.00000000e+00],
[ -9.01855411e-03, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, 1.41805257e-05, 0.00000000e+00],
...,
[ -2.50927606e-02, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, -8.40115265e-01, 0.00000000e+00],
[ -2.50927606e-02, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, -8.40071885e-01, 0.00000000e+00],
[ -2.50891131e-02, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, -8.40028529e-01, 0.00000000e+00]]),
'time': array([[ 0.00000000e+00],
[ 1.00000000e-02],
[ 2.00000000e-02],
...,
[ 1.99980000e+02],
[ 1.99990000e+02],
[ 2.00000000e+02]])
The shapes of the output ('yout'), input ('uin') and time arrays, respectively:
((184112, 63), (184112, 21), (184112, 1))
What I have done with the input data so far:
- tidying: removed a few columns that contain only zeros
- applying some statistics: mean, min, max, percentiles and a correlation matrix
- visualising: a histogram of each numerical attribute, and a pairplot using seaborn
- clustering: K-Means with the elbow method; searching for the best number of clusters suggested that there are 3 clusters
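
For reference, the steps listed above can be sketched roughly as follows. This is only a minimal illustration, not my exact code: the array `uin` here is a small synthetic stand-in (three artificial blobs plus two all-zero columns), the names `uin_clean`, `summary`, `corr` and `inertias` are made up for the example, and standardising the columns before K-Means is an assumption on my part.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for 'uin' (NOT the real data):
# three blobs in 4 informative columns, plus 2 columns that are all zeros.
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
informative = np.vstack(
    [c + rng.normal(scale=0.5, size=(200, 4)) for c in centers]
)
uin = np.hstack([informative, np.zeros((600, 2))])

# Tidying: keep only the columns that have at least one non-zero entry.
keep = np.any(uin != 0, axis=0)
uin_clean = uin[:, keep]

# Statistics: count/mean/std/min/percentiles/max per column, and correlations.
df = pd.DataFrame(uin_clean)
summary = df.describe(percentiles=[0.25, 0.5, 0.75])
corr = df.corr()

# Elbow method: fit K-Means for several k and record the inertia
# (within-cluster sum of squared distances) for each k.
# Scaling first is an assumption; K-Means is sensitive to column scale.
X = StandardScaler().fit_transform(uin_clean)
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

The "elbow" is the value of k after which the printed inertia stops dropping sharply; on my real input data that point appeared to be k = 3.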
The problem is that I don't know how to verify my suspicion that there are 3 clusters, I have no idea how to make use of the output data (which contains 3 times as many features as the input), and I don't know what to do with the timestamps.
Can anyone advise me on how I should carry on with my analysis, please?
(I ask for your understanding, because I am a total beginner in data analysis, and even more so in ML and AI.)