How to train a machine learning model in Python with several target variables

Question

I am trying to build a machine learning model in Python, using PyTorch and scikit-learn. My setup is somewhat unusual: I have one input feature but several target variables. The target values make up a curve, and I treat each value on the curve as a separate target variable. The uploaded figure shows five such curves.

I used algorithms such as DecisionTreeRegressor and RandomForestRegressor to fit the single input variable to the several target variables, but the trained model does not predict well when extrapolating. It can generate a series of data, but not accurately. Does anyone know of a model in Python that is suited to this? I tried hyperparameter tuning with GridSearchCV, but it did not help. I appreciate your help and feedback in advance.
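
For readers who want to reproduce the setup, the following is a minimal sketch of a multi-output fit and of the extrapolation problem. The data is a hypothetical stand-in (5 samples, 1 input feature, 130 targets), not the data from the figure. It assumes scikit-learn's `RandomForestRegressor`, which accepts a 2-D target array directly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: 5 samples, 1 input feature, 130 targets each.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # shape (5, 1)
t = np.linspace(0.0, 1.0, 130)                     # time axis of each curve
Y = X * np.sin(np.pi * t)                          # shape (5, 130) via broadcasting

# Forests in scikit-learn handle a 2-D target natively (one output per column).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, Y)

inside = model.predict([[5.0]])    # largest input seen during training
outside = model.predict([[10.0]])  # far outside the training range

print(outside.shape)
# Every split threshold lies inside the training range, so both inputs
# fall into the same leaves and receive the same predicted curve.
print(np.allclose(inside, outside))
```

Tree-based models predict averages of training targets, so any input beyond the training range gets the same curve as the nearest edge of that range. This is one likely reason the extrapolated curves look poor regardless of hyperparameter tuning.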

I don't feel that a single input variable with multiple outputs makes this a good use case for ML. — matszwecja, May 12 '22 at 11:59
Dear @matszwecja, I also tried including some other input features, but the results did not change much. — Ali_d, May 12 '22 at 12:01
What I mean is that if your output relies on only a single input value, ML is like hitting a nail with a sledgehammer. Interpolate the function that your data follows and find its maximum. — matszwecja, May 12 '22 at 12:08
Dear @matszwecja, you are absolutely right. I tried more inputs but, as I said, the results did not change. Basically, the target variables are a bit tricky because there are so many of them, and in fact I am trying to predict the whole curve rather than one data point of the curve. — Ali_d, May 12 '22 at 12:13
then you don't even have to find the maximum of the interpolated function. — matszwecja, May 12 '22 at 12:16
@matszwecja, sorry, what do you mean by `maximum of the interpolated function`? My knowledge of ML and data science is not deep enough. — Ali_d, May 12 '22 at 12:18
Interpolation is a numerical method, not an ML one. It is the process of finding a function that matches a set of points as closely as possible, so it is exactly what you said you want to do: "predict the whole curve rather than one data point of the curve". It is a pretty complex topic, but simple examples of doing it in Python can be seen here: [Interpolation (scipy.interpolate)](https://docs.scipy.org/doc/scipy/tutorial/interpolate.html). I mentioned the maximum because that is what ML would usually be used for, but it seems to be irrelevant for you. — matszwecja, May 12 '22 at 12:25
@matszwecja, thanks for sharing the info. My issue is that I am not dealing with just x and y. In examples of interpolation or regression, only the relation between two variables is inspected. Are you proposing that I find functions and then relate these functions to my input feature? — Ali_d, May 12 '22 at 12:32
There are some decisions you have to make yourself based on your use case. You said you want to "predict the whole curve", but there are multiple curves: **what** curve are you trying to predict? Each of them separately? Then you are in fact dealing with only two variables (multiple times, but each case involves only two variables). Do you want to aggregate all of the output features into one? Then you need to define how you are aggregating them. Do you want the maximum sum? The minimum sum? They are all valid options, depending on what exactly you need. — matszwecja, May 12 '22 at 12:41
@matszwecja, in the figure you see five curves. Each curve represents an array of about `130` values. As you can see, the `x` axis is time, so I have `130` target variables. I then tried to correlate each curve (`130` values) with the input features. In other words, the shape of my target variable is `5x130`. For the input I tried different data sets, but almost all gave the same results, which is why I kept it simple and stuck with only one feature (shape of input feature: `5x1`). I do not want to aggregate the curves because I want to recreate them completely. — Ali_d, May 12 '22 at 12:56
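
As an illustration of the interpolation idea applied to these shapes, here is a minimal sketch that interpolates whole curves along the input feature. The data is a hypothetical stand-in with the `5x1` and `5x130` shapes described above, and the choice of `scipy.interpolate.CubicSpline` is an assumption rather than something proposed in the thread.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical stand-in data with the shapes described in the thread.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # the single input feature, 5 samples
t = np.linspace(0.0, 1.0, 130)           # time axis of each curve
Y = x[:, None] * np.sin(np.pi * t)       # 5 curves, shape (5, 130)

# Interpolate along axis 0: each of the 130 time points gets its own
# spline in the input feature, so one call returns a full curve.
spline = CubicSpline(x, Y, axis=0)

new_curve = spline(2.5)                  # curve for an unseen input value
print(new_curve.shape)
```

With only five samples the spline is well defined between the known input values, but evaluating it outside the range `1.0` to `5.0` relies on polynomial extrapolation and should be treated with caution.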
SO is for programming questions; design questions are better asked on https://datascience.stackexchange.com/. IMHO it's not very surprising that you can't obtain very accurate results from a single variable, as it probably doesn't contain enough information. But apparently you have expert knowledge about the problem, so maybe you can use it to design a deterministic method, or at least apply some constraints, such as a mathematical function that fits this type of curve. — Erwan, May 13 '22 at 12:48
@Erwan, thanks for your comment. The interesting part is that my results from DecisionTreeRegressor or RandomForestRegressor are independent of the number of input features. I tried about 8 features and the results were the same as when I used only one feature (the most important one). Do you think PINNs could be more informative in this case? — Ali_d, May 13 '22 at 13:27
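
The suggestion of a mathematical function that fits this type of curve could be sketched roughly as follows: fit each curve with a parametric shape, then model how the fitted parameters depend on the input feature. Everything specific here is an assumption made for illustration: the Gaussian-like `bump` function, the synthetic data, and the linear trend in the parameters. The real curve shape has to come from domain knowledge.

```python
import numpy as np
from scipy.optimize import curve_fit

def bump(t, amplitude, center, width):
    """Hypothetical curve shape; replace with one suited to the real data."""
    return amplitude * np.exp(-((t - center) ** 2) / (2.0 * width ** 2))

# Synthetic stand-in data: 5 curves of 130 points, one input value each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
t = np.linspace(0.0, 1.0, 130)
curves = np.array([bump(t, xi, 0.5, 0.10 + 0.02 * xi) for xi in x])

# Step 1: reduce each 130-point curve to three fitted parameters.
params = np.array([
    curve_fit(bump, t, c, p0=[c.max(), t[np.argmax(c)], 0.2])[0]
    for c in curves
])                                      # shape (5, 3)
params[:, 2] = np.abs(params[:, 2])     # the sign of width is not identifiable

# Step 2: model each parameter as a linear function of the input feature.
coeffs = np.polyfit(x, params, deg=1)   # shape (2, 3): slope and intercept

# Step 3: predict parameters for a new input and rebuild the whole curve.
x_new = 6.0
new_params = coeffs[0] * x_new + coeffs[1]
new_curve = bump(t, *new_params)
print(new_params, new_curve.shape)
```

Because the curve shape is fixed by the function, extrapolating to a new input only requires extrapolating a few parameters, which is usually better behaved than extrapolating 130 independent targets.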
