In order to fit a linear regression model to given training data X and labels y, I want to augment my training data X with nonlinear transformations of the given features. Let's say we have the features x1, x2 and x3, and we want to use the additional transformed features:
x4 = x1^2, x5 = x2^2 and x6 = x3^2
x7 = exp(x1), x8 = exp(x2) and x9 = exp(x3)
x10 = cos(x1), x11 = cos(x2) and x12 = cos(x3)
I tried the following approach, which however led to a model that performed very poorly with Root Mean Squared Error (RMSE) as the evaluation criterion:
import pandas as pd
import numpy as np
from sklearn import linear_model
# import the training data and extract the features and labels from it
DATAPATH = 'train.csv'
data = pd.read_csv(DATAPATH)
features = data.drop(['Id', 'y'], axis=1)
labels = data[['y']]
# quadratic features
features['x4'] = features['x1']**2
features['x5'] = features['x2']**2
features['x6'] = features['x3']**2
# exponential features
features['x7'] = np.exp(features['x1'])
features['x8'] = np.exp(features['x2'])
features['x9'] = np.exp(features['x3'])
# cosine features
features['x10'] = np.cos(features['x1'])
features['x11'] = np.cos(features['x2'])
features['x12'] = np.cos(features['x3'])
# fit ordinary least squares on the augmented feature matrix
regr = linear_model.LinearRegression()
regr.fit(features, labels)
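The RMSE itself I compute roughly like this (just a sketch of my setup: the hold-out split and random_state below are only for illustration, the real score comes from a separate evaluation set):
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# hold-out split purely for illustration; my actual evaluation setup may differ
X_train, X_val, y_train, y_val = train_test_split(features, labels, test_size=0.2, random_state=0)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
# RMSE = square root of the mean squared error on the held-out data
rmse = np.sqrt(mean_squared_error(y_val, regr.predict(X_val)))
print('validation RMSE:', rmse)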
I'm quite new to ML, and I'm sure there is a better way to do these nonlinear feature transformations. I'd be very happy for your help.
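For example, would something along these lines be cleaner? It builds all the transformed columns in one go and wraps the transformation in a scikit-learn Pipeline (just a sketch of what I have in mind, assuming the raw columns are named x1, x2 and x3 as above):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def augment(X):
    # stack [x, x^2, exp(x), cos(x)] column-wise
    X = np.asarray(X)
    return np.hstack([X, X**2, np.exp(X), np.cos(X)])

model = make_pipeline(FunctionTransformer(augment), linear_model.LinearRegression())
model.fit(features[['x1', 'x2', 'x3']], labels)
Or is there an even more standard way to do this, e.g. with PolynomialFeatures for the quadratic part?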
Cheers Lukas