Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.
Questions tagged [feature-engineering]
481 questions
-1 votes, 2 answers
How to find the difference between time and feed the difference in a new column?
I have a dataframe trades_df which looks like this:
Open Time         Open Price  Close Time
19-08-2020 12:19  1.19459     19-08-2020 12:48
28-08-2020 03:09  0.90157     08-09-2020 12:20
It has columns open_time and close_time in the format 19-08-2020…
user18587858
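A minimal sketch, assuming the columns are named open_time and close_time as the question states and that the strings are day-month-year as in the sample rows: parse both with an explicit format, then subtract to get a Timedelta column. The helper name add_duration and the output columns duration / duration_min are hypothetical.

```python
import pandas as pd

def add_duration(trades_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add the open-to-close duration as new columns."""
    fmt = "%d-%m-%Y %H:%M"  # day-month-year, e.g. 19-08-2020 12:19
    out = trades_df.copy()
    open_t = pd.to_datetime(out["open_time"], format=fmt)
    close_t = pd.to_datetime(out["close_time"], format=fmt)
    out["duration"] = close_t - open_t  # Timedelta column
    out["duration_min"] = out["duration"].dt.total_seconds() / 60  # numeric minutes
    return out
```

Passing format explicitly avoids pandas guessing month-first for dates such as 08-09-2020.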
-1 votes, 1 answer
Machine learning - does the independent variable data need to be balanced as well?
I know that we need balanced data in y to get a better model. However, I'm wondering whether the independent variables need to be balanced as well.
In the following dataframe, X3 is a categorical independent variable.
X1 X2 …

John
-1 votes, 1 answer
How to divide all numeric columns by each other?
I have a dataframe with more than 100 features, about half of which are numeric columns. I want to generate new features by dividing each numeric column by every other one. Is there an easy way to do it? Example:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)),…
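One possible sketch: select the numeric columns, then build a ratio column for every ordered pair with itertools.permutations. The helper name ratio_features and the a_div_b naming scheme are hypothetical. Zeros in the denominator are turned into NaN instead of inf, which matters for randint(0, 100) data like the example.

```python
from itertools import permutations

import numpy as np
import pandas as pd

def ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append a/b for every ordered pair of numeric columns."""
    num_cols = df.select_dtypes(include="number").columns
    ratios = {
        f"{a}_div_{b}": df[a] / df[b].replace(0, np.nan)  # avoid division by zero
        for a, b in permutations(num_cols, 2)
    }
    return pd.concat([df, pd.DataFrame(ratios, index=df.index)], axis=1)
```

With about 50 numeric columns this produces 50 * 49 = 2450 new columns; using itertools.combinations instead keeps only one direction of each pair.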
-1 votes, 1 answer
Should we always perform feature normalization first and then feature reduction?
Sometimes feature reduction with methods like PCA reduces the number of features, and then we could scale only the relevant variables. Is there a rule that normalization/scaling must come first and feature reduction second?

Sharat Ainapur
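A common convention (not a hard rule) is to scale before PCA, because PCA looks for directions of maximal variance, so a feature measured in large units can dominate the components. A minimal sketch of that ordering with scikit-learn's make_pipeline, which also ensures the scaler and PCA are fitted on training data only; the helper name scale_then_reduce is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def scale_then_reduce(X_train, X_test, n_components=2):
    """Hypothetical helper: standardize, then project onto principal components."""
    pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    Z_train = pipe.fit_transform(X_train)  # fit scaler and PCA on training data
    Z_test = pipe.transform(X_test)        # reuse the fitted parameters
    return pipe, Z_train, Z_test
```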
-1 votes, 1 answer
Modify equivalent values in a column
I'm working with pandas and have a question about how to replace equivalent values. I want binary values in the "class" column: 1 should stay 1, and 2 and 3 should be changed to 0. Also, I don't just have these lines, I have 70 in…

StaLLoNe_CoBRa
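A minimal sketch, assuming the column is literally named "class" and that every label other than 1 should become 0: compare against 1 and cast the boolean to int. The helper name binarize_class is hypothetical; df["class"].replace({2: 0, 3: 0}) is an equivalent alternative when the set of labels is known.

```python
import pandas as pd

def binarize_class(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map label 1 -> 1 and every other label -> 0."""
    out = df.copy()
    out["class"] = (out["class"] == 1).astype(int)
    return out
```

This works the same regardless of how many rows there are, since the comparison is vectorized over the whole column.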
-1 votes, 1 answer
Is there another way to combine the values of one column into groups instead of using 'df.replace()' several times in the problem below?
In :
char_df['Loan_Title'].unique()
Out:
array(['debt consolidation', 'credit card refinancing',
'home improvement', 'credit consolidation', 'green loan', 'other',
'moving and relocation', 'credit cards', 'medical expenses',
'refinance', 'credit…

Castle
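One alternative is a single dictionary passed to Series.map, so all groupings live in one place. The group names below (debt, credit_card, home, medical, other) and the helper group_titles are hypothetical illustrations, not a recommended taxonomy; unlisted titles fall through to "other" because map returns NaN for keys it does not find.

```python
import pandas as pd

# Hypothetical grouping of the Loan_Title values shown in the question.
GROUPS = {
    "debt consolidation": "debt",
    "credit consolidation": "debt",
    "refinance": "debt",
    "credit card refinancing": "credit_card",
    "credit cards": "credit_card",
    "home improvement": "home",
    "moving and relocation": "home",
    "medical expenses": "medical",
    "green loan": "other",
    "other": "other",
}

def group_titles(titles: pd.Series) -> pd.Series:
    """Normalize case/whitespace, map each title to a group, default to 'other'."""
    cleaned = titles.str.strip().str.lower()
    return cleaned.map(GROUPS).fillna("other")
```

Usage would look like char_df["Loan_Group"] = group_titles(char_df["Loan_Title"]), where Loan_Group is a hypothetical new column.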
-1 votes, 1 answer
How to exclude holidays and weekends from bank data in Python
I have bank data with dates and amounts, and a separate holiday CSV file that lists the holiday dates. I need to add the amount on each holiday to the next working day and set the amount on the holiday itself to 0.

firestorm
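A minimal sketch, assuming one row per date with hypothetical columns date and amount, a Monday-to-Friday working week, and the holidays given as a Series of date strings. np.busday_offset with an offset of 0 and roll="forward" leaves working days unchanged and moves weekends and holidays to the next working day; summing by that target date carries the amounts forward, and non-working days end up with 0.

```python
import numpy as np
import pandas as pd

def shift_to_working_days(bank: pd.DataFrame, holidays: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: move weekend/holiday amounts to the next working day."""
    dates = pd.to_datetime(bank["date"]).values.astype("datetime64[D]")
    hol = pd.to_datetime(holidays).values.astype("datetime64[D]")
    # Offset 0 + roll="forward": working days map to themselves,
    # weekends and holidays map to the next working day.
    target = np.busday_offset(dates, 0, roll="forward", holidays=hol)
    moved = (
        pd.Series(bank["amount"].to_numpy(), index=pd.to_datetime(target))
        .groupby(level=0)
        .sum()
    )
    out = bank.copy()
    out["date"] = pd.to_datetime(dates)
    # Working days receive their accumulated total; non-working days get 0.
    out["amount"] = out["date"].map(moved).fillna(0)
    return out
```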
-1 votes, 2 answers
How to refer to other rows in Pandas DataFrame in context of a single row?
I have the following example pandas DataFrame:
df
UserID Total Date
1 20 2019-01-01
1 18 2019-01-02
1 22 2019-01-03
1 16 2019-01-04
1 17 2019-01-05
1 26 2019-01-06
1 30 2019-01-07
1 28 …

Taher Elhouderi
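The question is truncated, so this only sketches the general pattern: sort by user and date, then use groupby(...).shift and a grouped rolling window so each row can see earlier rows of the same user without leaking across users. The helper name add_history_features and the output columns are hypothetical.

```python
import pandas as pd

def add_history_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-user features that look at previous rows."""
    out = df.sort_values(["UserID", "Date"]).copy()
    g = out.groupby("UserID")["Total"]
    out["prev_total"] = g.shift(1)                  # previous row of the same user
    out["diff_prev"] = out["Total"] - out["prev_total"]
    out["rolling_mean_3"] = g.transform(            # mean of up to 3 latest rows
        lambda s: s.rolling(3, min_periods=1).mean()
    )
    return out
```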
-1 votes, 1 answer
How to make a scatter plot to understand the general trend in data when there are multiple features
Here, the features are X_train and the target is y_train.
When a dataset has n features, how do we select the one feature to scatter-plot against the target variable to understand the general trend of the training data, to select a…

yuvraj singh
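A sketch of two common options, assuming X_train is a numeric DataFrame and y_train a Series with the same index: rank features by absolute Pearson correlation with the target (which only captures linear trends), or draw one small scatter plot per feature in a grid. Both helper names are hypothetical; matplotlib is imported inside the plotting function so the ranking works without it.

```python
import pandas as pd

def rank_features_by_correlation(X_train: pd.DataFrame, y_train: pd.Series) -> pd.Series:
    """Absolute Pearson correlation of each feature with the target, highest first."""
    return X_train.corrwith(y_train).abs().sort_values(ascending=False)

def scatter_all(X_train: pd.DataFrame, y_train: pd.Series, ncols: int = 3):
    """One scatter plot per feature against the target, arranged in a grid."""
    import matplotlib.pyplot as plt

    n = X_train.shape[1]
    nrows = -(-n // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    for ax, col in zip(axes.flat, X_train.columns):
        ax.scatter(X_train[col], y_train, s=8, alpha=0.5)
        ax.set_xlabel(col)
        ax.set_ylabel(y_train.name or "target")
    for ax in axes.flat[n:]:
        ax.set_visible(False)  # hide unused panels
    fig.tight_layout()
    return fig
```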
-1 votes, 1 answer
Data pre-processing and feature engineering
I have been doing some reading on data pre-processing and feature engineering, including feature selection, feature importance and feature construction.
My understanding is that feature engineering is applied in the data pre-processing stage. Additionally,…

Shosho
-1 votes, 1 answer
How to fill the NaN values in the Age feature of the Titanic data with fillna?
I want to fill the NaN values in the Age feature. In the Titanic training data, Pclass and Embarked are independent features. Based on these features, I want to fill the NaN values of the Age feature.
Pclass - unique values (0, 1, 2), Embarked -…

Amit Saini
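A minimal sketch of group-wise imputation, assuming columns named Pclass, Embarked and Age: compute the median Age within each (Pclass, Embarked) group with groupby(...).transform, fill from it, and fall back to the overall median for rows whose group has no known Age (or whose Embarked is itself missing). The helper name fill_age_by_group is hypothetical, and the median is one choice among several.

```python
import pandas as pd

def fill_age_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: impute Age with the median of its (Pclass, Embarked) group."""
    out = df.copy()
    group_median = out.groupby(["Pclass", "Embarked"])["Age"].transform("median")
    # Fall back to the global median where the group median is unavailable.
    out["Age"] = out["Age"].fillna(group_median).fillna(out["Age"].median())
    return out
```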
-1 votes, 1 answer
sklearn ValueError: Input contains NaN
ValueError: Input contains NaN
I ran:
from sklearn.preprocessing import OrdinalEncoder
data_.iloc[:,1:-1] = OrdinalEncoder().fit_transform(data_.iloc[:,1:-1])
Here is data_:
Age Sex Embarked Survived
0 22.0 male S …

xyssyxxys
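One way around the error, sketched under the assumption that the NaN values sit in the categorical columns selected by iloc[:, 1:-1] (in the Titanic data, Embarked has missing entries): impute those columns first, here with SimpleImputer(strategy="most_frequent"), then encode. Assigning by column labels lets the encoded columns become numeric. The helper name encode_categoricals is hypothetical, and most-frequent imputation is one choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

def encode_categoricals(data_: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: impute, then ordinal-encode the middle columns."""
    out = data_.copy()
    cols = out.columns[1:-1]  # same columns as data_.iloc[:, 1:-1]
    filled = SimpleImputer(strategy="most_frequent").fit_transform(out[cols])
    out[cols] = OrdinalEncoder().fit_transform(filled)  # categories sorted alphabetically
    return out
```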
-1 votes, 1 answer
Feature Extraction Using Representation Learning
I'm new to machine learning, and I've been given a task where I'm asked to extract features from a data set with continuous data using representation learning (for example a stacked autoencoder).
Then I'm to combine these extracted features with the…

annatn998
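A lightweight sketch of the idea, not a classic stacked autoencoder: it uses scikit-learn's MLPRegressor trained to reconstruct its own (standardized) input through a narrow middle layer, trained end to end rather than greedily layer by layer. The bottleneck activations are then computed by hand from coefs_ and intercepts_ and used as learned features. Layer sizes and the helper name autoencoder_features are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

def autoencoder_features(X, bottleneck=3, hidden=8, seed=0):
    """Hypothetical helper: bottleneck activations of an MLP trained to reconstruct X."""
    Xs = StandardScaler().fit_transform(X)
    ae = MLPRegressor(
        hidden_layer_sizes=(hidden, bottleneck, hidden),
        activation="relu",
        max_iter=2000,
        random_state=seed,
    )
    ae.fit(Xs, Xs)  # target equals input: learn to reconstruct
    h = Xs
    # The first two weight layers map input -> hidden -> bottleneck.
    for W, b in zip(ae.coefs_[:2], ae.intercepts_[:2]):
        h = np.maximum(h @ W + b, 0.0)  # same ReLU the network applies
    return h
```

The extracted features could then be combined with the original ones, for example with np.hstack([X, autoencoder_features(X)]).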
-1 votes, 1 answer
Pandas: how to add a column representing the intersection of 2 attributes in a DataFrame
Let's say I have 2 CSV files (very large files).
The first file represents restaurants and has 6 attributes: restaurant_id, name, star_rating, city, zone, closed.
The second file represents the categories of the restaurants and has 2 attributes…

Lynn
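The question is truncated, so this is only one interpretation: assuming the categories file has the columns restaurant_id and category (the second name is hypothetical), pd.crosstab turns it into one 0/1 indicator column per category, which is then left-merged onto the restaurants so every restaurant gets its category flags. The helper name add_categories and the cat_ prefix are hypothetical.

```python
import pandas as pd

def add_categories(restaurants: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one 0/1 column per category to each restaurant."""
    indicators = pd.crosstab(categories["restaurant_id"], categories["category"]).clip(upper=1)
    indicators.columns = [f"cat_{c}" for c in indicators.columns]
    cat_cols = list(indicators.columns)
    out = restaurants.merge(indicators.reset_index(), on="restaurant_id", how="left")
    # Restaurants without any category row get 0 instead of NaN.
    out[cat_cols] = out[cat_cols].fillna(0).astype(int)
    return out
```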
-1 votes, 1 answer
How should I deal with NaN values when the data isn't categorical and determining them isn't practical?
I'm currently working on the Kaggle House Prices competition, and there is a feature for the year in which the garage was built. Some houses have no garage, so the feature is NaN for them.
How should I deal with this situation? Imputing those values with 0…

Yuval
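One common pattern, sketched under the assumption that the columns are named GarageYrBlt and YearBuilt as in the Kaggle House Prices data: keep the missingness as an explicit 0/1 indicator, then fill the year with a plausible value so it does not become an extreme outlier the way 0 would for a linear model. Filling with the house's build year is an assumption, not the only reasonable choice; the helper name handle_garage_year is hypothetical.

```python
import numpy as np
import pandas as pd

def handle_garage_year(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flag houses without a garage, then fill the year."""
    out = df.copy()
    out["HasGarage"] = out["GarageYrBlt"].notna().astype(int)
    # Assumption: use the house's build year where there is no garage year.
    out["GarageYrBlt"] = out["GarageYrBlt"].fillna(out["YearBuilt"])
    return out
```

Tree-based models can usually handle either encoding; the indicator mainly helps models that treat the year as a continuous quantity.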