Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
-1
votes
2 answers

How to find the difference between time and feed the difference in a new column?

I have a dataframe trades_df which looks like this:

Open Time         Open Price  Close Time
19-08-2020 12:19  1.19459     19-08-2020 12:48
28-08-2020 03:09  0.90157     08-09-2020 12:20

It has columns open_time and close_time in the format 19-08-2020…
user18587858
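A minimal sketch of one way to compute such a difference, using the two sample rows from the question. The column names open_time and close_time follow the question's description; open_price, duration and duration_minutes are names assumed here for illustration.

```python
import pandas as pd

# Sample rows taken from the question; open_price is an assumed column name.
trades_df = pd.DataFrame({
    "open_time": ["19-08-2020 12:19", "28-08-2020 03:09"],
    "open_price": [1.19459, 0.90157],
    "close_time": ["19-08-2020 12:48", "08-09-2020 12:20"],
})

# Parse the day-first strings with an explicit format to avoid ambiguity.
fmt = "%d-%m-%Y %H:%M"
open_ts = pd.to_datetime(trades_df["open_time"], format=fmt)
close_ts = pd.to_datetime(trades_df["close_time"], format=fmt)

# Subtracting two datetime columns yields a Timedelta column.
trades_df["duration"] = close_ts - open_ts

# A plain number (minutes here) is usually easier to feed to a model.
trades_df["duration_minutes"] = trades_df["duration"].dt.total_seconds() / 60
```

Passing the format explicitly matters for dates such as 08-09-2020, which could otherwise be read month-first.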
-1
votes
1 answer

Machine learning - does the independent variable data need to be balanced as well?

I know that we need to have balanced data in y to have a better model. However, I'm wondering whether we need to have balanced data in the independent variables as well. In the following dataframe, X3 is a categorical independent variable. X1 X2 …
-1
votes
1 answer

How to divide all numeric columns by each other?

I have a dataframe with more than 100 features, half of which are numeric columns. I want to generate new features by dividing columns by each other. Is there an easy way to do it? Example: df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)),…
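One possible approach, sketched on a small hypothetical frame (the column names a, b, c and label are made up for illustration): select the numeric columns, then build one ratio column per ordered pair.

```python
from itertools import permutations

import numpy as np
import pandas as pd

# Hypothetical frame: three numeric columns and one non-numeric column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 4.0],
    "b": [2.0, 4.0, 8.0],
    "c": [0.0, 5.0, 10.0],
    "label": ["x", "y", "z"],
})

numeric_cols = df.select_dtypes(include="number").columns

# Ordered pairs, because a/b and b/a are different features.
ratios = {
    f"{num}_div_{den}": df[num] / df[den]
    for num, den in permutations(numeric_cols, 2)
}

# Division by zero produces +/-inf; turn those into NaN for later handling.
ratio_df = pd.DataFrame(ratios).replace([np.inf, -np.inf], np.nan)

df_with_ratios = pd.concat([df, ratio_df], axis=1)
```

The number of new columns grows as n * (n - 1), so 50 numeric columns would yield 2450 ratio features; some filtering afterwards is probably needed.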
-1
votes
1 answer

Should we always first perform feature normalization and then the feature reduction?

Sometimes feature reduction with methods like PCA cuts the number of features, and then we could scale only the relevant variables. Is there a rule that normalization/scaling must come first, followed by feature reduction?
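A small sketch of why the order matters for variance-based methods such as PCA. It uses synthetic data and a hypothetical helper, explained_variance_ratio, written with NumPy's SVD; it illustrates the scale sensitivity only and is not a rule for every reduction method.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                      # PCA works on centered data
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)
# Two independent synthetic features; the second is in much larger units.
X = np.column_stack([
    rng.normal(size=200),
    rng.normal(size=200) * 1000.0,
])

raw_ratio = explained_variance_ratio(X)

# Standardize each feature to zero mean and unit variance, then repeat.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_ratio = explained_variance_ratio(X_scaled)
```

On the raw data the first component is dominated by the large-unit feature, while after standardization the two features contribute roughly equally. That is the usual argument for scaling before PCA.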
-1
votes
1 answer

Modify equivalent values in a column

I'm working with Pandas, but I have a question about how to change equivalent values. I want binary values in the "class" column: 1 should stay as it is, and 2 and 3 should be changed to 0. Also, I don't have just these lines, I have 70 in…
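A minimal sketch on made-up data, assuming the column holds only the values 1, 2 and 3: comparing against 1 gives a boolean column, which casts directly to 1/0.

```python
import pandas as pd

# Hypothetical sample; the question's full data is not shown.
df = pd.DataFrame({"class": [1, 2, 3, 1, 2]})

# True where the class is 1, False otherwise; cast to int for 1/0.
df["class"] = (df["class"] == 1).astype(int)
```

This scales to any number of rows and maps every value other than 1 to 0, so it would also silently map unexpected values to 0.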
-1
votes
1 answer

Is there any other way (to combine values of one column into different groups), instead of using 'df.replace( )' several times in the below problem?

In : char_df['Loan_Title'].unique() Out: array(['debt consolidation', 'credit card refinancing', 'home improvement', 'credit consolidation', 'green loan', 'other', 'moving and relocation', 'credit cards', 'medical expenses', 'refinance', 'credit…
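One alternative to repeated df.replace() calls is a single lookup dictionary passed to Series.map. The sketch below uses a few titles from the question's output; the group names and the grouping itself are assumptions for illustration, as is the Loan_Group column name.

```python
import pandas as pd

char_df = pd.DataFrame({
    "Loan_Title": [
        "debt consolidation", "credit consolidation",
        "credit card refinancing", "credit cards",
        "home improvement", "green loan",
    ]
})

# Hypothetical grouping: one entry per target group.
group_to_titles = {
    "consolidation": ["debt consolidation", "credit consolidation"],
    "credit card": ["credit card refinancing", "credit cards"],
    "home": ["home improvement"],
}

# Invert to title -> group so a single map() call does all the work.
title_to_group = {
    title: group
    for group, titles in group_to_titles.items()
    for title in titles
}

# Titles missing from the mapping become NaN; send them to "other".
char_df["Loan_Group"] = char_df["Loan_Title"].map(title_to_group).fillna("other")
```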
-1
votes
1 answer

How to Exclude Holidays and Weekends from Bank Data in Python

I have bank data with dates and amounts, and a separate holiday CSV file listing the holiday dates. I have to add the amount from each holiday to the next working day and set the amount on the holiday itself to 0.
firestorm
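A sketch of one way to do this with NumPy's business-day helpers. The column names (date, amount, settle_date, adjusted_amount) and the sample values are assumptions, since the question does not show its data. It also assumes one row per calendar date and that every working day appears in the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2021-01-01 is a Friday holiday, 02 and 03 a weekend.
bank = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05",
    ]),
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})
holidays = pd.to_datetime(["2021-01-01"])  # would be read from the holiday CSV

dates_d = bank["date"].values.astype("datetime64[D]")
holidays_d = holidays.values.astype("datetime64[D]")

# roll="forward" with offset 0 moves weekends and holidays to the next
# working day and leaves working days where they are.
bank["settle_date"] = np.busday_offset(
    dates_d, 0, roll="forward", holidays=holidays_d
)
is_working_day = np.is_busday(dates_d, holidays=holidays_d)

# Total per settlement day, kept only on the working-day rows.
total = bank.groupby("settle_date")["amount"].transform("sum")
bank["adjusted_amount"] = total.where(is_working_day, 0.0)
```

Here the Friday holiday and the weekend all roll into Monday 2021-01-04, which ends up carrying 10 + 20 + 30 + 40.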
-1
votes
2 answers

How to refer to other rows in Pandas DataFrame in context of a single row?

I have the following example Pandas DataFrame df UserID Total Date 1 20 2019-01-01 1 18 2019-01-02 1 22 2019-01-03 1 16 2019-01-04 1 17 2019-01-05 1 26 2019-01-06 1 30 2019-01-07 1 28 …
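The excerpt is cut off before it says which rows are needed, so the following is only a sketch of two common patterns, assuming the goal is to look at earlier rows of the same user: shift for the previous row and rolling for a window. The new column names are made up.

```python
import pandas as pd

# First four rows from the question's example.
df = pd.DataFrame({
    "UserID": [1, 1, 1, 1],
    "Total": [20, 18, 22, 16],
    "Date": pd.to_datetime(
        ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
    ),
})

# Order rows in time within each user before looking backwards.
df = df.sort_values(["UserID", "Date"])
by_user = df.groupby("UserID")["Total"]

# Previous row's value for the same user (NaN on each user's first row).
df["prev_total"] = by_user.shift(1)
df["change"] = df["Total"] - df["prev_total"]

# Mean over the current row and up to two preceding rows of the same user.
df["rolling_mean_3"] = by_user.transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)
```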
-1
votes
1 answer

How to plot a scatter plot to understand the general trend in data, when we have multiple features

Here, the features are X_train and the target is y_train. When a dataset has 'n' features, how do we select the one feature to plot against the target variable in a scatter plot to understand the general trend of the training data, to select a…
-1
votes
1 answer

Data pre-processing and feature engineering

I have been doing some reading on data pre-processing and feature engineering, including feature selection, feature importance and feature construction. My understanding is that feature engineering is applied in the data preprocessing stage. Additionally,…
-1
votes
1 answer

How to fill the NaN values in the age feature for the Titanic data?

I want to fill the NaN values in the age feature. In the Titanic train data, pclass and embarked are independent features. Based on these features I want to fill the NaN values of the age feature. Pclass - (0,1,2) unique value, Embarked -…
Amit Saini
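A minimal sketch of group-wise imputation on made-up rows, assuming the standard Titanic column names Pclass, Embarked and Age: each missing age is filled with the median age of passengers sharing the same class and port.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the shape of the Titanic training data.
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 3, 3, 3],
    "Embarked": ["S", "S", "S", "Q", "Q", "Q"],
    "Age":      [30.0, 40.0, np.nan, 20.0, np.nan, 24.0],
})

# Median age per (Pclass, Embarked) group, broadcast back to every row.
group_median = df.groupby(["Pclass", "Embarked"])["Age"].transform("median")
df["Age"] = df["Age"].fillna(group_median)

# Fallback for rows whose group has no known age at all.
df["Age"] = df["Age"].fillna(df["Age"].median())
```

The median is one choice among several; the mean or a model-based imputer would fit the same pattern.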
-1
votes
1 answer

sklearn ValueError: Input contains NaN

ValueError: Input contains NaN. I ran from sklearn.preprocessing import OrdinalEncoder and then data_.iloc[:,1:-1] = OrdinalEncoder().fit_transform(data_.iloc[:,1:-1]). Here is data_: Age Sex Embarked Survived 0 22.0 male S …
xyssyxxys
-1
votes
1 answer

Feature Extraction Using Representation Learning

I'm new to machine learning, and I've been given a task where I'm asked to extract features from a data set with continuous data using representation learning (for example a stacked autoencoder). Then I'm to combine these extracted features with the…
-1
votes
1 answer

Pandas: how to add column representing the intersection of 2 attributes in a Dataframe

Let's say I have 2 CSV files (very large files). The first file represents restaurants and has 6 attributes: restaurant_id, name, star_rating, city, zone, closed. The second file represents the categories of the restaurants and has 2 attributes…
Lynn
-1
votes
1 answer

How should I deal with NaN values when the data isn't categorical and determining them isn't practical?

I'm currently working on the House Prices Kaggle competition, and one feature is the year in which the garage was built. Some houses have no garage, so the feature is NaN for them. How should I deal with this situation? Imputing those values with 0…
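One option among several, sketched on made-up rows: record the absence of a garage in a separate indicator column, then fill the year with a neutral stand-in. The sketch assumes the competition's column names GarageYrBlt and YearBuilt; HasGarage is a name introduced here, and using the house's build year as the stand-in is an assumption, not an established answer.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second house has no garage.
df = pd.DataFrame({
    "YearBuilt":   [1990, 2005, 1975],
    "GarageYrBlt": [1992.0, np.nan, 1975.0],
})

# Keep the "no garage" information explicit before filling anything.
df["HasGarage"] = df["GarageYrBlt"].notna().astype(int)

# Fill with the build year so the column stays in a realistic range,
# unlike 0, which would sit far outside every real value.
df["GarageYrBlt"] = df["GarageYrBlt"].fillna(df["YearBuilt"])
```

With the indicator in place, a model can still tell filled values apart from real ones.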