Questions tagged [feature-extraction]

In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction. Transforming the input data into a set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

Feature extraction reduces the amount of resources required to describe a large set of data accurately. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, and it can also cause a classification algorithm to overfit the training sample and generalize poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables that get around these problems while still describing the data with sufficient accuracy.

The best results are achieved when an expert constructs a set of application-dependent features. Nevertheless, if no such expert knowledge is available, general dimensionality reduction techniques may help.
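As an illustration of one general-purpose technique, the sketch below implements principal component analysis (PCA) with NumPy: it centers the data and projects it onto the top `k` principal directions obtained from an SVD. The helper name `pca_features` and the synthetic data are hypothetical; PCA is only one of many dimensionality reduction methods.

```python
import numpy as np

def pca_features(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto the top-k principal components (hypothetical helper)."""
    # Center each column so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Keep only the first k directions: this is the reduced representation.
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
# 100 samples with 5 variables, but only 2 underlying degrees of freedom.
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5))
Z = pca_features(X, k=2)
print(Z.shape)  # (100, 2)
```

Because the toy data has only two underlying degrees of freedom, the two extracted features describe it without loss (up to floating-point error); on real data, `k` trades accuracy against size.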

Source: Wikipedia

1664 questions

1 answer

Combining feature extraction classes in scikit-learn

I'm using sklearn.pipeline.Pipeline to chain feature extractors and a classifier. Is there a way to combine multiple feature selection classes (for example, the ones from sklearn.feature_extraction.text) in parallel and join their output? My code…

asked Oct 04 '12 at 06:27

Daniel

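One way to do this in scikit-learn is `sklearn.pipeline.FeatureUnion`, which runs several transformers on the same input and concatenates their outputs column-wise; the union can itself be a step of a `Pipeline`. A minimal sketch, assuming two text vectorizers, a tiny made-up corpus, and toy labels:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

docs = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]  # toy labels, for illustration only

# Both extractors receive the same raw text; FeatureUnion hstacks their outputs.
features = FeatureUnion([
    ("words", CountVectorizer()),
    ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(2, 3))),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression()),
])
model.fit(docs, labels)

# The joined matrix is as wide as the two vocabularies combined.
union = model.named_steps["features"]
X = union.transform(docs)
widths = [len(t.vocabulary_) for _, t in union.transformer_list]
print(X.shape, widths)
```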

2 answers

Tensorflow feature column for variable list of values

From the TensorFlow docs it's clear how to use tf.feature_column.categorical_column_with_vocabulary_list to create a feature column which takes some string as input and outputs a one-hot vector. For example, vocabulary_feature_column = …

tensorflow machine-learning neural-network feature-extraction

asked Feb 09 '18 at 02:18

GratefulGuest

4 answers

How to deal with array of string features in traditional machine learning?

Problem: Let's say we have a dataframe that looks like this:

age  job         friends                                   label
23   'engineer'  ['World of Warcraft', 'Netflix', '9gag']  1
35   'manager'   NULL …

machine-learning deep-learning feature-extraction feature-engineering

asked Jun 16 '20 at 13:08

tooskoolforkool
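One common approach (a sketch, not necessarily what the answers recommend) is to treat each list as a set of labels and one-hot encode it with scikit-learn's `MultiLabelBinarizer`, mapping a missing value such as the NULL row to an empty list. The column names follow the question's dataframe; the third row is made up.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    "age": [23, 35, 41],
    "job": ["engineer", "manager", "engineer"],
    "friends": [["World of Warcraft", "Netflix", "9gag"], None, ["Netflix"]],
    "label": [1, 0, 1],
})

# Treat a missing list as "no items" instead of dropping the row.
friends = df["friends"].apply(lambda v: v if isinstance(v, list) else [])

# One binary column per distinct string seen in any of the lists.
mlb = MultiLabelBinarizer()
onehot = pd.DataFrame(mlb.fit_transform(friends),
                      columns=mlb.classes_, index=df.index)

# Combine with the numeric column and a one-hot encoding of the scalar category.
X = pd.concat([df[["age"]], pd.get_dummies(df["job"]), onehot], axis=1)
print(X.columns.tolist())
```

With many distinct strings this becomes very wide; feature hashing is a common way to cap the width.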

4 answers

TSFRESH library for Python is taking way too long to process

I came across the TSfresh library as a way to featurize time series data. The documentation is great, and it seems like the perfect fit for the project I am working on. I wanted to implement the following code that was shared in the quick start…

python-2.7 time time-series feature-extraction

asked Dec 14 '16 at 16:56

Michael Bawol

2 answers

CountVectorizer: "I" not showing up in vectorized text

I'm new to scikit-learn, and currently studying Naïve Bayes (Multinomial). Right now, I'm working on vectorizing text from sklearn.feature_extraction.text, and for some reason, when I vectorize some text, the word "I" doesn't show up in the…

scikit-learn feature-extraction

asked Dec 21 '13 at 09:45

covariance

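CountVectorizer's default `token_pattern`, `r"(?u)\b\w\w+\b"`, only matches tokens of two or more word characters, so a one-letter word such as "I" is never emitted (text is also lowercased by default). A minimal sketch on a made-up sentence, assuming you want to keep single-character tokens:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I think I like this"]

default_vec = CountVectorizer().fit(docs)
# The default pattern requires at least two word characters per token.
print(sorted(default_vec.vocabulary_))  # ['like', 'think', 'this']

# Relaxing the pattern to one or more word characters keeps "i".
single_vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b").fit(docs)
print(sorted(single_vec.vocabulary_))  # ['i', 'like', 'think', 'this']
```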

1 answer

How to encode dependency path as a feature for classification?

I am trying to implement relation extraction between verb pairs. I want to use the dependency path from one verb to the other as a feature for my classifier (which predicts whether relation X exists or not). But I am not sure how to encode the dependency path as a…

machine-learning nlp stanford-nlp feature-extraction information-extraction

asked Sep 25 '15 at 20:04

Syed Fahad Sultan

1 answer

Extract single line contours from Canny edges

I'd like to extract the contours of an image, expressed as a sequence of point coordinates. With Canny I'm able to produce a binary image that contains only the edges of the image. Then, I'm trying to use findContours to extract the contours. The…

opencv computer-vision contour edge-detection feature-extraction

asked Aug 06 '13 at 08:00

Muffo


1 answer

How to calculate Local Binary Pattern Histograms with OpenCV?

I have already seen that OpenCV provides a classifier based on LBP histograms, but I want to have access to the LBP histogram itself. For instance: histogram = calculate_LBP_Histogram(image). Is there any function that performs this in OpenCV?

opencv feature-extraction lbph-algorithm

asked Dec 05 '12 at 22:14

EijiAdachi
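For reference, the basic 3×3 LBP operator compares each pixel's eight neighbours with the centre pixel, packs the comparisons into an 8-bit code, and histograms the codes. The sketch below implements only that basic operator in NumPy (no uniform patterns, rotation invariance, or circular interpolation) and is independent of OpenCV; `lbp_histogram` is a hypothetical name, not an OpenCV function.

```python
import numpy as np

def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """256-bin histogram of basic 3x3 LBP codes (hypothetical helper)."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left; each one owns one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view aligned with the interior (centre) pixels.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    # Border pixels lack a full neighbourhood, so only interior pixels are counted.
    return np.bincount(codes.ravel(), minlength=256)

flat = np.full((3, 3), 5, dtype=np.uint8)
hist = lbp_histogram(flat)
print(hist[255], hist.sum())  # 1 1
```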

2 answers

Is it possible to query Elastic Search with a feature vector?

I'd like to store an n-dimensional feature vector, e.g. <1.00, 0.34, 0.22, ..., 0>, with each document, and then provide another feature vector as a query, with the results sorted in order of cosine similarity. Is this possible with Elastic Search?

elasticsearch information-retrieval feature-extraction

asked May 13 '15 at 23:26

neptune


3 answers

Best practice for holding huge lists of data in Java

I'm writing a small system in Java in which I extract n-gram features from text files and later need to perform a feature selection process in order to select the most discriminative features. The feature extraction process for a single file returns a…

java data-structures feature-extraction feature-selection computation

asked Jan 14 '15 at 13:17

Aviadjo

3 answers

Extracting pitch features from an audio file

I am trying to extract pitch features from an audio file, which I would use for a classification problem. I am using Python (SciPy/NumPy) for classification. I think I can get frequency features using scipy.fft, but I don't know how to approximate…

python audio scipy feature-extraction

asked Dec 22 '13 at 13:51

Ada Xu
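One simple alternative to reading pitch off an FFT is to estimate the fundamental frequency from the strongest autocorrelation peak, since the lag of that peak is the period in samples. A minimal NumPy sketch, assuming a mono signal, a single voiced frame, and a search range of 50 to 500 Hz (all assumptions; real recordings usually need framing, windowing, and voicing detection). `estimate_pitch` is a hypothetical helper.

```python
import numpy as np

def estimate_pitch(frame: np.ndarray, sr: int,
                   fmin: float = 50.0, fmax: float = 500.0) -> float:
    """Estimate f0 in Hz from the strongest autocorrelation peak (hypothetical helper)."""
    x = frame - frame.mean()
    # Full autocorrelation; index len(x) - 1 is lag 0, so keep non-negative lags.
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Only search lags whose periods fall between 1/fmax and 1/fmin seconds.
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
tone = np.sin(2 * np.pi * 200.0 * t)
print(round(estimate_pitch(tone, sr)))  # 200
```

The estimate is quantized to `sr / lag`, so resolution drops at higher pitches; per-frame estimates (or statistics over them) can then serve as classification features.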

2 answers

Feature Hashing on multiple categorical features(columns)

I would like to hash the feature ‘Genre’ into six columns and, separately, the feature ‘Publisher’ into another six columns. I want something like below:

Genre        Publisher
0 1 2 3 4 5  0 1 2 3 4 5
0 Platform …

python pandas dataframe scikit-learn feature-extraction

asked Jan 19 '19 at 10:15

Noor
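scikit-learn's `FeatureHasher` with `input_type="string"` hashes each sample's iterable of strings into a fixed number of columns, so hashing each categorical column with its own hasher and stacking the results gives a six-plus-six layout. A sketch with made-up rows (only the column names come from the question); with just six buckets, different categories can collide.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

df = pd.DataFrame({
    "Genre": ["Platform", "Racing", "Sports"],
    "Publisher": ["Nintendo", "Nintendo", "Electronic Arts"],
})

def hash_column(values, n_features=6):
    # input_type="string" expects each sample to be an iterable of strings,
    # so wrap each value in a list (a bare string would be split into characters).
    # alternate_sign=False keeps entries as non-negative counts.
    hasher = FeatureHasher(n_features=n_features, input_type="string",
                           alternate_sign=False)
    return hasher.transform([[v] for v in values]).toarray()

genre = hash_column(df["Genre"])
publisher = hash_column(df["Publisher"])

# Two-level column labels reproduce the "Genre 0..5 / Publisher 0..5" header.
columns = pd.MultiIndex.from_product([["Genre", "Publisher"], range(6)])
hashed = pd.DataFrame(np.hstack([genre, publisher]), columns=columns)
print(hashed.shape)  # (3, 12)
```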

1 answer

Understanding the output of mfcc

    from librosa.feature import mfcc
    from librosa import load

    def extract_mfcc(sound):
        # load() returns the audio samples and their sampling rate
        data, sr = load(sound)
        return mfcc(y=data, sr=sr)

    mfccs = extract_mfcc("sound.wav")

I would like to get the MFCC of the following sound.wav file…

python audio artificial-intelligence feature-extraction mfcc

asked Sep 08 '18 at 06:59

Eduardo Morales

2 answers

RandomForestRegressor and feature_importances_ error

I am struggling to pull out the feature importances from my RandomForestRegressor; I get an AttributeError: 'GridSearchCV' object has no attribute 'feature_importances_'. Does anyone know why there is no such attribute? According to the documentation there…

python scikit-learn random-forest feature-extraction grid-search

asked Nov 04 '17 at 13:47

Svarto
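`GridSearchCV` wraps the estimator rather than becoming it: with the default `refit=True`, the model refitted on the best parameters is stored in `best_estimator_`, and that is the object exposing `feature_importances_`. A minimal sketch on synthetic data (the parameter grid is arbitrary):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [3, None]},
    cv=3,
)
search.fit(X, y)

# The search object has no feature_importances_; the refitted best model does.
importances = search.best_estimator_.feature_importances_
print(len(importances))  # 5
```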

1 answer

Empty vocabulary for single letter by CountVectorizer

Trying to convert a string into a numeric vector:

    import re

    ### Clean the string
    def names_to_words(names):
        print('a')
        words = re.sub("[^a-zA-Z]", " ", names).lower().split()
        print('b')
        return words

    ### Vectorization
    def Vectorizer(): …

python nlp vectorization feature-extraction countvectorizer

asked Apr 25 '17 at 04:02

LookIntoEast

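With CountVectorizer's default `token_pattern` (two or more word characters), a corpus made only of single letters produces no tokens at all, so `fit` raises a `ValueError` about an empty vocabulary. A minimal sketch with toy input, assuming single letters should count as tokens:

```python
from sklearn.feature_extraction.text import CountVectorizer

letters = ["a", "b", "a c"]

try:
    CountVectorizer().fit(letters)
    default_failed = False
except ValueError as err:
    # Every token is one character long, so none match the default pattern.
    default_failed = True
    print("default pattern:", err)

# Accept tokens of one or more word characters instead.
vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = vec.fit_transform(letters)
print(sorted(vec.vocabulary_))  # ['a', 'b', 'c']
```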