Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forests, and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
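
As a rough illustration of the description above, here is a minimal sketch of supervised classification using only the Python standard library. It uses a nearest-centroid rule, chosen purely for brevity and not one of the algorithms listed above; the function names `fit_centroids` and `predict` and the toy data are hypothetical, introduced only for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(samples, labels):
    """'Training': average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign a new observation to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy training set: (height_cm, weight_kg) with known labels.
train_X = [(150, 50), (155, 55), (180, 85), (185, 90)]
train_y = ["small", "small", "large", "large"]

centroids = fit_centroids(train_X, train_y)
print(predict(centroids, (152, 52)))  # new observation near the "small" examples
print(predict(centroids, (182, 88)))  # new observation near the "large" examples
```

The labeled training set is what makes this supervised: replacing the discrete labels with continuous targets would turn the task into regression, and dropping the labels altogether would turn it into clustering.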

7859 questions
2 votes, 0 answers

Tensorflow Estimator Feature Column increase weight

I have a DNNLinearCombinedClassifier to predict whether an article gets sold or not. I need the DNN for features like description and the linear part for features like size, category, price, etc. In general it works, but the weight of the price is too low. The price is…
NiBurhe
2 votes, 1 answer

Accuracy of model got stuck at 50% while training an Age and Gender detection model

I was working through the Keras implementation of the Age and Gender Detection model described in the research paper 'Age and Gender Classification using Convolutional Neural Networks'. It was originally a Caffe model but I thought to convert it to…
2 votes, 0 answers

Python - Compare similarity / classify images with SIFT descriptors quickly

I understand that this is a popular question on Stack Overflow; however, I have not managed to find the best solution yet. Background: I am trying to classify an image. I currently have 10,000 unique images that a given image can match with. For each…
brian4342
2 votes, 2 answers

Python Fraud Detection Classification Algorithms

I am working on a credit card fraud detection model and have labeled data containing orders for an online store. The columns I am working with are: Customer Full Name, Shipping Address and Billing Address (city, state, zip, street), Order Quantity,…
2 votes, 0 answers

Using class weights with sklearn VotingClassifier

I have an imbalanced dataset for a classification problem. My target variable is binary and has two categories. I implemented Random Forest and Logistic Regression by assigning class_weights as a parameter. When I fit data to random forest and logistic…
2 votes, 2 answers

LightGBM: validation AUC score during model fit differs from manual testing AUC score for same test set

I have a LightGBM Classifier with the following parameters: lgbmodel_2_wt = LGBMClassifier(boosting_type='gbdt', num_leaves=105, max_depth=11, learning_rate=0.03, …
Nayak S
2 votes, 1 answer

Pipeline and GridSearchCV, and Multi-Class challenge for XGBoost and RandomForest

I am working on workflows using Pipeline and GridSearchCV. MWE for RandomForest, as below, ################################################################# # Libraries ################################################################# import…
Saravanan K
2 votes, 0 answers

How to classify people's clothes by Gabor filter?

I'd like to distinguish one person from another using a Gabor filter. It is working fine, but I don't understand how to classify. Does it need, for example, an SVM as the classifier? I understand from this paper that it doesn't need an SVM or another classifier. The full…
Redhwan
2 votes, 1 answer

Finding data points close to the decision boundary of a classifier

Sorry if this is a very simple question, but I'm a newcomer to the field. My specific question is this: I have trained an XGBoost classifier in Python. After the training, how can I get the samples in my training data that are closer than a fixed…
iii
2 votes, 1 answer

How to get multi-class roc_auc in cross validate in sklearn?

I have a classification problem where I want to get the roc_auc value using cross_validate in sklearn. My code is as follows. from sklearn import datasets iris = datasets.load_iris() X = iris.data[:, :2] # we only take the first two features. y =…
EmJ
2 votes, 4 answers

How to choose n_estimators in RandomForestClassifier?

I'm building a Random Forest Binary Classifier in Python on a pre-processed dataset with 4898 instances, a 60-40 stratified split ratio, and 78% of the data belonging to one target label and the rest to the other. What value of n_estimators should I choose…
keenlearner
2 votes, 1 answer

How to combine two LSTM layers with different input sizes in Keras?

I have two types of input sequences, where input1 contains 50 values and input2 contains 25 values. I tried to combine these two sequence types using an LSTM model in the functional API. However, since the lengths of my two input sequences are different, I…
EmJ
2 votes, 1 answer

Sklearn different results with the same random_state across different systems (machines)

I have a Python script that generates predictions using sklearn Random Forest with a fixed random_state = 0. It always produces deterministic results on one computer (system), but when I switch to another computer, the results are different. Is there a…
2 votes, 2 answers

High precision and recall for train data but very poor for test data in classification problem

I'm very new to ML and I'm trying to build a classifier for unbalanced binary classes in a real-life problem. I've tried various models like Logistic Regression, Random Forest, ANN, etc., but every time I'm getting very high precision and recall…
vishnu priya
2 votes, 1 answer

F1-score with imbalanced data

I am working on a binary classification task. My evaluation data is imbalanced and consists of approx. 20% from class1 and 80% from class2. Even though I have good classification accuracy on each class type (0.602 on class1, 0.792 on class2), if I calculate…