Questions tagged [machine-learning]

Implementation questions about machine learning algorithms. General questions about machine learning (concepts, theory, methodology, terminology, etc.) should be posted to their specific communities.

Machine learning is concerned with developing computer algorithms that learn by discovering patterns in data and make decisions based on those patterns.

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from and make predictions about data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions rather than following strictly static program instructions.
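As a minimal sketch of "building a model from example inputs", the snippet below fits a straight line to a handful of made-up points by ordinary least squares and then predicts a value for an unseen input. The function names `fit_line` and `predict` and the toy data are illustrative assumptions, not part of any library mentioned on this page.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept from examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples (they happen to lie on y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
print(predict(model, 5.0))
```

The prediction rule is not written by hand; the slope and intercept come entirely from the example pairs, which is the distinction from "strictly static program instructions" drawn above.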

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

Classic Problems:

Classification (supervised learning) classification supervised-learning
Regression (supervised learning) regression
Clustering (unsupervised learning) cluster-analysis unsupervised-learning
Density estimation
Sampling
Reinforcement Learning reinforcement-learning

Relevant Algorithms:

Principal component analysis (PCA) pca
Artificial neural networks (ANN) neural-network
Support vector machines (SVM) svm support-vector-machines
K-nearest neighbor (kNN) knn nearest-neighbor
k-means k-means
Bayesian networks bayesian-networks
Gaussian mixture model (GMM) mixture-model
Decision trees decisiontrees
Genetic algorithms genetic-algorithm
Simulated annealing simulated-annealing
Hidden Markov model (HMM) hidden-markov-models
Conditional Random Field (CRF)
Gaussian Processes gaussian-process
Kalman filter kalman kalman-filter
Particle filter particle-filter
Gibbs sampling
Graphical models
Ensemble methods (bagging, boosting, ...) ensemble-learning
Deep learning deep-learning
Q-Learning q-learning

Applications:

Computer vision (e.g., object tracking, gesture recognition) computer-vision
Image recognition (e.g., face, gait, iris, handwriting) image-recognition face-recognition ocr
Speech recognition speech-recognition
Speaker recognition voice-recognition
Natural language processing (NLP) nlp
Music information retrieval (MIR)
Bioinformatics bioinformatics
Spam filtering spam-filtering
Anomaly detection anomaly-detection
Automatic vehicle driving
Recommendation system recommendation-engine
Machine translation machine-translation

Software:

LibSVM libsvm
Weka weka
Orange orange
Shogun shogun
scikit-learn scikit-learn
PyBrain pybrain
Apache Mahout mahout
RapidMiner rapidminer
KNIME knime
Waffles
Azure Machine Learning azure-machine-learning
nltk nltk
Caffe caffe
TensorFlow tensorflow
Theano theano
Keras keras
OpenNMT opennmt
XGBoost xgboost
CatBoost catboost
Stanford CoreNLP stanford-nlp

Video Lectures:

Machine Learning with Python

55241 questions

3 answers

Different result with roc_auc_score() and auc()

I have trouble understanding the difference (if there is one) between roc_auc_score() and auc() in scikit-learn. I'm trying to predict a binary output with imbalanced classes (around 1.5% for Y=1). Classifier model_logit =…

python machine-learning scikit-learn

asked Jul 01 '15 at 10:48

gowithefloww

5 answers

What is the relation between the number of Support Vectors and training data and classifiers performance?

I am using LibSVM to classify some documents. The documents seem to be a bit difficult to classify, as the final results show. However, I have noticed something while training my models, and that is: if my training set is for example 1000 around 800…

machine-learning classification svm libsvm

asked Feb 28 '12 at 10:57

Hossein

12 answers

How to detect patterns in (electrocardiography) waves?

I'm trying to read an image from an electrocardiography and detect each one of the main waves in it (P wave, QRS complex and T wave). I can read the image and get a vector (like (4.2; 4.4; 4.9; 4.7; ...)). I need an algorithm that can walk through…

algorithm language-agnostic machine-learning signal-processing pattern-recognition

asked Feb 03 '10 at 22:51

Alaor

8 answers

Extracting an information from web page by machine learning

I would like to extract a specific type of information from web pages in Python, let's say a postal address. It has thousands of forms, but still, it is somehow recognizable. As there is a large number of forms, it would be probably very difficult to…

python machine-learning html-parsing web-scraping extract

asked Nov 11 '12 at 23:27

Honza Javorek

4 answers

Instance Normalisation vs Batch normalisation

I understand that Batch Normalisation helps in faster training by pushing the activations towards a unit Gaussian distribution, thus tackling the vanishing gradients problem. Batch norm is applied differently at training (use mean/var from each…

machine-learning neural-network computer-vision conv-neural-network batch-normalization

asked Aug 02 '17 at 14:34

Ruppesh Nalwaya

4 answers

How to add and remove new layers in keras after loading weights?

I am trying to do transfer learning; for that purpose I want to remove the last two layers of the neural network and add another two layers. This is example code which also outputs the same error. from keras.models import Sequential from…

python machine-learning keras keras-layer

asked Jan 16 '17 at 02:56

Eka

1 answer

What are the major differences and benefits of Porter and Lancaster Stemming algorithms?

I'm working on document classification tasks in Java. Both algorithms came highly recommended; what are the benefits and disadvantages of each, and which is more commonly used in the literature for Natural Language Processing tasks?

java machine-learning nlp

asked May 11 '12 at 15:10

Adam Hess

3 answers

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

As per the title, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True StratifiedKFold(n_splits=10, shuffle=True, random_state=0) and StratifiedShuffleSplit StratifiedShuffleSplit(n_splits=10,…

python machine-learning scikit-learn data-science cross-validation

asked Aug 30 '17 at 20:43

gabboshow

2 answers

Evaluation & Calculate Top-N Accuracy: Top 1 and Top 5

I have come across a few journal papers (on machine learning classification problems) that mention evaluating accuracy with a Top-N approach. The data showed Top-1 accuracy = 42.5% and Top-5 accuracy = 72.5% under the same training and testing conditions. I…

algorithm machine-learning evaluation top-n

asked Jun 07 '16 at 00:51

D_9268

6 answers

Mixing categorial and continuous data in Naive Bayes classifier using scikit-learn

I'm using scikit-learn in Python to develop a classification algorithm to predict the gender of certain customers. Amongst others, I want to use the Naive Bayes classifier but my problem is that I have a mix of categorical data (ex: "Registered…

python machine-learning data-mining classification scikit-learn

asked Jan 10 '13 at 09:08

user1499144

4 answers

Normalize data before or after split of training and testing data?

I want to separate my data into train and test sets. Should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model?

machine-learning data-science normalization training-data train-test-split

asked Mar 23 '18 at 07:13

hemant

15 answers

ImportError('Could not import PIL.Image. ' working with keras-ternsorflow

I'm following some lectures from lynda.com about deep learning using Keras-TensorFlow in a PyCharmCE environment and they didn't have this problem. I get this error: raise ImportError('Could not import PIL.Image. ' ImportError: Could not import…

image-processing machine-learning keras

asked Jan 12 '18 at 11:49

Rogelio Em

6 answers

Keras model.summary() result - Understanding the # of Parameters

I have a simple NN model for detecting hand-written digits from a 28x28px image, written in Python using Keras (Theano backend): model0 = Sequential() #number of epochs to train for nb_epoch = 12 #amount of data each iteration in an epoch…

python machine-learning neural-network keras theano

asked Apr 29 '16 at 20:09

user3501476

4 answers

Linear regression analysis with string/categorical features (variables)?

Regression algorithms seem to be working on features represented as numbers. For example: This data set doesn't contain categorical features/variables. It's quite clear how to do regression on this data and predict price. But now I want to do a…

python machine-learning regression linear-regression feature-selection

asked Nov 30 '15 at 20:21

Erba Aitbayev

3 answers

Estimating the number of neurons and number of layers of an artificial neural network

I am looking for a method to calculate the number of layers and the number of neurons per layer. As input I only have the size of the input vector, the size of the output vector and the size of the training set. Usually the best net is…

machine-learning neural-network deep-learning artificial-intelligence

asked Jul 27 '10 at 15:13

ladi
