Questions tagged [machine-learning]

Implementation questions about machine learning algorithms. General questions about machine learning (concepts, theory, methodology, terminology, etc.) should be posted to their specific communities.

Machine learning revolves around developing self-learning computer algorithms that discover patterns in data and make intelligent decisions based on those patterns.

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from and make predictions about data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions rather than following strictly static program instructions.
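To make "building a model from example inputs" concrete, here is a minimal sketch that assumes the simplest possible model: a straight line y ≈ w·x + b fitted by ordinary least squares. The function names `fit_line` and `predict` are illustrative only and do not come from any library.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = w * x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict(model: Tuple[float, float], x: float) -> float:
    w, b = model
    return w * x + b

# The rule y = 2x + 1 is never written into the program; it is recovered from examples.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```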

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

Classic Problems:

Classification (supervised learning): [classification] [supervised-learning]
Regression (supervised learning): [regression]
Clustering (unsupervised learning): [cluster-analysis] [unsupervised-learning]
Density estimation
Sampling
Reinforcement Learning: [reinforcement-learning]

Relevant Algorithms:

Principal component analysis (PCA): [pca]
Artificial neural networks (ANN): [neural-network]
Support vector machines (SVM): [svm] [support-vector-machines]
K-nearest neighbors (kNN): [knn] [nearest-neighbor]
k-means: [k-means]
Bayesian networks: [bayesian-networks]
Gaussian mixture model (GMM): [mixture-model]
Decision trees: [decisiontrees]
Genetic algorithms: [genetic-algorithm]
Simulated annealing: [simulated-annealing]
Hidden Markov model (HMM): [hidden-markov-models]
Conditional random field (CRF)
Gaussian processes: [gaussian-process]
Kalman filter: [kalman] [kalman-filter]
Particle filter: [particle-filter]
Gibbs sampling
Graphical models
Ensemble methods (bagging, boosting, ...): [ensemble-learning]
Deep learning: [deep-learning]
Q-learning: [q-learning]
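As an illustration of one entry in the algorithm list, here is a minimal sketch of k-means clustering using Lloyd's algorithm (alternate "assign each point to its nearest centroid" and "move each centroid to the mean of its points"). It assumes points are tuples of floats and that initial centroids are drawn at random from the data; `kmeans` is a hypothetical helper, not a library API.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(pts, k=2)
print(labels)  # the first three points share one label, the last three the other
```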

Applications:

Computer vision (e.g., object tracking, gesture recognition): [computer-vision]
Image recognition (e.g., face, gait, iris, handwriting): [image-recognition] [face-recognition] [ocr]
Speech recognition: [speech-recognition]
Speaker recognition: [voice-recognition]
Natural language processing (NLP): [nlp]
Music information retrieval (MIR)
Bioinformatics: [bioinformatics]
Spam filtering: [spam-filtering]
Anomaly detection: [anomaly-detection]
Autonomous vehicle driving
Recommendation systems: [recommendation-engine]
Machine translation: [machine-translation]

Software:

LibSVM: [libsvm]
Weka: [weka]
Orange: [orange]
Shogun: [shogun]
scikit-learn: [scikit-learn]
PyBrain: [pybrain]
Apache Mahout: [mahout]
RapidMiner: [rapidminer]
KNIME: [knime]
Waffles
Azure Machine Learning: [azure-machine-learning]
NLTK: [nltk]
Caffe: [caffe]
TensorFlow: [tensorflow]
Theano: [theano]
Keras: [keras]
OpenNMT: [opennmt]
XGBoost: [xgboost]
CatBoost: [catboost]
Stanford CoreNLP: [stanford-nlp]

Video Lectures:

Machine Learning with Python

55,241 questions

1 answer

Sklearn fit vs predict, order of columns matters?

Say X1 and X2 are 2 pandas dataframes with the same columns, but possibly in different order. Assume model is some sort of sklearn model, like LassoCV. Say I do model.fit(X1, y), and then model.predict(X2). Is the fact that the columns are in…

python machine-learning scikit-learn

asked Aug 02 '18 at 22:40 by Baron Yugovich (3,843 reputation; 12 gold, 48 silver, 76 bronze badges)
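A fitted estimator ultimately multiplies a numeric array by learned coefficients, so features are matched by position, not by name. The sketch below illustrates this with NumPy least squares standing in for LassoCV (an assumption, to keep it dependency-light), and shows the usual safeguard of reordering the new frame to the training column order with `X2[X1.columns]`. Recent scikit-learn releases also record the training column names (`feature_names_in_`) when fitted on a DataFrame and may warn or raise on a mismatch; the exact behavior depends on the version, so reordering explicitly is the safer habit.

```python
import numpy as np
import pandas as pd

# Training data with columns in one order ...
X1 = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [0.0, 1.0, 0.0, 1.0]})
y = 3.0 * X1["a"] - 2.0 * X1["b"]

# ... and new data with the same columns in a different order.
X2 = pd.DataFrame({"b": [1.0], "a": [5.0]})

# Stand-in model: plain least squares on the raw array (no intercept).
# Like most estimators, it only sees column positions, not names.
coef, *_ = np.linalg.lstsq(X1.to_numpy(), y.to_numpy(), rcond=None)

def predict(X: pd.DataFrame) -> np.ndarray:
    return X.to_numpy() @ coef

wrong = predict(X2)              # misaligned: column "b" is used as if it were "a"
right = predict(X2[X1.columns])  # reorder by name before predicting
print(wrong, right)
```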

2 answers

Predict classes or class probabilities?

I am currently using H2O for a classification problem dataset. I am testing it out with H2ORandomForestEstimator in a Python 3.6 environment. I noticed the predict method was giving values between 0 and 1 (I am assuming this is the…

python machine-learning classification random-forest h2o

asked Jul 16 '18 at 18:06 by Rahul (44,892 reputation; 25 gold, 73 silver, 103 bronze badges)
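Many classifiers can return either per-class probabilities or a hard class label, and the label is typically derived from the probabilities. A minimal, library-agnostic sketch follows: argmax for multiclass output, a threshold for the binary case. The 0.5 threshold is an assumption (some libraries pick a different default), and nothing here reproduces H2O's own output format.

```python
from typing import Dict, List

def to_label(probs: Dict[str, float]) -> str:
    """Pick the class with the highest predicted probability (argmax)."""
    return max(probs, key=probs.get)

def to_binary_label(p_positive: float, threshold: float = 0.5) -> int:
    """Binary case: 1 if P(class 1) reaches the threshold, else 0."""
    return int(p_positive >= threshold)

rows: List[Dict[str, float]] = [
    {"cat": 0.7, "dog": 0.2, "bird": 0.1},
    {"cat": 0.1, "dog": 0.3, "bird": 0.6},
]
print([to_label(r) for r in rows])                     # ['cat', 'bird']
print([to_binary_label(p) for p in (0.2, 0.5, 0.9)])   # [0, 1, 1]
```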

1 answer

Pandas and scikit-learn: KeyError: [....] not in index

I do not understand why I get the error KeyError: '[ 1351 1352 1353 ... 13500 13501 13502] not in index' when I run this code: cv = KFold(n_splits=10) for train_index, test_index in cv.split(X): f_train_X, f_valid_X = X[train_index],…

python pandas machine-learning scikit-learn

asked Jun 28 '18 at 20:58 by ScalaBoy (3,254 reputation; 13 gold, 46 silver, 84 bronze badges)
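On a pandas DataFrame, `X[train_index]` selects columns by label rather than rows by position, so the integer index arrays yielded by a K-fold splitter produce a KeyError. Positional row selection uses `X.iloc[...]`. The sketch below generates contiguous, unshuffled folds with NumPy (the helper `kfold_indices` is hypothetical, used so the example runs without scikit-learn); indices from `KFold.split(X)` can be used the same way.

```python
import numpy as np
import pandas as pd

def kfold_indices(n_samples: int, n_splits: int):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_splits):
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

X = pd.DataFrame({"f1": np.arange(10.0), "f2": np.arange(10.0) ** 2})
y = pd.Series(np.arange(10) % 2)

for train_index, test_index in kfold_indices(len(X), n_splits=5):
    # X[train_index] would look for *columns* labelled 0, 1, 2, ... -> KeyError.
    # .iloc selects *rows* by position, which is what the fold indices are.
    f_train_X, f_valid_X = X.iloc[train_index], X.iloc[test_index]
    f_train_y, f_valid_y = y.iloc[train_index], y.iloc[test_index]
    print(f_train_X.shape, f_valid_X.shape)  # (8, 2) (2, 2)
```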

1 answer

PyTorch: predict a single example

Following the example from: https://github.com/jcjohnson/pytorch-examples This code trains successfully: # Code in file tensor/two_layer_net_tensor.py import torch device = torch.device('cpu') # device = torch.device('cuda') # Uncomment this to…

python machine-learning pytorch backpropagation

asked Jun 26 '18 at 10:53 by blue-sky (51,962 reputation; 152 gold, 427 silver, 752 bronze badges)
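The two-layer network in the referenced pytorch-examples code computes its forward pass as matrix multiply, ReLU, matrix multiply on a batch of shape (N, D_in). To predict for one example, the usual approach is to give it a batch dimension of 1, e.g. `x_single.unsqueeze(0)` or `x_single.view(1, -1)` in PyTorch, because `torch.mm` requires 2-D inputs. The sketch below mirrors that forward pass in NumPy (an assumption, so it runs without PyTorch); the weights are random placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 64, 1000, 100, 10  # batch, input, hidden, output sizes

# Placeholder weights; in practice these come from the training loop.
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def forward(x: np.ndarray) -> np.ndarray:
    """x has shape (batch, D_in); returns shape (batch, D_out)."""
    h_relu = np.maximum(x @ w1, 0.0)  # hidden layer followed by ReLU
    return h_relu @ w2

x_single = rng.standard_normal(D_in)   # one example, shape (D_in,)
y_single = forward(x_single[None, :])  # add a batch dimension -> (1, D_in)
print(y_single.shape)                  # (1, 10)
```

A single example passed this way gets exactly the prediction it would get as one row of a larger batch.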

2 answers

out of sample definition

Can anyone explain the difference between “in-sample” and “out-of-sample” forecasts?

machine-learning statistics forecasting computational-finance

asked Feb 23 '11 at 06:16 by Amber
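In-sample error is measured on the same observations the model was fitted to; out-of-sample error is measured on observations held back from fitting. A minimal sketch with an assumed toy series and an assumed naive "forecast the training mean" model shows how the two can differ sharply:

```python
from typing import List

def mean_abs_error(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 20.0, 22.0]
train, test = series[:6], series[6:]  # fit on the first 6 points, hold out the last 2

forecast = sum(train) / len(train)    # the "model": the training mean (12.0)

in_sample = mean_abs_error(train, [forecast] * len(train))
out_of_sample = mean_abs_error(test, [forecast] * len(test))
print(in_sample, out_of_sample)       # 1.0 9.0
```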

9 answers

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning?

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning? Where does Q-learning fit in?

machine-learning neural-network deep-learning reinforcement-learning q-learning

asked May 26 '18 at 12:34 by user9851027
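Q-learning is a reinforcement-learning algorithm that learns action values Q(s, a) with the update Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]; deep reinforcement learning methods such as DQN replace the table with a neural network. Below is a minimal tabular sketch on a hypothetical four-state corridor. The environment, rewards, and hyperparameters are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

N_STATES = 4      # states 0..3; state 3 is the goal (terminal)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical corridor: reward 1 for reaching the goal, 0 otherwise."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Off-policy: behave with uniformly random actions; the update
            # still estimates the value of acting greedily afterwards.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1]: always move right, toward the goal
```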

1 answer

How to understand RandomForestExplainer output (R package)

I have the following code, which basically tries to predict the Species from the iris data using randomForest. What I'm really interested in is finding which features (variables) best explain the species classification. I found the package…

r machine-learning random-forest

asked Apr 19 '18 at 04:04 by neversaint (60,904 reputation; 137 gold, 310 silver, 477 bronze badges)

2 answers

What is Sequence length in LSTM?

The dimensions of the input data for an LSTM in TensorFlow are [Batch Size, Sequence Length, Input Dimension]. What do Sequence Length and Input Dimension mean? How do we assign values to them if my input data is of the form: [[[1.23]…

machine-learning lstm

asked Mar 30 '18 at 11:06 by Stuti Kalra
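In the [batch_size, sequence_length, input_dim] layout, sequence_length is the number of time steps per example and input_dim is the number of features at each step. A minimal sketch, assuming a univariate series cut into overlapping windows of three steps; apart from the first value, the series values and the window length are hypothetical.

```python
import numpy as np

def make_windows(series: np.ndarray, seq_len: int) -> np.ndarray:
    """Slice a 1-D series into overlapping windows shaped for an RNN/LSTM.

    Returns shape (batch_size, seq_len, input_dim) with input_dim = 1,
    because each time step carries a single feature.
    """
    windows = [series[i:i + seq_len] for i in range(len(series) - seq_len + 1)]
    return np.stack(windows)[:, :, np.newaxis]

series = np.array([1.23, 2.34, 3.45, 4.56, 5.67, 6.78])
X = make_windows(series, seq_len=3)
print(X.shape)     # (4, 3, 1): 4 examples, 3 time steps, 1 feature per step
print(X[0, :, 0])  # [1.23 2.34 3.45]
```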

2 answers

Should RNN attention weights over variable length sequences be re-normalized to "mask" the effects of zero-padding?

To be clear, I am referring to "self-attention" of the type described in Hierarchical Attention Networks for Document Classification and implemented many places, for example: here. I am not referring to the seq2seq type of attention used in…

tensorflow machine-learning deep-learning recurrent-neural-network attention-model

asked Mar 27 '18 at 21:27 by t-flow
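One common way to keep zero-padding from receiving attention is to mask padded positions before the softmax, so the weights over real tokens sum to 1. Renormalizing an unmasked softmax over the real tokens afterwards gives mathematically the same weights, since the common denominator cancels. A minimal NumPy sketch of both, with assumed example scores; it assumes every row contains at least one real token (a fully masked row would produce NaN).

```python
import numpy as np

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over the last axis, ignoring positions where mask == 0.

    scores: (batch, time) attention logits; mask: same shape, 1 = real token, 0 = padding.
    """
    scores = np.where(mask.astype(bool), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) = 0 for padded positions
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.5, 0.0]])
mask = np.array([[1, 1, 1, 0]])  # last time step is zero-padding

w_masked = masked_softmax(scores, mask)

# Renormalizing a plain softmax over the real tokens yields the same weights.
plain = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
w_renorm = plain * mask / (plain * mask).sum(axis=-1, keepdims=True)
print(np.allclose(w_masked, w_renorm))  # True
```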

1 answer

How to achieve stratified K fold splitting for arbitrary number of categorical variables?

I have a dataframe of the form, df:

      cat_var_1  cat_var_2  num_var_1
   0  Orange     Monkey     34
   1  Banana     Cat        56
   2  Orange     Dog        22
   3  Banana     Monkey     6
   ..

Suppose the possible…

python pandas numpy machine-learning scikit-learn

asked Feb 26 '18 at 12:07 by Melsauce (2,535 reputation; 2 gold, 19 silver, 39 bronze badges)
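One common approach is to combine the categorical columns into a single composite key (e.g. `("Orange", "Monkey")`) and stratify on that key; with scikit-learn, the composite key can be passed as `y` to `StratifiedKFold.split(X, y)`. The dependency-free sketch below instead deals each stratum's rows round-robin across folds. `stratified_folds` is a hypothetical helper, the rows and `num_var_1` values are placeholders, and strata with fewer rows than folds cannot appear in every fold. For cross-validation, the training set for fold k is every row not in fold k.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def stratified_folds(rows: Sequence[Dict[str, object]], strat_cols: Sequence[str],
                     n_splits: int) -> List[List[int]]:
    """Return n_splits lists of row indices, balanced over the joint categories."""
    strata: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        strata[tuple(row[c] for c in strat_cols)].append(i)  # composite key
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    offset = 0
    for key in sorted(strata, key=str):
        for j, idx in enumerate(strata[key]):
            folds[(offset + j) % n_splits].append(idx)
        offset += len(strata[key])  # continue dealing where the last stratum stopped
    return folds

combos = [("Orange", "Monkey"), ("Banana", "Cat"), ("Orange", "Dog"), ("Banana", "Monkey")]
rows = [{"cat_var_1": a, "cat_var_2": b, "num_var_1": n}
        for n, (a, b) in enumerate(combos * 3)]
folds = stratified_folds(rows, ["cat_var_1", "cat_var_2"], n_splits=3)
for k, fold in enumerate(folds):
    print(k, sorted((rows[i]["cat_var_1"], rows[i]["cat_var_2"]) for i in fold))
```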

6 answers

How to Upload Many Files to Google Colab?

I am working on an image segmentation machine learning project and I would like to test it out on Google Colab. For the training dataset, I have 700 images, mostly 256x256, that I need to upload into a Python NumPy array for my project. I also have…

python machine-learning jupyter google-colaboratory

asked Feb 19 '18 at 23:36 by Jesse Cambon
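A common workaround for many small files is to zip them locally, upload the single archive, and extract it inside the notebook. In Colab, `google.colab.files.upload()` returns a dict mapping each uploaded filename to its contents as bytes; the sketch below covers only the extraction step, using the standard library so it runs anywhere. The helper `extract_zip_bytes`, the archive name, and the folder names are hypothetical. Each extracted image can then be loaded into a NumPy array with an image library of your choice.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import List

def extract_zip_bytes(data: bytes, dest: str) -> List[str]:
    """Extract an in-memory zip archive into dest and return the extracted file paths."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(out)
        names = [n for n in zf.namelist() if not n.endswith("/")]  # skip directory entries
    return sorted(str(out / n) for n in names)

# Assumed Colab usage:
#   uploaded = files.upload()
#   paths = extract_zip_bytes(uploaded["images.zip"], "data/images")

# Local demonstration with a small in-memory archive of fake image files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("train/img_000.png", b"fake-bytes")
    zf.writestr("train/img_001.png", b"fake-bytes")

with tempfile.TemporaryDirectory() as tmp:
    print(len(extract_zip_bytes(buf.getvalue(), tmp)))  # 2
```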

2 answers

Intermediate layer makes TensorFlow optimizer stop working

This graph trains a simple signal identity encoder, and in fact shows that the weights are being evolved by the optimizer: import tensorflow as tf import numpy as np initia = tf.random_normal_initializer(0, 1e-3) DEPTH_1 = 16 OUT_DEPTH = 1 I =…

python tensorflow machine-learning deep-learning autoencoder

asked Feb 15 '18 at 21:49 by lurscher (25,930 reputation; 29 gold, 122 silver, 185 bronze badges)

1 answer

Soft margin in linear support vector machine using python

I'm learning about support vector machines and trying to write a simple Python implementation that does linear classification (I'm aware of the sklearn package; this is just to understand the concepts better). This is the major material I'm…

python machine-learning svm

asked Feb 15 '18 at 09:51 by Jason (2,950 reputation; 2 gold, 30 silver, 50 bronze badges)
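In primal form, a soft-margin linear SVM minimizes ½‖w‖² + C·Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)) with labels yᵢ ∈ {−1, +1}; points inside the margin contribute hinge loss, and C trades margin width against violations. The sketch below minimizes this objective by full-batch subgradient descent, which is one of several possible solvers (not the QP/SMO approach used by libraries such as LibSVM). The toy data, C, learning rate, and epoch count are assumptions.

```python
import numpy as np

def train_linear_svm(X: np.ndarray, y: np.ndarray, C: float = 1.0,
                     lr: float = 0.01, epochs: int = 500):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - y*(X@w + b)))."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points on the wrong side of, or inside, the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return np.where(X @ w + b >= 0, 1, -1)

X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
print(predict(X, w, b))  # matches y on this separable toy set
```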

3 answers

All intermediate steps should be transformers and implement fit and transform

I am implementing a pipeline that selects important features and then uses the same features to train my random forest classifier. Following is my code. m = ExtraTreesClassifier(n_estimators = 10) m.fit(train_cv_x,train_cv_y) sel =…

python machine-learning scikit-learn feature-selection

asked Feb 13 '18 at 01:59 by Stupid420 (1,347 reputation; 3 gold, 19 silver, 44 bronze badges)
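The error message reflects scikit-learn's Pipeline contract: every step except the last must provide `fit` and `transform`, and a classifier such as ExtraTreesClassifier has no `transform`, so it cannot sit in the middle. A common fix is to wrap it in `sklearn.feature_selection.SelectFromModel`, which is a transformer. The plain-Python sketch below only illustrates the contract; `MiniPipeline`, `KeepFirstColumn`, and `ThresholdClassifier` are hypothetical, simplified stand-ins, not scikit-learn's implementation.

```python
from typing import Any, List, Sequence, Tuple

class MiniPipeline:
    """Hypothetical, simplified pipeline illustrating the transformer contract."""

    def __init__(self, steps: Sequence[Tuple[str, Any]]):
        for name, step in steps[:-1]:
            # Intermediate steps must be transformers (fit + transform).
            if not (hasattr(step, "fit") and hasattr(step, "transform")):
                raise TypeError(f"All intermediate steps should be transformers and "
                                f"implement fit and transform; {name!r} does not")
        self.steps = list(steps)

    def fit(self, X: List[List[float]], y: List[int]) -> "MiniPipeline":
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # each transformer feeds the next step
        self.steps[-1][1].fit(X, y)          # only the final step may be a plain estimator
        return self

    def predict(self, X: List[List[float]]) -> List[int]:
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)

class KeepFirstColumn:
    """Toy transformer standing in for a feature selector."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return [[row[0]] for row in X]

class ThresholdClassifier:
    """Toy estimator: has fit and predict, but no transform."""
    def fit(self, X, y):
        self.cut = sum(r[0] for r in X) / len(X)
        return self
    def predict(self, X):
        return [int(r[0] > self.cut) for r in X]

pipe = MiniPipeline([("select", KeepFirstColumn()), ("clf", ThresholdClassifier())])
pipe.fit([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]], [0, 0, 1, 1])
print(pipe.predict([[1.0, 9.0], [5.0, 0.0]]))  # [0, 1]

try:  # a classifier in an intermediate position violates the contract
    MiniPipeline([("clf1", ThresholdClassifier()), ("clf2", ThresholdClassifier())])
except TypeError as err:
    print(err)
```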

1 answer

Validation and Testing accuracy widely different

I am currently working on a dataset on Kaggle. After training the model on the training data, I tested it on the validation data and got an accuracy of around 0.49. However, the same model gives an accuracy of 0.05 on the testing data. I am using…

machine-learning deep-learning cross-validation training-data kaggle

asked Feb 10 '18 at 08:13 by user3828311
