Questions tagged [machine-learning]

Implementation questions about machine learning algorithms. General questions about machine learning (concepts, theory, methodology, terminology, etc.) should be posted to their specific communities.

Machine learning revolves around developing algorithms that learn by discovering patterns in data and then make decisions based on those patterns.

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from and make predictions about data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions rather than following strictly static program instructions.
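
As a minimal sketch of "building a model from example inputs", the snippet below fits a one-feature line by ordinary least squares and uses it to predict an unseen input. The function names (fit_line, predict) and the toy data are illustrative assumptions, not part of the tag description.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Toy example inputs generated from y = 2x + 1 (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)          # the "learning" step
prediction = predict(model, 5.0)  # a data-driven prediction for an unseen x
```

The rule that maps 5.0 to a prediction is never written by hand; it is recovered from the examples, which is the contrast with static program instructions drawn above.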

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise your question is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

55241 questions
12 votes, 1 answer

Sklearn fit vs predict, order of columns matters?

Say X1 and X2 are 2 pandas dataframes with the same columns, but possibly in different order. Assume model is some sort of sklearn model, like LassoCV. Say I do model.fit(X1, y), and then model.predict(X2). Is the fact that the columns are in…
Baron Yugovich • 3,843 • 12 • 48 • 76
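
One defensive habit for the situation described in this question is to reorder the prediction frame so its columns match the training frame before calling predict. The sketch below shows only that alignment step with pandas; the column names and values are hypothetical, and how a particular scikit-learn version reacts to a mismatched order is not something the excerpt settles.

```python
import pandas as pd

# Hypothetical training and prediction frames: same columns, different order.
X1 = pd.DataFrame({"age": [30, 40], "income": [1000, 2000]})
X2 = pd.DataFrame({"income": [1500], "age": [35]})

# Select X2's columns in the order used by X1, so position i means the
# same feature in both frames.
X2_aligned = X2[X1.columns]
```

After this step, model.predict(X2_aligned) would see the features in the same positions as model.fit(X1, y) did.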
12 votes, 2 answers

Predict classes or class probabilities?

I am currently using H2O for a classification problem dataset. I am testing it out with H2ORandomForestEstimator in a Python 3.6 environment. I noticed that the results of the predict method were values between 0 and 1 (I am assuming this is the…
Rahul • 44,892 • 25 • 73 • 103
12 votes, 1 answer

Pandas and scikit-learn: KeyError: [....] not in index

I do not understand why I get the error KeyError: '[ 1351 1352 1353 ... 13500 13501 13502] not in index' when I run this code: cv = KFold(n_splits=10) for train_index, test_index in cv.split(X): f_train_X, f_valid_X = X[train_index],…
ScalaBoy • 3,254 • 13 • 46 • 84
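
A KeyError of this kind typically appears when X is a pandas DataFrame: X[train_index] is then interpreted as a column lookup by label, not a row lookup by position. The sketch below reproduces that on a small hypothetical frame and shows positional row selection with .iloc. The index arrays are written by hand here in place of the ones KFold.split would yield, and the excerpt does not confirm that X is a DataFrame, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four rows, two named columns.
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 6.0, 7.0, 8.0]})

# Positional row indices, standing in for what a splitter would produce.
train_index = np.array([0, 1])
test_index = np.array([2, 3])

# Plain [] indexing on a DataFrame looks the values up as column labels,
# so integer positions that are not column names raise KeyError.
try:
    X[train_index]
    raised = False
except KeyError:
    raised = True

# .iloc selects rows by integer position, which is what the split indices mean.
f_train_X = X.iloc[train_index]
f_valid_X = X.iloc[test_index]
```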
12 votes, 1 answer

PyTorch : predict single example

Following the example from: https://github.com/jcjohnson/pytorch-examples This code trains successfully: # Code in file tensor/two_layer_net_tensor.py import torch device = torch.device('cpu') # device = torch.device('cuda') # Uncomment this to…
blue-sky • 51,962 • 152 • 427 • 752
12 votes, 2 answers

out of sample definition

Can anyone explain the difference between “in-sample” and “out-of-sample” forecasts?
12 votes, 9 answers

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning?

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning? Where does Q-learning fit in?
12 votes, 1 answer

How to understand RandomForestExplainer output (R package)

I have the following code, which basically tries to predict the Species from iris data using randomForest. What I'm really interested in is finding the best features (variables) that explain the species classification. I found the package…
neversaint • 60,904 • 137 • 310 • 477
12 votes, 2 answers

What is Sequence length in LSTM?

The dimensions of the input data for an LSTM in TensorFlow are [Batch Size, Sequence Length, Input Dimension]. What do Sequence Length and Input Dimension mean? How do we assign values to them if my input data is of the form: [[[1.23]…
Stuti Kalra • 133 • 1 • 1 • 6
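
To make the [Batch Size, Sequence Length, Input Dimension] layout concrete, here is a small sketch that cuts a univariate series into overlapping windows using plain nested lists. The helper name make_windows and the sample series are hypothetical; a real pipeline would usually hold the same structure in an array or tensor.

```python
def make_windows(series, sequence_length):
    """Return nested lists indexed as [batch][timestep][feature]."""
    windows = []
    for start in range(len(series) - sequence_length + 1):
        window = series[start:start + sequence_length]
        # Each timestep holds one feature, so the input dimension is 1.
        windows.append([[value] for value in window])
    return windows


# Hypothetical univariate series with five observations.
series = [0.1, 0.2, 0.3, 0.4, 0.5]
batch = make_windows(series, sequence_length=3)

# (batch size, sequence length, input dimension)
shape = (len(batch), len(batch[0]), len(batch[0][0]))
```

Here the sequence length is the number of timesteps in each example and the input dimension is the number of features observed at each timestep.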
12 votes, 2 answers

Should RNN attention weights over variable length sequences be re-normalized to "mask" the effects of zero-padding?

To be clear, I am referring to "self-attention" of the type described in Hierarchical Attention Networks for Document Classification and implemented in many places, for example: here. I am not referring to the seq2seq type of attention used in…
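
For readers unsure what re-normalizing to "mask" padding would look like, the sketch below computes a softmax over attention scores while forcing padded positions to zero weight, so the remaining weights sum to 1 over the real timesteps only. It is one common formulation written in plain Python, not the implementation the question links to, and the name masked_softmax is an assumption.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores where mask[i] == 1; padded positions get weight 0."""
    real_scores = [s for s, m in zip(scores, mask) if m]
    if not real_scores:
        # Fully padded sequence: no timestep can receive attention.
        return [0.0] * len(scores)
    # Subtract the max real score for numerical stability.
    max_score = max(real_scores)
    exps = [math.exp(s - max_score) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for a length-4 padded sequence with two real timesteps.
weights = masked_softmax([2.0, 1.0, 0.0, 0.0], [1, 1, 0, 0])
```

Without the mask, the two padded positions would each receive a nonzero share of the attention and dilute the weights on the real timesteps.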
12 votes, 1 answer

How to achieve stratified K fold splitting for arbitrary number of categorical variables?

I have a dataframe of the form, df: cat_var_1 cat_var_2 num_var_1 0 Orange Monkey 34 1 Banana Cat 56 2 Orange Dog 22 3 Banana Monkey 6 .. Suppose the possible…
Melsauce • 2,535 • 2 • 19 • 39
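
One way to approach stratification over several categorical variables is to treat each combination of values as a single joint stratum and spread the rows of each stratum across the folds. The sketch below does this round-robin in plain Python; the function name stratified_folds and the toy rows are hypothetical, and it makes no attempt to balance fold sizes when strata are very small.

```python
from collections import defaultdict


def stratified_folds(rows, keys, n_splits):
    """Assign row indices to folds so each joint category is spread evenly."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # The joint stratum is the tuple of all categorical values.
        groups[tuple(row[key] for key in keys)].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in groups.values():
        # Deal the members of each stratum out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Hypothetical rows: four of one combination, four of another.
rows = (
    [{"cat_var_1": "Orange", "cat_var_2": "Monkey"}] * 4
    + [{"cat_var_1": "Banana", "cat_var_2": "Cat"}] * 4
)
folds = stratified_folds(rows, keys=["cat_var_1", "cat_var_2"], n_splits=2)
```

The same joint key could also be built as a single label column and passed as the stratification target to a library splitter that accepts one.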
12 votes, 6 answers

How to Upload Many Files to Google Colab?

I am working on an image segmentation machine learning project and I would like to test it out on Google Colab. For the training dataset, I have 700 images, mostly 256x256, that I need to upload into a Python NumPy array for my project. I also have…
Jesse Cambon • 355 • 1 • 2 • 11
12 votes, 2 answers

Intermediate layer makes tensorflow optimizer to stop working

This graph trains a simple signal identity encoder, and in fact shows that the weights are being evolved by the optimizer: import tensorflow as tf import numpy as np initia = tf.random_normal_initializer(0, 1e-3) DEPTH_1 = 16 OUT_DEPTH = 1 I =…
lurscher • 25,930 • 29 • 122 • 185
12 votes, 1 answer

Soft margin in linear support vector machine using python

I'm learning about support vector machines and trying to come up with a simple Python implementation (I'm aware of the sklearn package; this is just to help understand the concepts better) that does simple linear classification. This is the major material I'm…
Jason • 2,950 • 2 • 30 • 50
12 votes, 3 answers

All intermediate steps should be transformers and implement fit and transform

I am implementing a pipeline that selects important features and then uses those features to train my random forest classifier. My code follows. m = ExtraTreesClassifier(n_estimators = 10) m.fit(train_cv_x,train_cv_y) sel =…
Stupid420 • 1,347 • 3 • 19 • 44
12 votes, 1 answer

Validation and Testing accuracy widely different

I am currently working on a dataset on Kaggle. After training the model on the training data, I tested it on the validation data and got an accuracy of around 0.49. However, the same model gives an accuracy of 0.05 on the testing data. I am using…