While applying LDA to my Churn_Modelling.csv file, everything goes well until the point where my X_train comes back with shape (8000, 1) instead of (8000, 2) as expected:

lda = LDA(n_components = 2)

X_train = lda.fit_transform(X_train, y_train)

X_train is one-hot encoded and feature-scaled beforehand, as follows:

# LDA

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

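# Encoding categorical data
# (OneHotEncoder(categorical_features = [1]) was removed in scikit-learn 0.22; ColumnTransformer replaces it)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# One-hot encode column 1; the dummy columns come first, the other columns are passed through
ct = ColumnTransformer([('onehot', OneHotEncoder(), [1])], remainder = 'passthrough', sparse_threshold = 0)
X = ct.fit_transform(X).astype(float)
# Drop the first dummy column to avoid the dummy variable trap
X = X[:, 1:]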

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Applying LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

When I do the same on another .csv file I have no trouble. Do you have any idea why?

Thank you very much for your help!

Jesuisme
Cédric

1 Answer

I think I have the answer, but I would appreciate confirmation if possible :-)

The maximum number of columns I can hope to obtain from transform is n - 1, where n is the number of classes. In my case there are 2 classes (True, False), which yields at most 1 column.

Am I right? Thank you again.
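
For anyone who wants to check this without the churn data, here is a minimal sketch on synthetic data. The helper names make_data and max_lda_components are made up for this illustration, and the random features only stand in for the scaled X_train. It relies on the documented scikit-learn limit that n_components cannot exceed min(n_features, n_classes - 1):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

def make_data(n_classes, n_samples=300, n_features=5, seed=0):
    # Hypothetical helper: synthetic stand-in for the scaled features and labels
    rng = np.random.RandomState(seed)
    y = rng.randint(0, n_classes, size=n_samples)
    X = rng.normal(size=(n_samples, n_features))
    # Shift one feature per class so the class means are clearly different
    X[np.arange(n_samples), y] += 3.0
    return X, y

def max_lda_components(X, y):
    # Hypothetical helper: upper bound on the number of LDA components
    return min(X.shape[1], len(np.unique(y)) - 1)

# Two classes, like the churn column: at most 1 discriminant axis
X_two, y_two = make_data(n_classes=2)
shape_two = LDA(n_components=max_lda_components(X_two, y_two)).fit_transform(X_two, y_two).shape

# Three classes: at most 2 discriminant axes
X_three, y_three = make_data(n_classes=3)
shape_three = LDA(n_components=max_lda_components(X_three, y_three)).fit_transform(X_three, y_three).shape

print(shape_two)
print(shape_three)
```

With two classes the transformed array has a single column, and with three classes it has two. As far as I know, older scikit-learn releases silently capped n_components at this limit (which is what happened in the question), while more recent releases raise a ValueError when n_components is too large.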

Cédric
    Indeed, that was the case. Just for the sake of testing, I changed the first value of the churn column (which contained only "1"s and "0"s) in my dataset to an arbitrary "2". X_train then came back as an 8000 by 2 matrix (and no longer 8000 by 1). – Cédric Apr 11 '18 at 14:54