Imputer(missing_values = "NaN", strategy = "mean", axis = 0)
The line above creates an Imputer object that replaces missing values, marked as NaN, with the mean of the non-missing values in the same column (axis = 0).
impt = impt.fit(X[:,1:3])
The imputer first needs data from which to calculate the mean that will replace the missing values. This is the job of the fit method: it takes in some data and computes the values needed, the mean of each column in this case. This step is usually called the training phase.
impt.transform(X[:,1:3])
Once these values are calculated, they can be applied to new data. In this case, the transform method replaces the missing entries with the mean calculated in fit.
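To make the split between fit and transform concrete, here is a minimal NumPy sketch. The class MeanImputerSketch and the sample arrays are hypothetical, written only for illustration; this is not scikit-learn's implementation, just the same idea under the assumption of column-wise mean imputation.

```python
import numpy as np

class MeanImputerSketch:
    """Hypothetical stand-in that illustrates fit/transform (not scikit-learn code)."""

    def fit(self, X):
        # Learn one mean per column, ignoring NaN entries.
        self.means_ = np.nanmean(X, axis=0)
        return self

    def transform(self, X):
        # Replace each NaN with the mean learned in fit for its column.
        X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X

# Made-up training data and new data with missing entries.
train = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])
new = np.array([[np.nan, np.nan]])

impt = MeanImputerSketch().fit(train)  # means are learned from train only
filled_new = impt.transform(new)       # and then applied to unseen data
```

The point of the sketch is that transform never recomputes anything: the values filled into new come entirely from what fit saw in train.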
Sometimes one wants to run fit and transform on the same data. In that case, instead of calling fit followed by transform, we can use the fit_transform method.
X[:,1:3] = impt.fit_transform(X[:,1:3])
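As a quick check that the two routes agree, the sketch below compares fit followed by transform against fit_transform on the same array. It assumes a recent scikit-learn, where sklearn.preprocessing.Imputer was deprecated in 0.20 and removed in 0.22 in favour of sklearn.impute.SimpleImputer. SimpleImputer has no axis argument and always imputes column-wise. The array X here is made-up sample data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in each column.
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan]])

# Two-step version: learn the column means, then apply them.
two_step = SimpleImputer(missing_values=np.nan, strategy="mean").fit(X).transform(X)

# One-step version on the same data.
one_step = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)

# Both routes should give the same filled array.
same = np.array_equal(two_step, one_step)
```

When the imputer must later be applied to different data (a test set, for example), keeping the fitted object and calling transform on the new data is the safer pattern, since fit_transform would learn new means from whatever it is given.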