I am trying to write a scikit-learn transformer that changes both of my input matrices, X and Y.

For example, I want to remove the leading rows of X that contain only zeros and remove the same rows from Y. I'm guessing I should implement that in the fit function, but how do I make sure that the later steps in the pipeline see the transformed X and Y?
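
To make the first case concrete, here is a minimal sketch of the operation itself, written as a plain NumPy function outside any pipeline. The name `drop_leading_zero_rows` is made up for illustration, and I am assuming X is a 2-D array and Y has one row per row of X.

```python
import numpy as np


def drop_leading_zero_rows(X, Y):
    """Drop the leading all-zero rows of X and the matching rows of Y.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")

    # True for every row of X that has at least one non-zero entry.
    has_nonzero = np.any(X != 0, axis=1)

    # Index of the first such row; if every row is zero, drop everything.
    start = int(np.argmax(has_nonzero)) if has_nonzero.any() else X.shape[0]

    return X[start:], Y[start:]
```

Only the zero rows at the top are removed here; an all-zero row that appears later is kept.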

Similarly, if I have a matrix X with rows x_1, x_2, ..., x_n and a matrix Y with rows y_1, y_2, ..., y_n, I would like to transform X, for a given k, into a new matrix X' with rows

  • x_1 x_2 x_3 ... x_k
  • x_2 x_3 x_4 ... x_(k+1)
  • ...
  • x_(n-k+1) ... x_n

(Each bullet point is a row of the new matrix X'.) X' now has fewer rows than X, and I would like Y to shrink in the same way, i.e. Y' should have rows y_k up to y_n. I am not asking how to transform X as such, but how I can arrange for X' and Y' to be passed down the pipeline.
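
For reference, this is roughly the windowing I have in mind, again as a standalone NumPy sketch rather than a transformer. `make_windows` is a hypothetical name, and I am assuming X is 2-D with shape (n, d), so each new row is the k consecutive rows of X laid side by side.

```python
import numpy as np


def make_windows(X, Y, k):
    """Build X' from sliding windows of k rows of X, and trim Y to match.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    n = X.shape[0]
    if Y.shape[0] != n:
        raise ValueError("X and Y must have the same number of rows")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")

    # Row i of X' is x_(i+1), ..., x_(i+k) flattened into one long row.
    X_win = np.stack([X[i:i + k].ravel() for i in range(n - k + 1)])

    # Keep y_k, ..., y_n so that X' and Y' have the same number of rows.
    Y_win = Y[k - 1:]

    return X_win, Y_win
```

Calling this on the raw data before fitting gives matching X' and Y', but that happens outside the pipeline, which is exactly what I would like to avoid.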

I have read this link, which said that scikit-learn did not yet support changing the matrices in this way. Is that still the case?

Thank you! :)

  • The general behavior of a transformer is that fit is called on the labeled training data, and then transform is called on all data (labeled and unlabeled) that is passed through. While it is common to remove columns (feature selection), it is unusual to remove rows. Removing rows in a transformer feels like a bad idea, as it will apply to unlabeled data as well, making it difficult to identify which predictions correspond to which rows of your unlabeled set. – David Maust Oct 24 '15 at 20:08
  • Thank you for your reply! That makes sense. –  Nov 11 '15 at 03:18