I'm trying to perform dimensionality reduction on a classification problem with two classes.

I have 6 CSV files. My code is here:

def linear_discrimination_analysis(files):
    with open(os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", files[0]),
              'rU') as my_file_0:
        df1 = sd.sample_difference(my_file_0)
    for f in files[1:len(files) - 2]:
        with open(os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", f),
                  'rU') as my_file_1:
            df1.append(sd.sample_difference(my_file_1))
    with open(os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", files[len(files) - 2]),
                  'rU') as my_file_2:
        df2 = sd.sample_difference(my_file_2)
    with open(os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", files[len(files) - 1]),
                  'rU') as my_file_3:
        df2.append(sd.sample_difference(my_file_3))
    X = df1[['x', 'y', 'z']].values
    y = df2['label'].values
    lda = LDA(n_components=1)
    lda.fit_transform(X, y.ravel())
    plt.show()

linear_discrimination_analysis(files)

I suppose that could be the issue.

Here's the error I get:

RuntimeWarning: invalid value encountered in divide S**2 [:self._max_components]

Each file has hundreds of rows and 5 columns. I want to use the 3rd, 4th, and 5th columns, which are called 'x', 'y', and 'z', for feature extraction.

1 Answer

The error basically refers to your y, which has a bad input shape of (464, 3). For linear discriminant analysis as posted, you are supposed to provide a single output, the "class" (there can be more than two classes), so your y should be a 1D array. I'm not sure what you want to do, though, or why you need 3 columns for y...
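To illustrate the shapes involved, here is a minimal sketch. It assumes scikit-learn's `LinearDiscriminantAnalysis` (which the `_max_components` in the warning suggests) and uses made-up random data as a stand-in for the accelerometer readings; the sample counts and class means are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 2 gestures, 3 features (x, y, z).
n_per_class = 50
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 3)),  # gesture 0
    rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 3)),  # gesture 1
])

# One class label per row of X, as a 1D array.
y = np.repeat([0, 1], n_per_class)

# With two classes, LDA can project onto at most one component.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)

print(X.shape, y.shape, X_reduced.shape)
```

The point is only that `X` is 2D with one row per sample, while `y` is 1D with the same number of entries as `X` has rows.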

Y. Luo
  • I see. So y should be the labels then? –  Jun 14 '17 at 18:06
  • How do I set up training and testing sets then? –  Jun 14 '17 at 18:06
  • When I change `y` to `y = [1, 2]` (I have two class labels) I now get `ValueError: Found input variables with inconsistent numbers of samples: [504, 2]` –  Jun 14 '17 at 18:12
  • @dirtysocks45 Yes, `y` should be the label for LDA here. I'm sure you know what LDA is, but I didn't see enough background on your problem. What are the objects? What are the features `X`? What are your classes, `y`? That's why I said I don't know what you are trying to do. – Y. Luo Jun 14 '17 at 18:13
  • Hmm, I assumed X was supposed to be the preprocessed data and not the features. I'm trying to extract features. The `x`, `y`, and `z` columns are accelerometer data for two gestures. `y` is the label for either gesture. I have 3 examples of each for a total of 6 files. –  Jun 14 '17 at 18:16
  • @dirtysocks45 I'm even more confused about what you are trying to do. The new error is expected. It basically says that `X` has 504 samples while your class label `y` has only 2. The expected `y` should be 1D with a length of 504 (a simple list rather than a list of lists). Not sure if I explained it well... – Y. Luo Jun 14 '17 at 18:16
  • OK, so the last column in the files is a column called `labels`. This has 504 samples as well. I changed it to `y = df[['label']].values` but now I get `DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)` –  Jun 14 '17 at 18:22
  • I thought `.values` returned a 1d array. EDIT: I fixed it by using `y.ravel()` –  Jun 14 '17 at 18:23
  • @dirtysocks45 Glad you solved it. That was a warning, so you should still have been able to get a result. `.values` gives a 1d array (`y.ndim` is 1) if it comes from a single row, but a 2d array (`y.ndim` is 2) if it comes from a single column selected as `df[['label']]`. That is a column vector, which can still be used (warning, not error) because there is only 1 element on the 2nd axis (like a list of lists). – Y. Luo Jun 14 '17 at 18:37
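The shape difference discussed in the comments can be checked with a small sketch. The two frames below are hypothetical stand-ins for what each file might yield (the values are made up), and `pd.concat` is used here as one way to keep features and labels on the same rows; it is not necessarily how the asker's `sd.sample_difference` results should be combined.

```python
import pandas as pd

# Hypothetical stand-ins for two per-file DataFrames with x, y, z and label columns.
file_a = pd.DataFrame({"x": [0.1, 0.2], "y": [0.3, 0.4], "z": [0.5, 0.6], "label": [1, 1]})
file_b = pd.DataFrame({"x": [1.1, 1.2], "y": [1.3, 1.4], "z": [1.5, 1.6], "label": [2, 2]})

# Combine the frames so that X and y are taken from the same rows.
df = pd.concat([file_a, file_b], ignore_index=True)

X = df[["x", "y", "z"]].values    # 2D: (n_samples, 3)
y_column = df[["label"]].values   # 2D column vector: (n_samples, 1)
y_series = df["label"].values     # 1D: (n_samples,)
y_flat = y_column.ravel()         # 1D: (n_samples,), same values as y_series

print(X.shape, y_column.shape, y_series.shape, y_flat.shape)
```

Selecting with a list of column names (`df[["label"]]`) returns a DataFrame, so `.values` is 2D; selecting with a single name (`df["label"]`) returns a Series, so `.values` is already 1D and `ravel()` is not needed.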