
I'm a beginner in machine learning, and I'm trying to use a data set to train a log-linear classifier. The data set contains five features, and each feature is a vector, but the dimensions of the features are different: 3, 1, 6, 2, and 2 respectively. I tried the PCA method in scikit-learn to reduce the dimensions to 1, but it didn't work well. So how do I process the features to fit a log-linear classifier model like logistic regression?

SDC1215

1 Answer


A simple way to do this is to flatten all of your features into a single vector per sample and then feed that into your classifier.

An example:

features = [... 
          [[0, 1, 3], [5], [2, 6, 4, 7, 8, 9], [1, 0], [0, 1]], # for one sample
          ...]

Use a list comprehension to flatten each list inside features:

flattened_features = [[i for k in f for i in k] for f in features]

This will turn features into something like this:

    flattened_features
    [... 
    [0,1,3,5,2,6,4,7,8,9,1,0,0,1], #for one sample
    ...]

Now you can convert this into a numpy array and feed it into your model.
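
For example, a minimal sketch of that last step (the sample values and labels below are made up purely for illustration):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Two toy samples, each with the five variable-length features already
    # flattened (3 + 1 + 6 + 2 + 2 = 14 values per sample); labels are made up.
    flattened_features = [
        [0, 1, 3, 5, 2, 6, 4, 7, 8, 9, 1, 0, 0, 1],
        [1, 0, 2, 4, 3, 5, 6, 8, 7, 9, 0, 1, 1, 0],
    ]
    labels = [0, 1]

    X = np.array(flattened_features)  # shape (n_samples, 14)
    y = np.array(labels)

    clf = LogisticRegression()
    clf.fit(X, y)
    print(clf.predict(X))  # predictions for the training samples, e.g. [0 1]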

Primusa
  • Hi Primusa, it might be a naive question: after flattening the features, we end up with a single vector as above. How can the model tell that these are separate features? Isn't it the same as one feature? Is there any other way of dealing with such features? – hemanta Mar 18 '19 at 02:41
  • @hemanta If you use the above method, ideally the model will learn these groupings on its own. There are ways to deal with different-length features - with sklearn you can try dimensionality reduction, but I'm not sure it otherwise supports these kinds of inputs. If you're open to trying neural networks, they can support these kinds of inputs, although you still end up merging the features somewhere inside the network; the difference is that the initial representations are seen and processed by the net. – Primusa Mar 18 '19 at 03:20
  • Hi Primusa, thank you so much for your explanation. I was asking because I have two feature matrices of different sizes, one of 4x61 and a second of 6x3, and I want to make a feature vector from these two. As you mentioned, I can flatten each of them. Is it then fine to concatenate the flattened vectors and use them as a single feature vector, or do you suggest other methods such as pooling? I don't know how to perform pooling yet. Thanks again for the clarification. – hemanta Mar 18 '19 at 04:01
  • It depends on what your model is trying to do; different merge types do different things. See https://stackoverflow.com/questions/49990882/which-merge-layers-to-use-in-keras/50401718 (a rough sketch of both options follows below). – Primusa Mar 18 '19 at 12:37
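
For anyone finding this later, here is a rough numpy sketch of the two options discussed in the comments above (flatten-and-concatenate vs. a simple mean pooling), assuming the 4x61 and 6x3 matrices mentioned; the random values and the choice of mean pooling are just for illustration, not a recommendation:

    import numpy as np

    # Two feature matrices of different shapes, as in the comment above
    # (random values, purely for illustration).
    a = np.random.rand(4, 61)
    b = np.random.rand(6, 3)

    # Option 1: flatten both matrices and concatenate into one long vector.
    flat = np.concatenate([a.ravel(), b.ravel()])  # shape (4*61 + 6*3,) = (262,)

    # Option 2: a simple pooling, e.g. averaging each matrix over its rows
    # before concatenating (one of several possible merge strategies).
    pooled = np.concatenate([a.mean(axis=0), b.mean(axis=0)])  # shape (61 + 3,) = (64,)

    print(flat.shape, pooled.shape)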