
I have a dataset with items and features (attributes). Each item has some features.

There are about 400 features in total.

I want to rank the features by importance. I am not looking for classification, only for a feature ranking.

I convert the item-feature data into a binary matrix like the following, where 1 means the feature exists in the item and 0 means it does not.

itemID | feature1 | feature2 | feature3 | feature4 ....
1 | 0 | 1 | 1 | 0
2 | 1 | 0 | 0 | 1
3 | 1 | 1 | 1 | 0
4 | 0 | 0 | 1 | 1

An example of real data is for hotels, where features could be something like: Air Condition, Free WiFi, etc.

HotelID | Air Condition | Free WiFi ....
1 | 0 | 1
2 | 1 | 0
3 | 1 | 1
4 | 0 | 0
.....

I need to know what to use and how to use it.

Sample code would be much appreciated.

mbayomi

1 Answer

It looks like you are looking for an attribute-evaluation measure such as information gain, which Weka provides as the InfoGainAttributeEval class. From the documentation of that class:

Evaluates the worth of an attribute by measuring the information gain with respect to the class

Here you can find a usage example:

http://www.programcreek.com/java-api-examples/index.php?api=weka.attributeSelection.InfoGainAttributeEval
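To make the idea concrete, here is a minimal plain-Java sketch of what information gain computes on a binary matrix like yours. It is not Weka's API, just a hand-rolled illustration of the same measure. Note that information gain is defined with respect to a class, so the sketch assumes a hypothetical binary label per item (for example, whether a hotel is highly rated); your data as posted has no such column, and the names `InfoGainRanking`, `infoGain`, and `rank` are made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Minimal sketch: rank binary features by information gain with respect to a
// binary class label. The label is an assumption; the question's data has none.
final class InfoGainRanking {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Shannon entropy (in bits) of a binary variable with `pos` ones out of `total`.
    static double entropy(int pos, int total) {
        if (total == 0 || pos == 0 || pos == total) return 0.0;
        double p = (double) pos / total;
        return -(p * log2(p) + (1 - p) * log2(1 - p));
    }

    // Information gain of one binary feature column: H(class) - H(class | feature).
    static double infoGain(int[] feature, int[] label) {
        int n = label.length;
        int pos = 0, n1 = 0, pos1 = 0;
        for (int i = 0; i < n; i++) {
            pos += label[i];
            if (feature[i] == 1) {
                n1++;
                pos1 += label[i];
            }
        }
        int n0 = n - n1, pos0 = pos - pos1;
        double conditional = ((double) n1 / n) * entropy(pos1, n1)
                           + ((double) n0 / n) * entropy(pos0, n0);
        return entropy(pos, n) - conditional;
    }

    // Returns feature indices (0-based) sorted by decreasing information gain.
    static Integer[] rank(int[][] rows, int[] label) {
        int numFeatures = rows[0].length;
        double[] gain = new double[numFeatures];
        for (int j = 0; j < numFeatures; j++) {
            int[] column = new int[rows.length];
            for (int i = 0; i < rows.length; i++) column[i] = rows[i][j];
            gain[j] = infoGain(column, label);
        }
        Integer[] order = new Integer[numFeatures];
        for (int j = 0; j < numFeatures; j++) order[j] = j;
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> -gain[j]));
        return order;
    }

    public static void main(String[] args) {
        // The four items from the question (feature1..feature4).
        int[][] rows = { {0, 1, 1, 0}, {1, 0, 0, 1}, {1, 1, 1, 0}, {0, 0, 1, 1} };
        int[] label = {1, 0, 1, 0}; // hypothetical class, e.g. "highly rated"
        for (int j : rank(rows, label)) {
            System.out.println("feature" + (j + 1));
        }
    }
}
```

With real data you would replace the hypothetical `label` array with whatever outcome you care about; without any such outcome, information gain is not defined and a different, unsupervised criterion would be needed.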

Good luck.

AndreyF