I am using the Stanford Classifier for an NLP-related task, but I need to try out other machine learning algorithms as well. So I would like to convert the prop file to ARFF, or print the features and then convert them into an ARFF file. One major concern is that the features output by the classifier tool are in a sparse representation (only the features that are present are shown). How can I achieve this?

Amrith Krishna

1 Answer

There is a sparse format for ARFF. It is very similar to the non-sparse ARFF format, but data with value 0 are not explicitly represented.

Sparse ARFF files have the same header (i.e. the @relation and @attribute tags), but the data section is different. Instead of representing each value in order, like this:

@data
0, X, 0, Y, "class A"
0, 0, W, 0, "class B"

the non-zero attributes are explicitly identified by attribute number (indices start at 0) and their values stated, like this:

@data
{1 X, 3 Y, 4 "class A"}
{2 W, 4 "class B"}
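
As a minimal sketch of the mapping between the two forms, the following Python helper turns one dense data row into its sparse equivalent. The function name `to_sparse_row` is hypothetical (not part of Weka or any library), and it assumes each row is already split into string values and that "0" is the only value to drop.

```python
def to_sparse_row(dense_row):
    """Convert one dense ARFF data row (a list of string values) to sparse form.

    Assumption: a value is omitted only when it is exactly "0".
    Attribute indices are zero-based, as in the example above.
    """
    pairs = [f"{i} {v}" for i, v in enumerate(dense_row) if v != "0"]
    return "{" + ", ".join(pairs) + "}"


print(to_sparse_row(["0", "X", "0", "Y", '"class A"']))  # {1 X, 3 Y, 4 "class A"}
print(to_sparse_row(["0", "0", "W", "0", '"class B"']))  # {2 W, 4 "class B"}
```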

Note this problem about arff sparse format.
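
If the classifier only prints the features that are present for each instance, one possible approach is to give every distinct feature string its own attribute and write a 1 for each feature that occurs. The sketch below is only an illustration under assumptions: `build_sparse_arff` is a hypothetical helper, the input is assumed to be already parsed into (feature strings, class label) pairs (parsing the classifier's actual output is not shown), every feature is treated as a binary numeric attribute, and the class attribute is assumed to be named `class`.

```python
def build_sparse_arff(instances, relation="features"):
    """Build sparse ARFF text from (feature_strings, class_label) pairs.

    Assumptions: features are binary (present = 1, absent = 0) and no
    feature string is literally named "class".
    """
    # Give each distinct feature string its own zero-based attribute index.
    vocab = {}
    for feats, _ in instances:
        for feat in feats:
            vocab.setdefault(feat, len(vocab))
    labels = sorted({label for _, label in instances})
    class_index = len(vocab)  # the class is the last attribute

    def quote(text):
        # Single-quote names and values so that spaces are preserved.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    lines = [f"@relation {relation}", ""]
    for feat in vocab:  # dicts keep insertion order, so this follows the indices
        lines.append(f"@attribute {quote(feat)} numeric")
    lines.append("@attribute class {" + ",".join(quote(l) for l in labels) + "}")
    lines += ["", "@data"]
    for feats, label in instances:
        # Indices are deduplicated and written in ascending order.
        indices = sorted({vocab[feat] for feat in feats})
        pairs = [f"{i} 1" for i in indices]
        # The class value is always written explicitly, never left implicit.
        pairs.append(f"{class_index} {quote(label)}")
        lines.append("{" + ", ".join(pairs) + "}")
    return "\n".join(lines) + "\n"


# Made-up example data, for illustration only.
example = [
    (["1-the cat", "1-cat sat"], "A"),
    (["1-cat sat", "1-sat down"], "B"),
]
print(build_sparse_arff(example))
```

Writing the class value on every row avoids relying on how a reader interprets an omitted nominal value in a sparse row.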

greeness
  • In the Stanford Classifier, suppose n-grams are generated for a sentence given in, say, column 1. The features are then represented as `1-set1Ngram 1-nextSetNgram 1-nextAgain`, and so on until all the n-grams are formed. But I suppose each of these forms a different feature in ARFF format, so the format should be, say, `{1 set1Ngram, 2 nextSetNgram, 3 nextAgain}` rather than `{1 set1Ngram, 1 nextSetNgram, 1 nextAgain}`. – Amrith Krishna Jan 21 '14 at 09:38