
I have a CSV file with the following format:
productname, review of the product

Now, using MALLET, I have to train a classifier so that, given a test dataset of product reviews as input, it tells me which product each review belongs to.

Help with the MALLET Java API would be appreciated.



Here is a little example suited to your case:

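    import cc.mallet.classify.*;
    import cc.mallet.pipe.*;
    import cc.mallet.pipe.iterator.CsvIterator;
    import cc.mallet.types.InstanceList;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;

    public class ProductReviewClassifier {
        public static void main(String[] args) throws IOException {
            // prepare instance transformation pipeline
            ArrayList<Pipe> pipes = new ArrayList<Pipe>();
            pipes.add(new Target2Label());                  // product name -> Label
            pipes.add(new CharSequence2TokenSequence());    // review text -> tokens
            pipes.add(new TokenSequence2FeatureSequence()); // tokens -> feature indices
            pipes.add(new FeatureSequence2FeatureVector()); // indices -> word-count vector
            SerialPipes pipe = new SerialPipes(pipes);

            // "productname, review": group 1 = label (text up to the FIRST comma),
            // group 2 = review text. CsvIterator args: data group 2, target group 1, -1 = no name group
            String lineRegex = "^([^,]*),(.*)$";

            // prepare training instances
            InstanceList trainingInstanceList = new InstanceList(pipe);
            trainingInstanceList.addThruPipe(new CsvIterator(new FileReader("datasets/training.txt"), lineRegex, 2, 1, -1));

            // prepare test instances
            InstanceList testingInstanceList = new InstanceList(pipe);
            testingInstanceList.addThruPipe(new CsvIterator(new FileReader("datasets/testing.txt"), lineRegex, 2, 1, -1));

            ClassifierTrainer<NaiveBayes> trainer = new NaiveBayesTrainer();
            Classifier classifier = trainer.train(trainingInstanceList);
            System.out.println("Accuracy: " + classifier.getAccuracy(testingInstanceList));
        }
    }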
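CsvIterator matches the line regex against each line and, with the arguments used here, takes the review text from group 2 and the product label from group 1. Because `.*` is greedy, the pattern `(.*),(.*)` splits at the *last* comma on the line, so a review that itself contains a comma would have part of its text folded into the product name; `^([^,]*),(.*)$` splits at the first comma instead. The stdlib-only sketch below shows the difference on a made-up line. The class name `ReviewLineSplit`, the sample line, and the trimming are illustrative assumptions; it mimics only the regex step, not CsvIterator itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stdlib-only illustration of how a CsvIterator-style line regex splits
// "productname, review" into a label (group 1) and data (group 2).
// Hypothetical helper class, not part of MALLET.
public class ReviewLineSplit {
    // Greedy: group 1 runs up to the LAST comma on the line.
    static final Pattern GREEDY = Pattern.compile("(.*),(.*)");
    // Anchored: group 1 stops at the FIRST comma; the rest is the review.
    static final Pattern FIRST_COMMA = Pattern.compile("^([^,]*),(.*)$");

    /** Returns {label, data} (trimmed for readability), or null if the line does not match. */
    static String[] split(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String line = "Acme Kettle, Boils fast, but the lid is flimsy"; // made-up sample
        String[] greedy = split(GREEDY, line);
        String[] first = split(FIRST_COMMA, line);
        System.out.println("greedy label:      " + greedy[0]); // "Acme Kettle, Boils fast"
        System.out.println("first-comma label: " + first[0]);  // "Acme Kettle"
    }
}
```

This assumes product names never contain commas; if they can, the file needs proper CSV quoting and a different way of splitting lines.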
  • Hello, in my case I have to do the training with a .txt file. How can I change the code? Thanks a lot. Best regards – researcher Oct 17 '13 at 19:10
  • @researcher Because this code uses CsvIterator, it works with a .txt file as long as each line follows the productname, review format. FileIterator can be used instead to train from a directory structure. – drp Sep 07 '16 at 06:29
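The answer reports overall accuracy on the test set, while the question asks which product a particular review belongs to. In MALLET, per-review predictions should come from the trained classifier (for example `classifier.classify(testingInstanceList)`, then `getLabeling().getBestLabel()` on each returned `Classification`). Conceptually, Naive Bayes learns per-product word frequencies and picks the product with the highest posterior probability. The plain-Java sketch below illustrates that idea; it is a simplified stand-in, not MALLET's NaiveBayesTrainer. The class name `TinyNaiveBayes`, the tokenizer, add-one smoothing, skipping unseen words, and the toy reviews are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustration only: NOT MALLET's NaiveBayesTrainer implementation.
public class TinyNaiveBayes {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase and split on runs of characters that are not letters or digits.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Records one (product, review) training example. */
    void train(String label, String review) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(review)) {
            if (w.isEmpty()) continue; // split() can yield a leading empty token
            counts.merge(w, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the label with the highest log posterior, or null if untrained. */
    String classify(String review) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // log prior: fraction of training reviews with this label
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            double denom = wordsPerLabel.getOrDefault(label, 0) + vocabulary.size();
            for (String w : tokenize(review)) {
                if (w.isEmpty() || !vocabulary.contains(w)) continue; // skip unseen words
                // log likelihood of the word given the label, add-one smoothed
                score += Math.log((counts.getOrDefault(w, 0) + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        // Made-up toy reviews, for illustration only.
        nb.train("Acme Kettle", "boils water fast, the kettle lid is flimsy");
        nb.train("Acme Kettle", "kettle boils quietly and the water tastes fine");
        nb.train("Zoom Headphones", "great bass and the headphones fit well");
        nb.train("Zoom Headphones", "noise cancelling on these headphones is superb");
        System.out.println(nb.classify("the kettle boils water quickly")); // prints "Acme Kettle"
    }
}
```

Working in log space avoids numeric underflow when a review has many words; MALLET's own pipeline and smoothing differ in detail, so treat this only as a picture of what the trained classifier is deciding.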