
I have a multi-column data set as follows:

Id      Summary        Component       Description      Labels             Action

id1     free-text-11   free-text-12    free-text-13     label1, label2     action1
id2     free-text-11   free-text-22    free-text-23     label2, label3     action2

... and so on

Here, Summary, Component, and Description contain user-provided free text in English, while the Labels and Action columns contain fixed, system-defined values. My task is to train a model in Java that predicts the Action value from the other columns (Summary, Component, Description, and Labels), some of which are optional and may be empty.
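One way to deal with the optional columns is to load each row into a small value type in which an empty cell becomes an empty string and the Labels cell is split on commas. The sketch below assumes the data is stored as a tab-separated file with the six columns in the order shown above; the `TicketRow` record and its `parse` method are hypothetical names, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the data set; an empty cell becomes "" (hypothetical type, Java 16+). */
record TicketRow(String id, String summary, String component, String description,
                 List<String> labels, String action) {

    /** Parses one tab-separated line in the order Id, Summary, Component, Description, Labels, Action. */
    static TicketRow parse(String line) {
        // limit -1 keeps trailing empty cells, so blank trailing columns are not dropped
        String[] cells = line.split("\t", -1);
        if (cells.length != 6) {
            throw new IllegalArgumentException("expected 6 columns, got " + cells.length);
        }
        // Labels holds system-defined values separated by commas, e.g. "label1, label2"
        List<String> labels = new ArrayList<>();
        for (String label : cells[4].split(",")) {
            String trimmed = label.trim();
            if (!trimmed.isEmpty()) {
                labels.add(trimmed);
            }
        }
        return new TicketRow(cells[0].trim(), cells[1].trim(), cells[2].trim(),
                cells[3].trim(), List.copyOf(labels), cells[5].trim());
    }

    public static void main(String[] args) {
        // Same shape as row id1 above, but with the Component cell left blank
        TicketRow row = parse("id1\tfree-text-11\t\tfree-text-13\tlabel1, label2\taction1");
        System.out.println(row.component().isEmpty()); // true
        System.out.println(row.labels());              // [label1, label2]
    }
}
```

Keeping missing values as empty strings (rather than `null`) means later steps can treat every field uniformly and simply produce no features for a blank one.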

As a total newbie, I tried LDA using Mallet, but all the examples I found handle only a single free-text input column, and I am not sure which algorithm is the best fit for my use case. How can I solve this problem in Java? Any help would be appreciated.

Anindya Chatterjee
  • Do you have to do this in Java? Also, do you have any pre-trained (I mean, already tagged) sets of "This text -> this action"? – Tiago Duque Dec 19 '19 at 19:37
  • Yes, I have to do it in Java, and I have training data with the Action column filled in. – Anindya Chatterjee Dec 20 '19 at 02:57
  • LDA is best for cases when you want to get a sense of the contents of a collection. If all you want is prediction, you're better off just training a classifier. For multiple input fields, you could just concatenate them. If you think the same word has different meaning in different fields, you could prepend the field name to each string, so that `elephant` in a label would become `LABEL_elephant` in a combined input "document". – David Mimno Dec 20 '19 at 13:23
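The concatenate-and-prefix idea from the last comment can be sketched in plain Java. Each field is lowercased and split on non-alphanumeric characters, and every token is prefixed with its field name, so `elephant` in Labels becomes `LABEL_elephant`; a missing field simply contributes no tokens, which is one simple way to handle the optional columns. The classifier is a swapped-in multinomial Naive Bayes with add-one smoothing, chosen only because it fits in a few lines; the comment does not prescribe it, and for real data a library classifier (Mallet ships several) is likely the more practical choice. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial Naive Bayes over field-prefixed tokens (illustrative sketch only). */
public class PrefixedNaiveBayes {
    private final Map<String, Integer> docsPerAction = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> totalTokens = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Turns (fieldName -> text) into tokens like "LABEL_label1"; null or blank fields add nothing. */
    static List<String> tokenize(Map<String, String> fields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (field.getValue() == null) continue;
            for (String word : field.getValue().toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) tokens.add(field.getKey() + "_" + word);
            }
        }
        return tokens;
    }

    /** Adds one labelled row: counts its tokens under the given action. */
    void train(Map<String, String> fields, String action) {
        totalDocs++;
        docsPerAction.merge(action, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(action, a -> new HashMap<>());
        for (String token : tokenize(fields)) {
            counts.merge(token, 1, Integer::sum);
            totalTokens.merge(action, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the action with the highest log-posterior, using add-one (Laplace) smoothing. */
    String predict(Map<String, String> fields) {
        List<String> tokens = tokenize(fields);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String action : docsPerAction.keySet()) {
            double score = Math.log((double) docsPerAction.get(action) / totalDocs);
            Map<String, Integer> counts = tokenCounts.get(action);
            double denominator = totalTokens.getOrDefault(action, 0) + vocabulary.size();
            for (String token : tokens) {
                if (!vocabulary.contains(token)) continue; // ignore tokens never seen in training
                score += Math.log((counts.getOrDefault(token, 0) + 1.0) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = action;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PrefixedNaiveBayes model = new PrefixedNaiveBayes();
        // The two placeholder rows from the question
        model.train(Map.of("SUMMARY", "free-text-11", "COMPONENT", "free-text-12",
                "DESCRIPTION", "free-text-13", "LABEL", "label1, label2"), "action1");
        model.train(Map.of("SUMMARY", "free-text-11", "COMPONENT", "free-text-22",
                "DESCRIPTION", "free-text-23", "LABEL", "label2, label3"), "action2");
        // Summary and Description are missing: those fields are just absent from the map
        System.out.println(model.predict(Map.of("COMPONENT", "free-text-12", "LABEL", "label1"))); // action1
    }
}
```

Because the field name is part of each feature, the same word in Summary and in Labels is counted separately, which is exactly the distinction the comment suggests; if that distinction does not matter for your data, dropping the prefix gives plain concatenation.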
