
I am attempting to classify how "good" short work reports are using fastText text classification. At this stage I have made only one label, "interfering behavior", which I'm calling __label__int, because I just want to see if the approach will work. I want to score texts by how closely they match sentences taken from good reports. I made my own training text document, a sample of which is:

__label__int Aggression data are low and stable at occurrences.
__label__int Elopement frequency has decreased to occurrences.
__label__int Property destruction data are low and stable at occurrences.
__label__int Non-compliance frequency is stagnant at occurrences.
__label__int Tantrum duration is low and stable at minutes.
__label__int Aggression frequency is on an increasing trend.
__label__int Crying percentage is on a decreasing trend.
__label__int Elopement frequency is on a decreasing trend.

and the code I have written is:

import fasttext

# Helper from the fastText supervised tutorial for printing the results of model.test().
def print_results(N, p, r):
    print("N\t" + str(N))
    print("P@{}\t{:.3f}".format(1, p))
    print("R@{}\t{:.3f}".format(1, r))

model = fasttext.train_supervised(input='Interfering Behavior Train.txt')
model.save_model("model_int-behavior.bin")

print_results(*model.test("test_valid.txt"))

but I keep getting the following output:

Read 0M words
Number of words: 94
Number of labels: 1
N       0
P@1     nan
R@1     nan
Progress: 100.0% words/sec/thread: 12881 lr: 0.000000 avg.loss: 0.000000 ETA: 0h 0m 0s

test_valid.txt is one of the files I know contains these terms, so I'm expecting a good match. I could not find anything online about how to write custom labeled data sets. Is there an issue with my training data? Too many words? Or is my code incomplete?

SSerb1989
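
For reference, fastText's model.test expects the validation file to be labeled the same way as the training file. A minimal sketch of evaluating against such a file, assuming the model saved in the snippet above (the example lines are hypothetical):

import fasttext

# Assumes the model trained above was saved as model_int-behavior.bin.
model = fasttext.load_model("model_int-behavior.bin")

# model.test expects the validation file to use the same __label__ format as the
# training file, e.g. (hypothetical lines):
#   __label__int Aggression data are low and stable at occurrences.
#   __label__int Elopement frequency has decreased to occurrences.
# It returns (number of examples, precision@1, recall@1); lines without any
# __label__ prefix are typically not counted.
n, precision, recall = model.test("test_valid.txt")
print(n, precision, recall)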
  • You are doing supervised learning with just one label; what is your objective? Sorry, I cannot make it out from the post, please try to improve the question. The way you have labeled the data is correct. Can you also provide what `test_valid.txt` contains? – a11apurva Sep 26 '21 at 17:23
  • It is really hard to understand the problem statement - `I am attempting to classify how "good" short work reports are using fast text text classification.` – a11apurva Sep 26 '21 at 17:31
  • @a11apurva I apologize - I agree it is vague (it was defined by my wife's boss). I'm basically trying to classify a good report by how closely it resembles these sentences. A lot of reports are incomplete, or don't use these terms and say things like "client was good", and that's what I'm trying to filter for – SSerb1989 Sep 26 '21 at 17:40
  • 1
    As your question is very vague, I would suggest you to look at some tutorials, there are quite a few simple ones on Medium which can help you with fasttext. Also, read the official documentation of fasttext. – a11apurva Sep 26 '21 at 18:14
  • 1
    Generally, you can't do straightforward 'classification' without at least 2 classes. (Yes, there are some techniques for identifying texts that are some margin "outside" a single set of positive examples, but their use/tuning/applicability is quite a bit more arcane & limited than the more typical training of a classifier when you have constrasting examples.) Ideally, you'd add to training a lot of text that's from a similar context as your 'good' examples, but which a competent evaluator marked as 'not-good' (or even just, 'not-good-enough'). – gojomo Sep 26 '21 at 18:22
  • 1
    Then, both FT supervised mode (& other online guides to classification) should start to behave more usefully with respect to your problems. In a pinch, you could conceivably use arbitrary sentences chosen from some other corpus as your 'not-good' texts - anything from generic reference text (like Wikipedia) to other bulk text from a similar domain (which looks like something related to pediatrics/child-psych). Such text, simply by not being explicitly 'good', **might** provide a useful contrast with your 'good' examples. – gojomo Sep 26 '21 at 18:26
  • 1
    @gojomo thank you so much! there is going to be at least 4 classes for this "project" - i was basically just trying to do a simple test first with one class first - but it makes sense that it would work better with more. So i will try that next! – SSerb1989 Sep 26 '21 at 19:22
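
A minimal sketch of the two-class setup suggested in the comments above, assuming a hypothetical contrast label __label__other for 'not-good' sentences (the label name, example lines, and hyperparameters are illustrative, not part of the original question):

import fasttext

# A two-class training file might look like this (the __label__other examples
# are hypothetical 'not-good' report sentences, as the comments suggest):
#   __label__int Aggression frequency is on an increasing trend.
#   __label__int Elopement frequency is on a decreasing trend.
#   __label__other Client was good today.
#   __label__other Session went fine overall.

# Hyperparameters here are illustrative; small data sets usually need some tuning.
model = fasttext.train_supervised(
    input="Interfering Behavior Train.txt",
    epoch=25,
    lr=1.0,
    wordNgrams=2,
)
model.save_model("model_int-behavior.bin")

# Evaluate against a validation file in the same labeled format.
n, precision, recall = model.test("test_valid.txt")
print("N\t{}".format(n))
print("P@1\t{:.3f}".format(precision))
print("R@1\t{:.3f}".format(recall))

# Classify a single new sentence.
labels, probabilities = model.predict("Elopement frequency is on a decreasing trend.")
print(labels, probabilities)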

0 Answers