-1

I am not a data scientist and am very new to data science and machine learning. My goal is to predict whether a given text belongs to a specific class or not. I have looked at Naive Bayes for classifying text into different classes, but here I have only one class. Ultimately, I want to predict whether a text is of a certain class or not (e.g. whether a text is technical or not, as opposed to whether it is technical or political). I have only a positive dataset (all texts are technical) for training.

As far as I know, Naive Bayes requires both positive and negative examples for training, so I am not sure it is the best algorithm for this problem. I would like to learn a better approach if there is one. Thanks.

Sagar
  • 5,315
  • 6
  • 37
  • 66

1 Answer

0

You have two options:

  1. You can use an autoencoder as follows:

    • Step 1: Train it on the positive data you have.
    • Step 2: Use the reconstruction error as a classifier: feed new data to the autoencoder you trained in step 1 and treat the inputs with a high error as "anomalies" (texts that do not belong to the wanted class, in your case).
  2. You can also use a clustering technique such as k-means. In this case you will need to spend more time on feature engineering (choosing the most relevant features of your text).
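To make the second option concrete, here is a minimal, dependency-free sketch of one way k-means could be used with positive-only data: cluster the training texts, then accept a new text only if it lies close enough to some cluster centre. This is an illustration under several assumptions that are not in the answer itself: the bag-of-words features, the extra "unknown word" slot, the `OneClassKMeans` name, the sample sentences, and the rule of using the largest training distance as the threshold are all hypothetical choices. The autoencoder option follows the same "score, then threshold" pattern, with reconstruction error in place of centroid distance.

```python
import math
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


def vectorize(text, vocab):
    """Unit-length word-count vector; the last slot counts words not in vocab."""
    vec = [0.0] * (len(vocab) + 1)
    for word, n in Counter(tokenize(text)).items():
        vec[vocab.get(word, len(vocab))] += n
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm, initialised from k randomly sampled points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class OneClassKMeans:
    """Hypothetical helper: in-class means 'near some training cluster'."""

    def fit(self, texts, k=2):
        words = sorted({w for t in texts for w in tokenize(t)})
        self.vocab = {w: i for i, w in enumerate(words)}
        points = [vectorize(t, self.vocab) for t in texts]
        self.centroids = kmeans(points, k)
        # Assumption: the largest training distance is the cut-off.
        # A percentile would be more robust to outliers in real data.
        self.threshold = max(self.score(t) for t in texts)
        return self

    def score(self, text):
        """Distance to the nearest centroid; higher means more anomalous."""
        vec = vectorize(text, self.vocab)
        return min(distance(vec, c) for c in self.centroids)

    def predict(self, text):
        return self.score(text) <= self.threshold


# Made-up positive-only training set (all "technical").
TRAIN_TEXTS = [
    "the compiler reports a type error in the function",
    "the server returns a json response over http",
    "restart the database and check the query log",
    "the function allocates memory on the heap",
]

model = OneClassKMeans().fit(TRAIN_TEXTS, k=2)
print(model.predict(TRAIN_TEXTS[0]))  # a training text, in-class by construction
print(model.predict("parliament debated new election tax policy"))
```

With a real corpus you would likely swap the raw word counts for better features (TF-IDF or embeddings) and tune both `k` and the threshold on held-out positive texts, since the threshold directly controls how many genuine in-class texts get rejected.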

Walid Da.
  • 948
  • 1
  • 7
  • 15