
I have three datasets in CSV format. problem.csv has the attributes:

  1. id
  2. level
  3. accuracy
  4. solved_count
  5. error_count
  6. tag1
  7. tag2
  8. tag3
  9. tag4
  10. tag5

Submission.csv with attributes:

  • user_id
  • problem_id
  • solved_status

user.csv with attributes:

  • user_id
  • solved_count
  • attempts

I now want to predict, on a test dataset, whether a user will be able to solve a problem or not.

I was thinking of applying Naive Bayes classification, but I don't know how to approach this problem. I suppose I have to build a common dataset in ARFF for use with Weka or scikit-learn. Please give me some idea of how I can approach this problem.

R. Max

1 Answer


If you want to use Weka, you should join all the datasets together to get one dataset with the following attributes:

  • user_id
  • id
  • level
  • accuracy
  • solved_count
  • error_count
  • tag1
  • tag2
  • tag3
  • tag4
  • tag5
  • solved_count (the user's, from user.csv; distinct from the problem's solved_count above)
  • attempts
  • solved_status (this will be your class)
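The join described above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive implementation: the sample rows, the `easy`/`hard` and `solved`/`unsolved` values, the helper names `read_rows` and `join_datasets`, and the renamed columns `problem_solved_count` and `user_solved_count` (introduced to keep the two `solved_count` columns apart) are all assumptions.

```python
import csv
import io

# Tiny made-up samples standing in for the three CSV files (values are hypothetical).
PROBLEM_CSV = """id,level,accuracy,solved_count,error_count,tag1,tag2,tag3,tag4,tag5
p1,easy,0.80,120,30,math,,,,
p2,hard,0.25,15,45,graphs,dp,,,
"""
SUBMISSION_CSV = """user_id,problem_id,solved_status
u1,p1,solved
u1,p2,unsolved
u2,p1,solved
"""
USER_CSV = """user_id,solved_count,attempts
u1,40,90
u2,5,20
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_datasets(problems, submissions, users):
    """Attach problem and user attributes to every submission (inner join)."""
    problems_by_id = {p["id"]: p for p in problems}
    users_by_id = {u["user_id"]: u for u in users}
    joined = []
    for sub in submissions:
        problem = problems_by_id.get(sub["problem_id"])
        user = users_by_id.get(sub["user_id"])
        if problem is None or user is None:
            continue  # skip submissions that reference unknown ids
        row = {"user_id": sub["user_id"], **problem}
        # Both files have a solved_count column, so rename them to avoid a clash.
        row["problem_solved_count"] = row.pop("solved_count")
        row["user_solved_count"] = user["solved_count"]
        row["attempts"] = user["attempts"]
        row["solved_status"] = sub["solved_status"]  # class attribute goes last
        joined.append(row)
    return joined


joined = join_datasets(
    read_rows(PROBLEM_CSV), read_rows(SUBMISSION_CSV), read_rows(USER_CSV)
)
for row in joined:
    print(row)
```

With real files, `read_rows` would read from an opened file instead of a string. All values stay as strings here; numeric columns would need converting before training.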

Once the datasets are joined, load the result into the Weka Explorer or from Java code and build a classifier on it. You can then predict new instances, for which solved_status is left empty.
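For loading into Weka, the joined rows can be written out as ARFF. The sketch below is a simplified writer under stated assumptions: the helper name `to_arff`, the relation name, the reduced attribute set, and the sample values are hypothetical, and nominal values containing spaces or commas would need quoting, which is not handled. An empty value is written as `?`, the ARFF marker for a missing value, which is how an instance with an unknown solved_status can be represented.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        # Missing or empty values become "?" in the data section.
        cells = [
            "?" if row.get(name) in (None, "") else str(row[name])
            for name, _ in attributes
        ]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical subset of the joined attributes, with the class attribute last.
attributes = [
    ("level", ["easy", "hard"]),
    ("accuracy", "numeric"),
    ("attempts", "numeric"),
    ("solved_status", ["solved", "unsolved"]),
]
rows = [
    {"level": "easy", "accuracy": 0.8, "attempts": 90, "solved_status": "solved"},
    # A new instance to predict: the class value is unknown.
    {"level": "hard", "accuracy": 0.25, "attempts": 20, "solved_status": ""},
]

arff_text = to_arff("user_problem", attributes, rows)
print(arff_text)
```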

bolec_kolec
  • You can try different classifiers. The most common are J48, random forest, naive Bayes, kNN, and SVM. Check all of them and take the one that gets the best results. – bolec_kolec Jan 31 '16 at 10:03
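Since the question also mentions scikit-learn, the "try them all and keep the best" advice could be sketched there as follows. This is an illustration under assumptions: the data is synthetic (generated with `make_classification`) rather than the joined dataset, 5-fold cross-validated accuracy is one possible selection criterion, and `DecisionTreeClassifier` is only a rough stand-in for J48, which is Weka's C4.5 implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the joined feature matrix X and class labels y.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive Bayes": GaussianNB(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")

best = max(scores, key=scores.get)
print("best:", best)
```

On the real data, categorical columns such as level and the tags would need encoding as numbers first, and the ranking may well differ from what synthetic data shows.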