29

I was wondering if there is any good and clean object-oriented programming (OOP) implementation of Bayesian filtering for spam and text classification? This is just for learning purposes.

merv
gyurisc

6 Answers

12

I definitely recommend Weka, an open-source data mining toolkit written in Java:

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

As mentioned above, it ships with a bunch of different classifiers such as SVM, Winnow, C4.5, Naive Bayes (of course) and many more (see the API doc). Note that a lot of classifiers are known to perform much better than Naive Bayes in spam detection and text classification.

Furthermore, Weka gives you a very powerful GUI.

Benedikt Waldvogel
5

Maybe https://ci-bayes.dev.java.net/ or http://www.cs.cmu.edu/~javabayes/Home/node2.html?

I have never tried them myself, though.

CloudyMarble
svrist
5

Check out Chapter 6 of Programming Collective Intelligence
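To give a feel for how small a learning-oriented implementation can be, here is a minimal object-oriented sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is not the book's code: the class name `NaiveBayesClassifier`, the simple regex tokenizer, and the four training sentences are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Illustrative tokenizer (an assumption): lowercase runs of letters.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = self.tokenize(text)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # class prior
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the product.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this example.
clf = NaiveBayesClassifier()
clf.train("cheap pills buy now", "spam")
clf.train("win money now", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("lunch on monday", "ham")

print(clf.classify("buy cheap money"))  # spam
print(clf.classify("monday meeting"))   # ham
```

Working in log space avoids floating-point underflow on long messages. Note that `classify` raises a `ValueError` if called before any training data has been added, which is fine for a learning exercise but would need handling in real use.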

Binil Thomas
3

Here is an implementation of Bayesian filtering in C#: A Naive Bayesian Spam Filter for C# (hosted on CodeProject).

Yaakov Ellis
2

nBayes - another C# implementation hosted on CodePlex

Joel Martinez
1

In French, but you should be able to find the download link :) PHP Naive Bayesian Filter

Vincent Robert