
I've built a web app in PHP that fetches a number of posts containing a keyword such as 'pizza' from Instagram and loads them, along with some other data, into MongoDB.

One of the modules uses Python's NLTK, and here's how I invoke it (yes, I am switching languages like crazy [just to study them]):

$foo = exec("python tokenize.py $bar");

The line above works fine, but I am looking for a similar way to call SpamAssassin to check whether the content of an Instagram post is spam. Judging from the SA documentation, I know it is possible to check plain text files as if they were emails, like here. I am new to SA, though.

This question is probably simple for advanced SA users, but I cannot find any command-line input/output option in SA comparable to the PHP-to-Python call above. Assuming that $string is the content of the Instagram post, I am looking for something like this:

$score_of_SA = exec("spamassassin.exe $string");

Is a call like that possible in PHP? If not, what do I have to do to check that content?

Assume my SA is updated and trained.

James Pond
  • Supply a temporary filename instead of the raw text, or pipe in a file `sa < msg.txt`, or use `popen()` to pipe to its stdin directly. – mario Aug 27 '15 at 04:33
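
The stdin approach from the comment above could be sketched roughly as follows. This is a minimal sketch, not a tested SpamAssassin integration: the helper names (`run_with_stdin`, `wrap_as_email`, `parse_spam_score`) are hypothetical, the placeholder headers are an assumption made so the text resembles an email, and the score parsing assumes the default `X-Spam-Status: ..., score=N` header layout.

```php
<?php
// Run a command, write $input to its stdin, and return its stdout and exit code.
// Fine for short texts; very large inputs/outputs would need non-blocking I/O.
function run_with_stdin(array $command, string $input): array
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
    ];
    // Passing the command as an array (no shell involved) needs PHP 7.4+.
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start process');
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signal end of input
    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exit = proc_close($proc);
    return ['stdout' => $stdout, 'exit' => $exit];
}

// Assumption: wrap the post text in placeholder headers so it looks like a mail.
function wrap_as_email(string $text): string
{
    return "From: post@example.invalid\r\nSubject: Instagram post\r\n\r\n" . $text . "\r\n";
}

// Assumption: the score appears as "score=N" on the X-Spam-Status header line.
function parse_spam_score(string $output): ?float
{
    if (preg_match('/^X-Spam-Status:.*?score=(-?\d+(?:\.\d+)?)/mi', $output, $m)) {
        return (float) $m[1];
    }
    return null; // header not found
}
```

With SpamAssassin installed, a call might look like `$result = run_with_stdin(['spamassassin', '-t'], wrap_as_email($string));` followed by `$score = parse_spam_score($result['stdout']);`. Because nothing passes through a shell, the post text needs no escaping.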

1 Answer


Doubtful. SpamAssassin isn't a generic text checker; it builds its scores from source emails by looking at a variety of known email-related factors, such as whether the sending server has a valid MX record, or whether the message passes SPF or DKIM checks.

None of this applies to non-email content. That said, one of the core components when it comes to text analysis is a Bayesian filter.

There is hope, however, and a solution that integrates much better into a PHP project: as it happens, there is a PHP Bayesian spam filter library on Packagist. See here.

You do have to train a spam filter and this library is no exception.
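
To illustrate what training a Bayesian filter involves, here is a minimal naive Bayes sketch. The `TinyBayes` class and its methods are hypothetical and written only for illustration; this is not the API of the Packagist library, and a real filter would need far more training data and better tokenization.

```php
<?php
// Hypothetical toy classifier for illustration; not the Packagist library's API.
final class TinyBayes
{
    private array $counts = ['spam' => [], 'ham' => []]; // token => count, per label
    private array $totals = ['spam' => 0, 'ham' => 0];   // total tokens per label
    private array $docs   = ['spam' => 0, 'ham' => 0];   // training texts per label

    // Split text into lowercase word tokens (ASCII lowercasing only).
    private function tokens(string $text): array
    {
        return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    }

    // $label must be 'spam' or 'ham'.
    public function train(string $label, string $text): void
    {
        $this->docs[$label]++;
        foreach ($this->tokens($text) as $t) {
            $this->counts[$label][$t] = ($this->counts[$label][$t] ?? 0) + 1;
            $this->totals[$label]++;
        }
    }

    // Returns an estimate of P(spam | text) in [0, 1],
    // using log-probabilities and add-one smoothing.
    public function spamProbability(string $text): float
    {
        $vocab = count($this->counts['spam'] + $this->counts['ham']); // distinct tokens
        $nDocs = $this->docs['spam'] + $this->docs['ham'];
        $log = [];
        foreach (['spam', 'ham'] as $label) {
            $log[$label] = log(($this->docs[$label] + 1) / ($nDocs + 2)); // smoothed prior
            foreach ($this->tokens($text) as $t) {
                $c = $this->counts[$label][$t] ?? 0;
                $log[$label] += log(($c + 1) / ($this->totals[$label] + $vocab + 1));
            }
        }
        return 1 / (1 + exp($log['ham'] - $log['spam']));
    }
}
```

After calling `train('spam', ...)` and `train('ham', ...)` on labeled posts, `spamProbability($string)` returns a value closer to 1 for texts that share more words with the spam examples. An untrained instance returns 0.5 for any input, which is why the training step matters.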

gview