If you are only trying to catch gibberish words and sentences, such as "fasdhusdhfi", and nothing else, you could keep a database of words and their synonyms and raise a flag whenever less than 50% of the words in the input are found in it. You could build an offline database, which I wouldn't recommend, or you could use some online databases. For a list of words, I would suggest
http://thesaurus.com/
For a list of synonyms of those words, I would suggest
http://www.synonyms.net/
I think these two would probably be the best for this purpose, as they both have an API (for synonyms.net it's on this page) that you can use, so you don't need to parse the returned pages for words.
You could then, in turn, combine this with other methods, as previously stated, such as Bayesian filtering.
While this does not really fit your AI needs, it does block a range of messages.
To fit your 'AI' request, you could probably adapt ALICE's Spam.aiml. It is in AIML format, and contains a lot of permutations of 4-symbol spam. The problem with this is that it is slow.
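The known-word check could be sketched roughly like this in Python. The tiny `KNOWN_WORDS` set and the function names are placeholders of mine, not part of any real API; in practice the set would be filled from whichever word database you choose.

```python
import re

# Hypothetical stand-in for a real word database; in practice this would be
# loaded from a word list or filled from an online service.
KNOWN_WORDS = {"hello", "world", "this", "is", "a", "test", "message"}


def known_word_ratio(text, known_words=KNOWN_WORDS):
    """Return the fraction of words in `text` that appear in `known_words`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        # Assumption: input without any words gives us nothing to flag.
        return 1.0
    known = sum(1 for word in words if word in known_words)
    return known / len(words)


def should_flag(text, known_words=KNOWN_WORDS, threshold=0.5):
    """Flag the input when less than `threshold` of its words are known."""
    return known_word_ratio(text, known_words) < threshold
```

With this sketch, `should_flag("fasdhusdhfi")` raises the flag, while a sentence made up of known words does not. The 50% threshold is the one suggested above and would likely need tuning.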
A possible alternative to Spam.aiml would be to use the rules of the English language to detect spam, and filter it. The following rules could be used:
Every word must have at least one vowel. For this, the letter ‘Y’ is considered a vowel.
No word has more than 3 consonants in a row. For this purpose, ‘TH’ is considered one letter (so that words like 'streNGTH' are not flagged).
No word is longer than 34 letters. The exceptions to this would be the words listed here.
Some letter combinations cannot occur. An example of this would be that the letters ‘R’ and ‘C’ never appear directly beside each other in a regular, non-slang conversation.
You could have a database of impossible combinations. I made a small one by running every 2-letter permutation against a database containing 6578 words, and came up with these results:
df bf kf gf jk kj sj fj gj hj lj sl
Those are all impossible combinations. Of course, combinations such as 'zz' are omitted. Those are:
aa bb cc dd ee ff gg hh ii jj kk ll mm nn pp qq rr ss tt uu vv ww xx yy zz
'oo' is omitted, as it appears in many words, such as 'look'.
Segments of the string that are 2 or more characters long and repeat consecutively would be flagged as spam. In the string 'lololololol', the repeated segment is 'lo', so it is flagged as spam.
More than 3 of the same vowel in the same word would be flagged as spam. For example, 'oooouuuu' would be flagged as spam, as 'o' and 'u' are vowels that have each been repeated more than 3 times.
No word longer than 1 character may be made up of just vowels. In this case, 'Y' would not be considered a vowel, so as to avoid a false positive on 'you'.
Any input in which 15% or more of the words break these rules (the margin allows for misspellings) would be redirected to spam.
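A minimal sketch of how these rules might be put together in Python is below. It is not the program mentioned later in this answer, and it makes a few assumptions of its own: 'Y' also counts as a vowel for the consonant rule, a segment has to occur at least 3 times in a row before it counts as repeating (so words like 'banana' pass), the repeated-vowel rule only looks at vowels directly next to each other, and the exceptions to the length rule are ignored. The pair table is just the short list from above, so it inherits that word list's gaps; a larger list would drop entries such as 'sl', which occurs in 'slow'.

```python
import re

VOWELS = set("aeiou")
# Letter pairs copied from the short list in this answer.
IMPOSSIBLE_PAIRS = {"df", "bf", "kf", "gf", "jk", "kj",
                    "sj", "fj", "gj", "hj", "lj", "sl"}
MAX_WORD_LENGTH = 34


def check_word(word):
    """Return the reason `word` looks like spam, or None if it passes."""
    word = word.lower()
    if len(word) > MAX_WORD_LENGTH:
        return "it is too long"
    # 'y' counts as a vowel for this rule.
    if not any(ch in VOWELS or ch == "y" for ch in word):
        return "it has no vowel"
    # 'y' does not count as a vowel here, so 'you' passes.
    if len(word) > 1 and all(ch in VOWELS for ch in word):
        return "it is made up of only vowels"
    # Treat 'th' as one consonant, then look for 4 or more in a row.
    collapsed = word.replace("th", "t")
    if re.search(r"[bcdfghjklmnpqrstvwxz]{4,}", collapsed):
        return "more than 3 consonants appear in a row"
    if any(word[i:i + 2] in IMPOSSIBLE_PAIRS for i in range(len(word) - 1)):
        return "invalid sequences detected"
    # Assumption: only the same vowel 4 or more times in a row is counted.
    if re.search(r"([aeiou])\1{3,}", word):
        return "the same vowel is repeated more than 3 times"
    # Assumption: a segment of 2+ characters occurring 3+ times in a row.
    if re.search(r"(.{2,}?)\1{2,}", word):
        return "a segment repeats consecutively"
    return None


def is_spam(text, tolerance=0.15):
    """Flag `text` when 15% or more of its words break a rule."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return False
    failures = sum(1 for word in words if check_word(word) is not None)
    return failures / len(words) >= tolerance
```

Calling `check_word` on a single word gives back the reason it was rejected, which makes it easy to see which rule is responsible for a false positive before tuning the tables or thresholds.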
If you do decide to modify ALICE's files, you can get a lot of them here. Newer versions may be found at ALICE's Google Code page.
You could also use a spellchecker to help with spam detection. You could run the input against a spellchecker such as PyEnchant (for Python), and read the suggestions. If the input has no suggestions, then it can be safely assumed, in most cases, that it is spam.
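As a rough sketch of the spellchecker idea: the helper below works with any object that has `check(word)` and `suggest(word)` methods, which is the interface PyEnchant's `Dict` objects provide. The function names are my own, and `make_enchant_checker` assumes the `pyenchant` package and an `en_US` dictionary are installed.

```python
def has_no_suggestions(word, checker):
    """True when `checker` neither accepts `word` nor suggests anything for it."""
    if checker.check(word):
        # The word is spelled correctly, so it is not gibberish.
        return False
    return len(checker.suggest(word)) == 0


def make_enchant_checker(language="en_US"):
    """Build a PyEnchant dictionary (needs pyenchant and the dictionary installed)."""
    import enchant  # imported lazily so the helper above works without it
    return enchant.Dict(language)
```

You would then call something like `has_no_suggestions("fdsahjfsd", make_enchant_checker())`. Whether a given string gets suggestions depends on the dictionary backend, so treat the result as a hint rather than a verdict.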
It's not perfect, but it does work to a limited extent. I made a small program to demonstrate what spam filtering like this would result in. This is the output:
>>> fdsahjfsd
'fdsahjfsd' is spam since more than 3 consonants appear in a row
>>> fhsdjhfksd
'fhsdjhfksd' is spam since it has no vowel
>>> jfsdkjl
'jfsdkjl' is spam since it has no vowel
>>> dk
'dk' is spam since it has no vowel
>>> ddds
'ddds' is spam since it has no vowel
>>> uxxs
'uxxs' is not spam
>>> kd
'kd' is spam since it has no vowel
>>> ukd
'ukd' is not spam
>>> asdjaskljlaskjldkasjkljdklas
'asdjaskljlaskjldkasjkljdklas' is spam since it is too long
>>> hdjaskj
'hdjaskj' is spam since invalid sequences detected
As I said before, it's not perfect, as it lets some gibberish through (false negatives such as 'uxxs'), but this could be fixed with a spell checking implementation.
The drawback of a spell checking implementation is that your spam detection would depend on the number of words the dictionary has. Most spellcheckers only have the first 10,000 words, so some uncommon words may be blocked as spam. However, checking whether over 15% of the input is invalid could solve this.
If you think it may help you, you can get the small program I made from here. It's written in Python.
Also, as other answers here have said, a 'state-of-the-art' spam filter would require a mixture of methods.
You can use SpamAssassin, Pyzor, Reverend, and Orange, but probably the best thing to do would be to try to combine all of those together.
If you would like to use Lisp for this, a nice article about Bayesian filtering in Lisp is located here.
If you would like to do this via a neural network, then this CodeProject article may be useful. It uses a simple, easy-to-use DLL, and the example code can be used almost directly for the task of spam filtering.