
The examples I've seen for the Encog Framework describe how to classify numeric data into textual classes. For instance, the following is the data used for the classic Iris dataset (a rough sketch of that setup follows the sample):

"sepal_l","sepal_w","petal_l","petal_w","species"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
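
For reference, this is roughly how that numeric-to-class setup is usually wired up in Encog 3.x. This is a minimal sketch assuming the versatile dataset API; the file name `iris.csv` is a placeholder, and the column names and indices simply follow the header shown above:

```java
import java.io.File;

import org.encog.ConsoleStatusReportable;
import org.encog.ml.MLRegression;
import org.encog.ml.data.versatile.VersatileMLDataSet;
import org.encog.ml.data.versatile.columns.ColumnDefinition;
import org.encog.ml.data.versatile.columns.ColumnType;
import org.encog.ml.data.versatile.sources.CSVDataSource;
import org.encog.ml.factory.MLMethodFactory;
import org.encog.ml.model.EncogModel;
import org.encog.util.csv.CSVFormat;

public class IrisClassify {
    public static void main(String[] args) {
        // Load the CSV shown above; "true" means the file has a header row.
        VersatileMLDataSet data = new VersatileMLDataSet(
                new CSVDataSource(new File("iris.csv"), true, CSVFormat.DECIMAL_POINT));

        // Four continuous numeric inputs, one nominal (textual) output column.
        data.defineSourceColumn("sepal_l", 0, ColumnType.continuous);
        data.defineSourceColumn("sepal_w", 1, ColumnType.continuous);
        data.defineSourceColumn("petal_l", 2, ColumnType.continuous);
        data.defineSourceColumn("petal_w", 3, ColumnType.continuous);
        ColumnDefinition species = data.defineSourceColumn("species", 4, ColumnType.nominal);

        data.analyze();
        data.defineSingleOutputOthersInput(species);

        // Let Encog choose a feed-forward network and normalize all columns.
        EncogModel model = new EncogModel(data);
        model.selectMethod(data, MLMethodFactory.TYPE_FEEDFORWARD);
        model.setReport(new ConsoleStatusReportable());
        data.normalize();

        model.holdBackValidation(0.3, true, 1001);
        model.selectTrainingType(data);
        MLRegression best = (MLRegression) model.crossvalidate(5, true);
        System.out.println("Best model: " + best);
    }
}
```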

Are there any simple examples of how to normalize and use text with Encog, for instance, data like this:

"subject","body","Spam"
"This is the subject","This is the body",1
"This message is not spam","Lorem ipsum dolor",0
etc...
  • http://ijcsi.org/papers/IJCSI-10-2-1-17-21.pdf Classifiers work on numeric data. There are numerous approaches, such as the one above, for selecting features from text, but they are all detailed algorithms; there is no simple approach I know of. – Steve Feb 12 '14 at 10:42
  • Thanks for the link. I already know how to do this in Python, but there are methods in sklearn and other packages that facilitate converting the text to numbers, removing stop words, and other helpful things like that. I was wondering whether the **Encog Framework** specifically had any such support or if I would have to program it myself. – user1477388 Feb 12 '14 at 14:20

1 Answer


Encog does not provide any higher-level abstractions for encoding pure text at this time. The inputs to Encog neurons are simply the numeric values that are calculated against the weight matrix. Encog can, however, encode categorical text values such as the species column above.

JeffHeaton
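
In practice this means converting the text into numeric feature vectors yourself before handing them to Encog, for example with a simple bag-of-words count. The sketch below uses nothing from Encog beyond `BasicMLDataSet`; the vocabulary-building and encoding helpers, the sample messages, and the labels are hypothetical illustrations, not part of Encog's API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;

public class BagOfWords {
    // Build a vocabulary: each distinct word gets its own column index.
    static Map<String, Integer> buildVocabulary(List<String> messages) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String msg : messages) {
            for (String word : msg.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    vocab.putIfAbsent(word, vocab.size());
                }
            }
        }
        return vocab;
    }

    // Encode one message as a word-count vector over the vocabulary.
    static double[] encode(String msg, Map<String, Integer> vocab) {
        double[] vec = new double[vocab.size()];
        for (String word : msg.toLowerCase().split("\\W+")) {
            Integer idx = vocab.get(word);
            if (idx != null) {
                vec[idx] += 1.0;
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        // Subject and body concatenated into one string per message (toy data).
        List<String> messages = new ArrayList<>();
        messages.add("This is the subject This is the body");       // spam -> 1
        messages.add("This message is not spam Lorem ipsum dolor"); // ham  -> 0
        double[][] labels = { {1.0}, {0.0} };

        Map<String, Integer> vocab = buildVocabulary(messages);
        double[][] inputs = new double[messages.size()][];
        for (int i = 0; i < messages.size(); i++) {
            inputs[i] = encode(messages.get(i), vocab);
        }

        // The numeric vectors can now be fed to any Encog trainer.
        MLDataSet trainingSet = new BasicMLDataSet(inputs, labels);
        System.out.println("Vocabulary size: " + vocab.size()
                + ", records: " + trainingSet.getRecordCount());
    }
}
```

The resulting `BasicMLDataSet` can then be trained exactly like the numeric Iris data above; stop-word removal, stemming, or TF-IDF weighting would be additional preprocessing layered on top of this encoding step.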