I am new to Mallet. I am trying to use the Mallet SimpleTagger/CRF and am experimenting with phrases. I looked through the documentation on the Mallet site and also went through the user archives, but nothing helped.
I tried training Mallet for simple tagging, and it works reasonably well. Here is what my data looks like (please note there is a blank line between the training sequences to indicate that they are different sets):
Sample training data:
where STOPWORD
is STOPWORD
chicago CITY
<---Newline---->
Sunnyvale CITY
<---Newline---->
Chicago CITY
<---Newline---->
Washington CITY
<---Newline---->
What STOPWORD
is STOPWORD
Sunnyvale CITY
time ASK
<---Newline---->
new STOPWORD
<---Newline---->
place STOPWORD
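
For reference, here is a minimal Python sketch of how text in this layout could be generated. It only assumes what is shown above: one "token LABEL" pair per line and a blank line between sequences. The helper name `format_sequences` and the sample list are hypothetical and not part of Mallet.

```python
from typing import List, Tuple

# One training sequence is a list of (token, label) pairs.
Sequence = List[Tuple[str, str]]


def format_sequences(sequences: List[Sequence]) -> str:
    """Render one 'token LABEL' pair per line, with a blank line between sequences."""
    blocks = [
        "\n".join(f"{token} {label}" for token, label in seq)
        for seq in sequences
    ]
    return "\n\n".join(blocks) + "\n"


# A few of the sequences from the sample data above.
sequences = [
    [("where", "STOPWORD"), ("is", "STOPWORD"), ("chicago", "CITY")],
    [("Sunnyvale", "CITY")],
    [("What", "STOPWORD"), ("is", "STOPWORD"), ("Sunnyvale", "CITY"), ("time", "ASK")],
]

if __name__ == "__main__":
    # Print the training text; redirect to a file to pass it to the tagger.
    print(format_sequences(sequences), end="")
```

The output can be redirected to a file and used as the training input, assuming the tagger accepts exactly this layout.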
The problem I have is when city names consist of multiple words, say:
new york CITY
Please note that in the above training data "new" is a STOPWORD.
Questions:
- For SimpleTagger, is the above representation fine? If not, how do I represent phrases?
- If not, how do I represent the data so that SimpleTagger/CRF can use the previous 'n' words to arrive at a tag, i.e. chunk my input?