
Could someone help me convert these commands to Java code, so that I don't have to run them from the terminal?

I'm trying to train my own model using Stanford NER:

java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer jane-austen-emma-ch1.txt > jane-austen-emma-ch1.tok

perl -ne 'chomp; print "$_\tO\n"' jane-austen-emma-ch1.tok > jane-austen-emma-ch1.tsv

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop

Also, must the training file be in .tsv format?
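Of the three commands above, the Perl one-liner needs no Stanford classes at all: it only appends a tab and the label `O` to every token line. A minimal sketch of that step in plain Java follows. The class name `TokensToTsv` and its method are hypothetical helpers invented for illustration, not part of Stanford NER, and UTF-8 input is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class (not part of Stanford NER).
class TokensToTsv {

    // Appends a tab and the label "O" to every token line,
    // mirroring: perl -ne 'chomp; print "$_\tO\n"'
    static List<String> toTsvLines(List<String> tokenLines) {
        List<String> out = new ArrayList<>();
        for (String token : tokenLines) {
            out.add(token + "\tO");
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TokensToTsv <input.tok> <output.tsv>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // readAllLines drops the line terminators, which plays the role of chomp.
        List<String> tokens = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, toTsvLines(tokens), StandardCharsets.UTF_8);
    }
}
```

Running it with `jane-austen-emma-ch1.tok` and `jane-austen-emma-ch1.tsv` as arguments would stand in for the second command. The tokenizing and training steps are not covered by this sketch; they would still go through the `edu.stanford.nlp.process.PTBTokenizer` and `edu.stanford.nlp.ie.crf.CRFClassifier` classes named in the other two commands.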

dario
M A
  • Welcome to Stack Overflow. Can you show us some Java code you've already written and where you're stuck? We're here to help you, not to replace you... – Jörn Hees Jun 09 '15 at 17:08
  • Yes. I'm just concerned with this line: `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop`. This is my code: `Properties prop = new Properties(); prop.load(new FileInputStream(new File("austen.prop"))); CRFClassifier crf = new CRFClassifier(prop); crf.train();` All the parameters in the properties file are correct, but I always get this error: `deftab720= Unknown property: |deftab720| tf1ansiansicpg1252cocoartf1265cocoasubrtf210= tf1ansiansicpg1252cocoartf1265cocoasubrtf210|` – M A Jun 09 '15 at 18:03
  • Please post the contents of the file `austen.prop`. It looks like it contains some bad property (which isn't in the `austen.prop` file published on the Stanford NER site). – Jon Gauthier Jun 09 '15 at 18:12
  • This is the content of my properties file:
    trainFile = 45_N_22_E.tsv
    serializeTo = mahmoud-model.ser.gz
    map = msentencenum=0,word=1,mindex=2,mstart=3,mend=4,mlemma=5,answer=6
    type=crf
    useClassFeature=true
    useWord=true
    useSentenceNum=true
    useIndex=true
    useStart=false
    useEnd=false
    useLemma=true
    Please note that I updated the classes NERFeatureFactory.java, SeqClassifierFlags.java, AnnotationLookup.java and CoreAnnotations.java to fit the properties specified above. – M A Jun 09 '15 at 18:31
  • For example, for the property "msentencenum=0" I have the following line in SeqClassifierFlags.java: `Boolean useSentenceNum = false;` There is also a corresponding line in the if-statement. In the class AnnotationType.java, I have the following: `MSENTENCENUM_KEY(CoreAnnotations.MSentenceNum.class, "msentencenum")` In the class CoreAnnotations.java, I have the following: `public static class MSentenceNum implements CoreAnnotations{ public class getType(){ return Integer.class;}}` In the class NERFeatureFactory.java: `if (flags.useSentenceNum){featuresCpC.add(c.get(MSentenceNum.class)+"-MSM");}` – M A Jun 09 '15 at 18:45
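
The unknown keys in the error reported above (`deftab720`, `...cocoartf1265cocoasubrtf210`) look like RTF control words rather than anything from the property list posted in the comments. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that `austen.prop` was saved as rich text instead of plain text. A minimal sketch for checking this before calling `Properties.load` follows; the class `RtfCheck` is a hypothetical helper, not part of Stanford NER.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class (not part of Stanford NER).
class RtfCheck {

    // An RTF document begins with the five characters {\rtf
    static boolean looksLikeRtf(byte[] content) {
        int n = Math.min(content.length, 5);
        String start = new String(content, 0, n, StandardCharsets.US_ASCII);
        return start.equals("{\\rtf");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RtfCheck <file.prop>");
            return;
        }
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        if (looksLikeRtf(content)) {
            System.out.println(args[0] + " starts with an RTF header; re-save it as plain text.");
        } else {
            System.out.println(args[0] + " does not start with an RTF header.");
        }
    }
}
```

If the check reports an RTF header, re-saving the file as plain text with only the `key = value` lines would be the first thing to try.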

0 Answers