I want to use the Batch Learning PR for text classification in GATE. I first wrote the following configuration XML, and it works.

<?xml version="1.0"?>
<ML-CONFIG>
  <VERBOSITY level="1"/>
  <SURROUND value="false"/>
  <PARAMETER name="thresholdProbabilityClassification" 
      value="0.5"/>
  <multiClassification2Binary method="one-vs-others"/>
  <EVALUATION method="kfold" 
       runs="5"
       ratio="0.66" />
  <ENGINE nickname="PAUM" 
   implementationName="PAUM"
   options=" -p 50 -n 5 -optB 0.0  "/>
  <DATASET>
    <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
    
    <NGRAM>
      <NAME>ngram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>4</CONSNUM>
      
      <CONS-1>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
      </CONS-1>

      <CONS-2>
        <TYPE>word_bag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-2>

      <CONS-3>
        <TYPE>hashtag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-3>

      <CONS-4>
        <TYPE>Token</TYPE>
        <FEATURE>category</FEATURE>
      </CONS-4>

      <WEIGHT>2</WEIGHT>
    </NGRAM>
    
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>emotion</TYPE>
      <FEATURE>feature</FEATURE>
      <POSITION>0</POSITION>
      <CLASS/>
    </ATTRIBUTE>
    
  </DATASET>
</ML-CONFIG>

However, when I change the order of the CONS elements as follows, it no longer works.

<?xml version="1.0"?>
<ML-CONFIG>
  <VERBOSITY level="1"/>
  <SURROUND value="false"/>
  <PARAMETER name="thresholdProbabilityClassification" 
      value="0.5"/>
  <multiClassification2Binary method="one-vs-others"/>
  <EVALUATION method="kfold" 
       runs="5"
       ratio="0.66" />
  <ENGINE nickname="PAUM" 
   implementationName="PAUM"
   options=" -p 50 -n 5 -optB 0.0  "/>
  <DATASET>
    <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
    
    <NGRAM>
      <NAME>ngram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>4</CONSNUM>
        
      <CONS-1>
        <TYPE>word_bag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-1>

      <CONS-2>
        <TYPE>hashtag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-2>

      <CONS-3>
        <TYPE>Token</TYPE>
        <FEATURE>category</FEATURE>
      </CONS-3>

      <CONS-4>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
      </CONS-4>

      <WEIGHT>2</WEIGHT>
    </NGRAM>
    
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>emotion</TYPE>
      <FEATURE>feature</FEATURE>
      <POSITION>0</POSITION>
      <CLASS/>
    </ATTRIBUTE>
    
  </DATASET>
</ML-CONFIG>

This second configuration loads into GATE without complaint, but every time I run the Batch Learning PR I get the following error:

java.lang.NullPointerException
    at gate.learning.NLPFeaturesOfDoc.writeNLPFeaturesToFile(NLPFeaturesOfDoc.java:818)
    at gate.learning.LightWeightLearningApi.annotations2NLPFeatures(LightWeightLearningApi.java:198)
    at gate.learning.EvaluationBasedOnDocs.oneRun(EvaluationBasedOnDocs.java:388)
    at gate.learning.EvaluationBasedOnDocs.kfoldEval(EvaluationBasedOnDocs.java:197)
    at gate.learning.EvaluationBasedOnDocs.evaluation(EvaluationBasedOnDocs.java:118)
    at gate.learning.LearningAPIMain.execute(LearningAPIMain.java:776)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:163)
    at gate.creole.SerialController.executeImpl(SerialController.java:157)
    at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
    at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1728)
    at java.lang.Thread.run(Unknown Source)

Does anyone have any idea what causes this problem?

Thanks a lot!

Fan Yang

1 Answer

I would suggest making sure that the document that caused this issue really produces the features defined in your configuration XML file. Since you use Token, I suspect that the document is empty.
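
As a rough aid for that check, the sketch below lists which annotation type and feature pairs a configuration expects, so they can be compared by hand against the annotation sets of the failing document in GATE. It is only an illustration: it uses Python's standard XML parser rather than any GATE API, the function name `required_annotations` and the shortened `SAMPLE_CONFIG` are made up for this example, and it does not by itself explain the NullPointerException.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical config with the same structure as the one in the question.
SAMPLE_CONFIG = """<ML-CONFIG>
  <DATASET>
    <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
    <NGRAM>
      <NAME>ngram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>2</CONSNUM>
      <CONS-1>
        <TYPE>word_bag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-1>
      <CONS-2>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
      </CONS-2>
    </NGRAM>
  </DATASET>
</ML-CONFIG>"""


def required_annotations(config_xml):
    """Return (instance_type, [(annotation_type, feature), ...]) for all NGRAM blocks.

    Raises ValueError if CONSNUM announces more CONS-n elements than are present.
    """
    root = ET.fromstring(config_xml.strip())
    dataset = root.find("DATASET")
    if dataset is None:
        raise ValueError("no DATASET element found")

    instance_type = (dataset.findtext("INSTANCE-TYPE") or "").strip()
    pairs = []
    for ngram in dataset.findall("NGRAM"):
        # Index the direct children by tag name so CONS-1..CONS-n can be looked up.
        children = {child.tag: child for child in ngram}
        consnum = int(ngram.findtext("CONSNUM"))
        for i in range(1, consnum + 1):
            cons = children.get("CONS-%d" % i)
            if cons is None:
                raise ValueError("CONSNUM is %d but CONS-%d is missing" % (consnum, i))
            ann_type = (cons.findtext("TYPE") or "").strip()
            feature = (cons.findtext("FEATURE") or "").strip()
            pairs.append((ann_type, feature))
    return instance_type, pairs


if __name__ == "__main__":
    instance_type, pairs = required_annotations(SAMPLE_CONFIG)
    print("instance type:", instance_type)
    for ann_type, feature in pairs:
        print("needs annotation %s with feature %s" % (ann_type, feature))
```

Every pair it prints should exist, inside each instance annotation, in the annotation set the PR reads from; a document where one of them is missing would be the first place to look.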

ashingel