I am attempting to parse the JMdict_e.xml file from the JMdict project using VTD-XML, but I am running into a parsing error.

The only error message that appears is:

ParserException: com.ximpleware.EntityException: Errors in Entity: Illegal entity char

A short excerpt from the XML looks like this:

<entry>
    <ent_seq>1279770</ent_seq>
    <k_ele>
        <keb>構成要素</keb>
    </k_ele>
    <r_ele>
        <reb>こうせいようそ</reb>
    </r_ele>
    <sense>
        <pos>&n;</pos>
        <pos>&adj-no;</pos>
        <field>&comp;</field>
        <gloss>components</gloss>
        <gloss>elements</gloss>
        <gloss>parts</gloss>
    </sense>
</entry>

I believe the illegal characters are the ampersands in the pos and field elements. Is there a way to make VTD-XML stop treating these ampersands as special characters, or is there a different approach to this problem?

waylonion
    XML doesn't allow bare ampersands; perhaps replacing them with "&amp;" will help. I am not familiar with VTD-XML. – arcy Jun 29 '17 at 01:19
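The comment's suggestion can be sketched as a text pre-pass over the file. Blindly replacing every & would also double-escape references that are already valid (&amp;, &lt;, &#x41;), so this minimal Java sketch escapes only named references that are not one of the five predefined XML entities. The class name EscapeUnknownEntities is hypothetical, and the sketch assumes the whole file fits in memory as a UTF-8 string.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeUnknownEntities {
    // The five entities every XML parser recognizes without a DTD.
    private static final Set<String> PREDEFINED = Set.of("lt", "gt", "amp", "apos", "quot");

    // A general entity reference such as &n; or &adj-no;.
    // Character references (&#65; or &#x41;) start with '#' and are not matched.
    private static final Pattern ENTITY_REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    /** Rewrites every non-predefined reference &name; as the literal text "&amp;name;". */
    public static String escape(String xml) {
        Matcher m = ENTITY_REF.matcher(xml);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement = PREDEFINED.contains(name) ? m.group() : "&amp;" + name + ";";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java EscapeUnknownEntities JMdict_e.xml JMdict_escaped.xml
        String xml = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        Files.writeString(Path.of(args[1]), escape(xml), StandardCharsets.UTF_8);
    }
}
```

Point VTD-XML at the rewritten file instead of the original. Because &amp; is a built-in entity, reading the pos text back should yield the literal short code (for example &n;) rather than raising an EntityException.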

1 Answer

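VTD-XML only recognizes the built-in XML entities (&lt;, &gt;, &amp;, &apos;, and &quot;). It seems to me that most of the entities in your file, such as &n;, &adj-no;, and &comp;, are therefore invalid as far as VTD-XML is concerned. You will probably need to fix them before feeding the file to the parser.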
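One way to do that fix is sketched below. It assumes, as in the JMdict distribution, that the entities are declared in the internal DTD subset at the top of the file (lines like <!ENTITY n "noun (common) (futsuumeishi)">), which VTD-XML does not expand. The hypothetical ExpandDtdEntities class further assumes every declaration is double-quoted, plain text, and not nested. It collects the declarations with a regex and replaces each declared reference in the document body with its XML-escaped value. This is a text-level approximation, not a full DTD processor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpandDtdEntities {
    // A double-quoted internal-subset declaration: <!ENTITY name "value">
    private static final Pattern DECL = Pattern.compile(
            "<!ENTITY\\s+([A-Za-z_][A-Za-z0-9._-]*)\\s+\"([^\"]*)\"\\s*>");
    // A general entity reference in the body, e.g. &n; or &adj-no;
    private static final Pattern REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    /** Escapes characters that would otherwise be read as markup in element content. */
    public static String escapeText(String s) {
        // '&' must be escaped first so the later replacements are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces each declared &name; in the document body with its escaped plain-text value. */
    public static String expand(String xml) {
        Map<String, String> decls = new HashMap<>();
        Matcher d = DECL.matcher(xml);
        int lastDeclEnd = 0;
        while (d.find()) {
            decls.put(d.group(1), d.group(2));
            lastDeclEnd = d.end();
        }
        if (decls.isEmpty()) {
            return xml; // no declarations, nothing to expand
        }
        // The body starts after the "]>" that closes the internal DTD subset.
        int close = xml.indexOf("]>", lastDeclEnd);
        int bodyStart = close < 0 ? lastDeclEnd : close + 2;

        StringBuilder sb = new StringBuilder(xml.substring(0, bodyStart)); // DOCTYPE kept as-is
        Matcher m = REF.matcher(xml.substring(bodyStart));
        while (m.find()) {
            String value = decls.get(m.group(1));
            // Undeclared names, including the predefined lt/gt/amp/apos/quot, stay as written.
            String replacement = value == null ? m.group() : escapeText(value);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java ExpandDtdEntities JMdict_e.xml JMdict_expanded.xml
        String xml = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        Files.writeString(Path.of(args[1]), expand(xml), StandardCharsets.UTF_8);
    }
}
```

The output can then be parsed as usual, for example by passing its bytes to `VTDGen.setDoc` and calling `parse(false)`. The sketch reads the whole file into memory, so a larger heap (via `-Xmx`) may be needed. If you would rather keep the short codes such as n and adj-no, change `replacement` to use the entity name instead of its value.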

vtd-xml-author