As the file is already in XML format, you could just use the JAXB API for this. It is built into Java SE 6 through 10; from Java 11 on it was removed from the JDK and has to be added as a separate dependency. Either way, there is no need to climb another learning curve with XPath. JAXB also doesn't care about the file extension. All it needs is an InputStream of the file.
First create a JAXB javabean class which conforms to the XML document structure:
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="DOC")
@XmlAccessorType(XmlAccessType.FIELD)
public class Doc {

    @XmlElement(name="DOCNO")
    private Integer docNo;

    @XmlElement(name="DOCTYPE")
    private String docType;

    @XmlElement(name="TXTTYPE")
    private String txtType;

    @XmlElement(name="AUTHOR")
    private String author;

    @XmlElement(name="DATE") // You could use a custom adapter if you want java.util.Date.
    private String date;

    @XmlElement(name="TEXT")
    private String text;

    // Add/generate getters, setters and other javabean boilerplate.
}
Then you can parse it as follows:
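import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.bind.JAXBContext;

JAXBContext jaxb = JAXBContext.newInstance(Doc.class);
// try-with-resources closes the stream once parsing is done.
try (InputStream input = new FileInputStream("/path/to/your/file.txt")) {
    Doc doc = (Doc) jaxb.createUnmarshaller().unmarshal(input);
    System.out.println(doc.getDocNo());
    System.out.println(doc.getDocType());
    // ...
}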
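
As for the comment on the DATE field: if you'd rather have a java.util.Date there, the adapter mostly boils down to a string/date conversion. Below is a rough sketch of just that conversion logic, using only the JDK. The class name `DocDateConverter` is made up for illustration, and the `yyyy-MM-dd` pattern and UTC time zone are pure assumptions on my part, as I don't know what your DATE values look like, so adjust them to your file.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: the conversion logic a JAXB date adapter could delegate to.
public class DocDateConverter {

    // ASSUMPTION: DATE values look like 2011-03-15. Change this to the real format.
    private static final String PATTERN = "yyyy-MM-dd";

    // SimpleDateFormat is not thread-safe, so build a fresh one for every call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false); // reject impossible dates such as 2011-02-30
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // ASSUMPTION: dates are in UTC
        return format;
    }

    // XML text -> Date (what the adapter's unmarshal method would do).
    public static Date parse(String value) {
        try {
            return newFormat().parse(value.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable DATE value: " + value, e);
        }
    }

    // Date -> XML text (what the adapter's marshal method would do).
    public static String print(Date value) {
        return newFormat().format(value);
    }
}
```

To hook this into JAXB, you would call these two methods from the `unmarshal` and `marshal` methods of a class extending `javax.xml.bind.annotation.adapters.XmlAdapter<String, Date>`, put `@XmlJavaTypeAdapter` with that class on the `date` field, and change the field type to `Date`.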