1

I have .txt files in the following format:

<DOC>
    <DOCNO> 123456 </DOCNO>
    <DOCTYPE> MISCELLANEOUS </DOCTYPE>
    <TXTTYPE> CAPTION </TXTTYPE>
    <AUTHOR> MICHAEL </AUTHOR>
    <DATE> 1.1.2012 </DATE>
    <TEXT>
    Some Text
    </TEXT>
</DOC>

How can I access tags in these .txt files using Java? I want to know if there is a way to directly access tags rather than reading the .txt file line by line.

BalusC
vhelsing

3 Answers

3

As the file is already in XML format, you can use the JAXB API for this. JAXB is bundled with Java SE 6 through 10; from Java 11 onwards it has to be added as a separate dependency. There is no need for other third-party libraries or for learning XPath. JAXB also doesn't care about the file extension; all it needs is an InputStream of the file.

First create a JAXB javabean class which matches the XML document structure:

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="DOC")
@XmlAccessorType(XmlAccessType.FIELD)
public class Doc {

    @XmlElement(name="DOCNO")
    private Integer docNo;

    @XmlElement(name="DOCTYPE")
    private String docType;

    @XmlElement(name="TXTTYPE")
    private String txtType;

    @XmlElement(name="AUTHOR")
    private String author;

    @XmlElement(name="DATE") // You could use a custom adapter if you want java.util.Date.
    private String date;

    @XmlElement(name="TEXT")
    private String text;

    // Add/generate getters, setters and other javabean boilerplate.
}

Then you can parse it as follows:

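// Needs: javax.xml.bind.JAXBContext, java.io.InputStream, java.io.FileInputStream
JAXBContext jaxb = JAXBContext.newInstance(Doc.class);
// try-with-resources closes the file even if unmarshalling fails.
try (InputStream input = new FileInputStream("/path/to/your/file.txt")) {
    Doc doc = (Doc) jaxb.createUnmarshaller().unmarshal(input);
    System.out.println(doc.getDocNo());
    System.out.println(doc.getDocType());
    // ...
}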
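
The comment on the DATE field mentions converting to a real date type. As a rough sketch of the conversion such an adapter would perform, the following parses the raw tag value with java.time. The class name DocDateParser is hypothetical, and the pattern assumes the sample value 1.1.2012 means day.month.year, which the question does not confirm.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original answer.
class DocDateParser {

    // Assumption: the DATE tag holds day.month.year without leading zeros.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("d.M.uuuu");

    /** Trims the padding around the tag value and parses it into a LocalDate. */
    static LocalDate parse(String rawTagValue) {
        return LocalDate.parse(rawTagValue.trim(), FORMAT);
    }

    public static void main(String[] args) {
        // The value as it appears between <DATE> and </DATE>, padding included.
        System.out.println(parse(" 1.1.2012 "));
    }
}
```

The trim() call matters because the tag values in the sample file are padded with spaces.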
BalusC
2

This looks very much like XML. There is a truckload of utilities that you can use to parse it, so the work has already been done for you!

Simply search for "java xml parser".

Alternatively, here's a list you can investigate:

  • JDOM
  • Woodstox
  • XOM
  • dom4j
  • VTD-XML
  • Xerces-J
  • Crimson
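
Before pulling in any of these, it may be worth trying the DOM parser that ships with the JDK. The sketch below looks up a tag by name without reading the file line by line. The class and method names (DocTagReader, readTag) are made up for illustration, and the inline sample is a shortened version of the document from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the original answer.
class DocTagReader {

    /** Returns the trimmed text of the first element named tagName, or null if there is none. */
    static String readTag(InputStream xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(xml);
        NodeList matches = document.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // For a real file, pass new FileInputStream("/path/to/your/file.txt") instead.
        String sample = "<DOC><DOCNO> 123456 </DOCNO><AUTHOR> MICHAEL </AUTHOR></DOC>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTag(in, "AUTHOR"));
    }
}
```

DOM loads the whole document into memory, which is fine for small files like the one shown; for very large files a streaming parser would be the better fit.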
Jaco Van Niekerk
1

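Try a normal XML parser. Saxon is a good option: it is primarily an XSLT and XQuery processor, but it lets you address elements directly with XPath.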
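
As a minimal sketch of addressing a tag by path, the example below uses the XPath API bundled with the JDK (javax.xml.xpath) rather than Saxon itself, so it runs without extra dependencies. The class and method names (DocXPathReader, evaluate) are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
class DocXPathReader {

    /** Evaluates an XPath expression against the XML text and returns the trimmed string result. */
    static String evaluate(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // An expression that matches nothing evaluates to the empty string.
        return xpath.evaluate(expression, new InputSource(new StringReader(xml))).trim();
    }

    public static void main(String[] args) throws Exception {
        String sample = "<DOC><DOCNO> 123456 </DOCNO><TXTTYPE> CAPTION </TXTTYPE></DOC>";
        System.out.println(evaluate(sample, "/DOC/TXTTYPE"));
    }
}
```

To read from a file instead of a string, an InputSource can also be constructed from an InputStream.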

Jayan