Can anyone suggest an approach for segmenting plain text while converting a text file to XML, so that the segmentation matches what the original XML had? I am converting a text file into XML with JAXP + SAX, turning this text:
Hello world. I am happy to see you today.
into this XML:
<trans-unit id="1">
<target> Hello world</target>
</trans-unit>
<trans-unit id="2">
<target> I am happy to see you today</target>
</trans-unit>
But suppose the source XML has three sentences in the unit with id="1", for example:
<trans-unit id="1">
<source> Hello world. Sunny smile. Wake up early.</source>
</trans-unit>
<trans-unit id="2">
<source> I am happy to see you today</source>
</trans-unit>
and when I extract the text from this XML, I get plain text:
Hello world. Sunny smile. Wake up early.I am happy to see you today.
How can I segment this text while converting it back into XML, so that the first unit of the target XML file again contains all three sentences? Like this:
<trans-unit id="1">
<target> Hello world. Sunny smile. Wake up early.</target>
</trans-unit>
<trans-unit id="2">
<target> I am happy to see you today</target>
</trans-unit>
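To make the problem concrete, here is a small standalone sketch (the class name SplitDemo is made up for this post) that applies the same split pattern as my conversion code below to the flattened text. It yields four segments, so the information that the first three sentences belonged to one unit is already lost at this point:

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Same pattern as in doit(): split on a newline, or on a period that has
    // no digit directly before it and none directly after it ("1.2" stays intact).
    static final String SPLIT_REGEX = "\n|((?<!\\d)\\.(?!\\d))";

    static List<String> split(String line) {
        List<String> result = new ArrayList<>();
        // String.split drops trailing empty strings, so the final period
        // does not produce an extra empty segment.
        for (String part : line.split(SPLIT_REGEX)) {
            result.add(part);
        }
        return result;
    }

    public static void main(String[] args) {
        String flattened =
                "Hello world. Sunny smile. Wake up early.I am happy to see you today.";
        // Prints four bracketed segments, one per sentence.
        for (String segment : split(flattened)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

Each of the four segments would become its own trans-unit, instead of the two units in the source.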
This is the txt -> XML conversion:
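    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.InputStreamReader;
    import java.io.UnsupportedEncodingException;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.AttributesImpl;

    // in, out, file, selectedDir, th, atts, atts1, elements, i and k are fields of the class (not shown).

    public void doit() {
        try {
            in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), "UTF8"));
            out = new StreamResult(selectedDir);
            initXML();
            String str;
            while ((str = in.readLine()) != null) {
                // Split at every period that has no digit directly before or after it
                elements = str.split("\n|((?<!\\d)\\.(?!\\d))");
                for (i = 0; i < elements.length; i++)
                    process(elements[i]); // pass the segment, not the whole line
            }
            in.close();
            closeXML();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void initXML() throws ParserConfigurationException, SAXException, UnsupportedEncodingException, FileNotFoundException, TransformerException {
        // JAXP + SAX
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        th = tf.newTransformerHandler();
        Transformer serializer = th.getTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // XML output
        serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        th.setResult(out);
        th.startDocument();
        atts = new AttributesImpl();
        atts1 = new AttributesImpl();
        atts1.addAttribute("", "", "xmlns", "CDATA", "urn:oasis:names:tc:xliff:document:1.2");
        th.startElement("", "", "xliff", atts1);
        // SAX expects an empty Attributes object, not null, when there are no attributes
        th.startElement("", "", "file", atts);
        th.startElement("", "", "body", atts);
    }

    // Writes one <trans-unit> containing a single <target> with the given segment
    public void process(String s) throws SAXException {
        atts.clear();
        k++;
        atts.addAttribute("", "", "id", "CDATA", "" + k);
        th.startElement("", "", "trans-unit", atts);
        th.startElement("", "", "target", new AttributesImpl());
        th.characters(s.toCharArray(), 0, s.length());
        th.endElement("", "", "target");
        th.endElement("", "", "trans-unit");
    }

    public void closeXML() throws SAXException {
        th.endElement("", "", "body");
        th.endElement("", "", "file");
        th.endElement("", "", "xliff");
        th.endDocument();
    }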
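One possible direction, as a rough sketch rather than a finished solution: if the extraction step writes each <source> on its own line, the line break can carry the unit boundary, and the conversion can create one trans-unit per line instead of splitting on periods. The class and method names below (LinePerUnitSketch, toPlainText, toUnits) are hypothetical, and the sketch assumes that a unit never contains a line break that has to be preserved and that the text file keeps its line breaks between extraction and conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LinePerUnitSketch {
    // Extraction side: write one unit per line, so the line break marks the boundary.
    static String toPlainText(List<String> units) {
        StringBuilder sb = new StringBuilder();
        for (String unit : units) {
            // Assumption: line breaks inside a unit are not significant,
            // so they are collapsed to a single space.
            sb.append(unit.replaceAll("\\R", " ").trim()).append('\n');
        }
        return sb.toString();
    }

    // Conversion side: every non-empty line becomes one unit; periods are left alone.
    static List<String> toUnits(String plainText) {
        List<String> units = new ArrayList<>();
        for (String line : plainText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                units.add(trimmed);
            }
        }
        return units;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "Hello world. Sunny smile. Wake up early.",
                "I am happy to see you today");
        String text = toPlainText(source);
        // The round trip gives back two units, the first with all three sentences.
        for (String unit : toUnits(text)) {
            System.out.println("[" + unit + "]");
        }
    }
}
```

In terms of the code above, this would correspond to calling process once per line returned by readLine() instead of once per element of the split result. It only helps if whoever edits the text file keeps one unit per line.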