I am using a StAX parser to find the offsets of certain tags. The problem is that getCharacterOffset() returns an int, and my file is quite large (50 million lines), so getLocation().getCharacterOffset() overflows and returns a negative value. What can I use instead?

My draft code:

while (reader.hasNext()) {
    var event = reader.nextEvent();
    if (event.isEndElement()) {
        var endElement = event.asEndElement();
        if (endElement.getName().getLocalPart().equals(tag)) {
            end = endElement.getLocation().getCharacterOffset();
            break;
        }
    }
}
pasha
  • Possible duplicate: https://stackoverflow.com/questions/34724494/how-do-i-keep-track-of-parsing-progress-of-large-files-in-stax – rici Oct 10 '21 at 13:33
  • @rici I think the author of that question only wanted the parsing progress (which depends on the buffer length), whereas I want the exact position of an element (like indexOf) – pasha Oct 10 '21 at 16:49
  • In theory, if you detect the overflow, couldn't you calculate the actual offset by the difference from the maximum negative number to how far it gets reduced towards zero, and add that to the real offset, for each overflow that occurs? :-) OK, otherwise maybe look around for a StAX parser implementation which allows for larger offsets (ideally infinite offsets, as long as not running into `String.length()` limits per element/text-node/attribute/etc.)? – skreutzer Dec 26 '22 at 00:22
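The overflow-compensation idea from the last comment could be sketched roughly as below. This is only a sketch under assumptions: `OffsetUnwrapper` and `update` are hypothetical names, not part of StAX, and it assumes the parser's internal counter simply wraps around (consistent with the negative values reported in the question) rather than clamping or returning -1, and that consecutive events are less than 2^32 characters apart.

```java
/**
 * Hypothetical helper (not part of StAX): widens a wrapping 32-bit
 * offset, such as the one returned by Location.getCharacterOffset(),
 * into a 64-bit running offset.
 *
 * Assumption: the raw counter wraps on overflow and never moves backwards.
 */
final class OffsetUnwrapper {
    private int lastRaw = 0;   // last raw 32-bit offset seen
    private long total = 0L;   // widened 64-bit offset

    /** Feed every raw offset in document order; returns the widened offset. */
    long update(int raw) {
        // int subtraction wraps the same way the counter did, so the
        // unsigned difference is the true distance travelled, provided
        // that distance is below 2^32.
        total += Integer.toUnsignedLong(raw - lastRaw);
        lastRaw = raw;
        return total;
    }
}
```

In the loop from the question, this would mean calling `update(event.getLocation().getCharacterOffset())` for every event, not only for the matching end element, so that each step stays small, and taking the returned long as `end`. Whether a given StAX implementation really wraps this way is something to verify against the parser in use.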