
I'm trying to remove a set of child elements from a parent element using VTD-XML.

Unfortunately, after an element is removed, the new line that followed it is left behind. A reader of an article on VTD-XML by the VTD-XML author reports the same behaviour here. I'm trying to work out how to remove this leftover new line.

I had some success by increasing the length stored in the 64-bit value returned by getElementFragment() so that the fragment also covers the new line character (an additional 2 bytes). The code snippet is as follows:

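// XMLModifier modifier; VTDNav vn (positioned on the element to remove)
// Adding 0x200000000L increases the length (upper 32 bits) by 2
modifier.remove(vn.getElementFragment() + 0x200000000L);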
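To make the bit arithmetic in the snippet explicit, here is a minimal standalone sketch. It assumes the layout that getElementFragment() is documented to use (length in the upper 32 bits, starting offset in the lower 32 bits). The class and method names (FragmentBits, pack, extend) are hypothetical helpers for illustration and are not part of VTD-XML.

```java
public class FragmentBits {
    // Assumed layout: upper 32 bits = length, lower 32 bits = starting offset.
    public static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    public static int offset(long fragment) {
        return (int) fragment;           // low 32 bits
    }

    public static int length(long fragment) {
        return (int) (fragment >>> 32);  // high 32 bits
    }

    // Grows the fragment by extra units without touching the offset.
    // extend(f, 2) is the same as f + 0x200000000L.
    public static long extend(long fragment, int extra) {
        return fragment + ((long) extra << 32);
    }

    public static void main(String[] args) {
        long fragment = pack(120, 35);        // made-up offset and length
        long extended = extend(fragment, 2);
        System.out.println("offset=" + offset(extended)
                + " length=" + length(extended)); // prints "offset=120 length=37"
    }
}
```

Written this way, the number of extra bytes becomes a parameter rather than a hard-coded hex constant, which matters once the width of the line break varies between files.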

I've tested that this works well on the old_cd.xml provided in ex_16 of the VTD-XML Examples.

However, when I try the same approach on my working file, a ModifyException is thrown when I call modifier.output(); specifically, it is thrown by modifier.check2().

Questions

1. Why would the above approach cause check2() to fail? I don't think I'm overflowing the bits of the VTD token; the file is smaller than 2 MB. See Update.
2. Is there a better approach to remove the remaining new line?

I'm still fairly new to VTD-XML, so I would greatly appreciate any advice and insight from more experienced users.

Thanks for your help.

Update
Wow, in the process of writing this question I realised that I had forgotten to consider the different character encodings. Changing the adjustment so that the length grows by 1 byte instead of 2 (i.e. adding 0x100000000L) fixed the check2() problem! (Another reason to take the time to pause and rethink/write out the problem.)
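As a rough illustration of why the right adjustment differs between files, the sketch below computes how many bytes a line break occupies for a given line-ending style and charset, then shifts that count into the upper 32 bits. NewlineWidth, byteWidth and lengthAdjustment are hypothetical names, not VTD-XML API, and the sketch assumes fragment lengths are counted in bytes, as described in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NewlineWidth {
    // Number of bytes the given line break occupies in the given charset.
    public static int byteWidth(String lineBreak, Charset charset) {
        return lineBreak.getBytes(charset).length;
    }

    // Value to add to a fragment long so that its length also covers the line break
    // (assumes the length lives in the upper 32 bits and is counted in bytes).
    public static long lengthAdjustment(String lineBreak, Charset charset) {
        return (long) byteWidth(lineBreak, charset) << 32;
    }

    public static void main(String[] args) {
        System.out.println(byteWidth("\n", StandardCharsets.UTF_8));       // 1
        System.out.println(byteWidth("\r\n", StandardCharsets.UTF_8));     // 2
        System.out.println(byteWidth("\n", StandardCharsets.UTF_16LE));    // 2
        System.out.println(byteWidth("\r\n", StandardCharsets.UTF_16LE));  // 4
        System.out.println(Long.toHexString(
                lengthAdjustment("\n", StandardCharsets.UTF_8)));          // 100000000
    }
}
```

If the adjustment is larger than the actual line break, the extended range reaches into whatever follows, for example the opening `<` of the next sibling. That may be what tripped check2() here, although this is an inference and not something the thread confirms.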

I'd still like to know from more experienced users if there are better approaches to this.

xlm
  • Is this a show stopper? Did you manage to fix the issue? – vtd-xml-author Jul 06 '13 at 04:26
  • Not a show stopper, I did fix the issue but I was wondering if there was a better way to remove new lines with VTD-XML without having to resort to directly manipulating the bits of underlying token. I'll add details of my approach as an answer if there are no substantive answers. – xlm Jul 09 '13 at 06:41
  • This has to be done at the API level, possibly by adding new methods. Notice that there can be multiple new line characters before and/or after the element segment; the new methods have to take care of that too. – vtd-xml-author Jul 11 '13 at 03:47
  • Yup, agreed that's what I had to do. I wrote a very simple utility to update the offset and length bits using the getCurrentDepth() call. I didn't check for overflow though but that shouldn't be problematic for most use cases. Also might be nice if the static ints of encodings in VTDNav were placed in a map for quick lookup of charset. – xlm Jul 11 '13 at 07:08
  • 2.12 will add a method called expandWhiteSpace to VTDNav's core API; it will take a 64-bit int and return a 64-bit int. – vtd-xml-author Aug 31 '13 at 02:21
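The idea discussed in the comments, growing a fragment over however many whitespace characters follow it, can be sketched without the library. The code below is not the VTD-XML implementation. WhitespaceExpander and its methods are hypothetical, and it assumes a single-byte-whitespace encoding such as UTF-8 or ASCII, plus the offset/length layout from the question (length in the upper 32 bits).

```java
import java.nio.charset.StandardCharsets;

public class WhitespaceExpander {
    static boolean isXmlWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    // Extends the fragment forward over all whitespace that follows it.
    public static long expandTrailing(byte[] doc, long fragment) {
        int offset = (int) fragment;
        int end = offset + (int) (fragment >>> 32);
        while (end < doc.length && isXmlWhitespace(doc[end])) {
            end++; // stops at the next non-whitespace byte, e.g. the next '<'
        }
        return ((long) (end - offset) << 32) | (offset & 0xFFFFFFFFL);
    }

    // Returns the document text with the fragment's bytes cut out.
    public static String remove(byte[] doc, long fragment) {
        int offset = (int) fragment;
        int length = (int) (fragment >>> 32);
        byte[] out = new byte[doc.length - length];
        System.arraycopy(doc, 0, out, 0, offset);
        System.arraycopy(doc, offset + length, out, offset, doc.length - offset - length);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<a>\n  <b/>\n\n  <c/>\n</a>".getBytes(StandardCharsets.UTF_8);
        long fragment = (4L << 32) | 6L;              // <b/> starts at byte 6, 4 bytes long
        long expanded = expandTrailing(doc, fragment);
        System.out.println(expanded >>> 32);          // prints 8
        System.out.println(remove(doc, expanded));    // <c/> now sits where <b/> was
    }
}
```

Because the expansion stops at the first non-whitespace byte, two adjacent siblings expanded this way do not produce overlapping ranges. A fuller version would also need the option mentioned in the thread of expanding backwards or not at all.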

1 Answer


To answer your question, I think this needs to be done at the API level, and it needs to take care of a few extra details, such as options to remove all of the surrounding white space or none of it. It needs to be done in the next release...

vtd-xml-author