Note: To clear up any confusion, I have parsed XML as a String that I would like to apply a regex against. Any mention of XML in my question simply refers to this parsed XML string.
I have an XML string processed (parsed) by Java 7's TransformerFactory with indentation (indent-amount = 4) enabled. I need to replace each group of 4 spaces before an XML tag with a tab (i.e. 4 spaces = 1 tab, 8 spaces = 2 tabs, and so on).
The objective is to make sure that the regex does not match inside the attribute values of an XML tag. Some tags' attribute values contain one or more spaces. So far, every regex I have tried matches either all of the whitespace or none of it. I have even tried positive/negative lookahead and lookbehind with no luck (I am not good with regex).
As shown below, the sample regex matches all whitespace.
I have tried a bunch of regular expressions:
( {4}) //matches everywhere
^(\s{4})+ //for 12 whitespace, the first 8 is full match, not good
(?<![\d])( {4}) //only -ve/lookbehind 1 space not enough
Here is the regex101 demo: https://regex101.com/r/VR4Nbf/2
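To make the failure modes concrete, the first two attempts can be reproduced in Java roughly as follows. This is only a sketch: the class name `IndentAttempts`, the method names, and the sample XML are made up for illustration, and `(?m)` was added to the second pattern so that `^` matches at every line start rather than only at the start of the input.

```java
final class IndentAttempts {
    // Hypothetical sample, not the real document: one tag indented by
    // 8 spaces whose attribute value contains 4 consecutive spaces.
    static final String SAMPLE =
            "<root>\n"
          + "    <item>\n"
          + "        <name value=\"a    b\"/>\n"
          + "    </item>\n"
          + "</root>\n";

    // Attempt 1: replaces every run of 4 spaces, wherever it occurs,
    // so the spaces inside the attribute value are replaced as well.
    static String everywhere(String xml) {
        return xml.replaceAll("( {4})", "\t");
    }

    // Attempt 2: anchored to the line start, but the whole repeated
    // group is one match, so 8 leading spaces collapse into 1 tab.
    static String anchoredGroup(String xml) {
        return xml.replaceAll("(?m)^(\\s{4})+", "\t");
    }

    public static void main(String[] args) {
        System.out.println(everywhere(SAMPLE));
        System.out.println(anchoredGroup(SAMPLE));
    }
}
```

The first call turns `value="a    b"` into `value="a<TAB>b"`, and the second leaves the attribute alone but indents the `name` tag with a single tab instead of two.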
The TransformerFactory configuration is as follows:
transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
The above Transformer configuration works as expected. There is no way to tell the Transformer to use tabs instead of spaces unless you override the required methods, which seems like overkill if a regex can do the job.
While I do understand that whitespace is considered acceptable by the XML specification, the way the XML file is used in my case requires the XML to be beautified with tabs (not spaces).
An ideal regex would:
- Not match anything inside an XML tag
- Replace each occurrence of 4 spaces with 1 tab (I used replaceAll)
- Be Java-based
- Preferably be usable with replaceAll
- Need to be applied only once, rather than repeatedly (irrespective of the level of nesting)
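One candidate that appears to satisfy the requirements above, as a minimal sketch rather than a verified solution: combine `(?m)^` with `\G`, which in `java.util.regex` matches at the position where the previous match ended. Each group of 4 spaces is then matched only while it is part of an unbroken run starting at the beginning of a line. The class name `IndentToTabs`, the method `indentWithTabs`, and the sample input are hypothetical. The sketch assumes the only lines that start with spaces are indentation produced by the Transformer; multi-line text nodes, comments, or CDATA sections with leading spaces would be converted too.

```java
import java.util.regex.Pattern;

final class IndentToTabs {
    // (?m)  : ^ matches at the start of every line
    // \G    : matches where the previous match ended, so a second group of
    //         4 spaces only matches if it directly follows the first one
    // Spaces inside attribute values never start at ^ or \G, so they are
    // left untouched.
    private static final Pattern LEADING_GROUP =
            Pattern.compile("(?m)(?:^|\\G) {4}");

    // Replaces each leading group of 4 spaces with one tab in a single pass.
    static String indentWithTabs(String xml) {
        return LEADING_GROUP.matcher(xml).replaceAll("\t");
    }

    public static void main(String[] args) {
        // Hypothetical input resembling the Transformer output.
        String xml =
                "<root>\n"
              + "    <item>\n"
              + "        <name value=\"a    b\"/>\n"
              + "    </item>\n"
              + "</root>\n";
        System.out.print(indentWithTabs(xml));
    }
}
```

The same pattern should also work directly as `xml.replaceAll("(?m)(?:^|\\G) {4}", "\t")`; precompiling it only avoids recompiling the pattern on every call. Leftover spaces that do not fill a complete group of 4 are kept as they are.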
Note: Making use of XSLT is not feasible at this stage.
Thanks.