I have developed a regular expression to identify a block of XML inside a text file. The expression looks like this (I have removed all the Java escape backslashes to make it easier to read):
<\?xml\s+version="[\d\.]+"\s*\?>\s*<\s*rdf:RDF[^>]*>[\s\S]*?<\s*\/\s*rdf:RDF\s*>
Then I tried to optimise it by replacing [\s\S]*? with .*? and it suddenly stopped recognising the XML.
As far as I know, \s matches any whitespace character and \S matches any non-whitespace character (the same as [^\s]), so [\s\S] should logically be equivalent to the dot (.).
I didn't use greedy quantifiers, so what could be the difference?
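For reference, here is a minimal sketch that reproduces the behaviour. It assumes the patterns are compiled with `Pattern.compile` and no flags, which may not match the real setup. The class name `RegexRepro`, the helper `found`, and both sample inputs are hypothetical and only serve the illustration. The first input spreads the XML over several lines; the second holds the same kind of block on a single line.

```java
import java.util.regex.Pattern;

public class RegexRepro {
    // Common start and end of the expression from the question, with Java escapes restored.
    static final String HEAD =
            "<\\?xml\\s+version=\"[\\d\\.]+\"\\s*\\?>\\s*<\\s*rdf:RDF[^>]*>";
    static final String TAIL = "<\\s*\\/\\s*rdf:RDF\\s*>";

    // Original version ([\s\S]*?) and the "optimised" version (.*?), both compiled without flags.
    static final Pattern WITH_CLASS = Pattern.compile(HEAD + "[\\s\\S]*?" + TAIL);
    static final Pattern WITH_DOT = Pattern.compile(HEAD + ".*?" + TAIL);

    // Hypothetical sample inputs (not taken from the real file).
    static final String MULTI_LINE =
            "some text\n"
            + "<?xml version=\"1.0\"?>\n"
            + "<rdf:RDF xmlns:rdf=\"urn:example\">\n"
            + "  <rdf:Description/>\n"
            + "</rdf:RDF>\n"
            + "more text";
    static final String SINGLE_LINE =
            "<?xml version=\"1.0\"?><rdf:RDF><rdf:Description/></rdf:RDF>";

    // True if the pattern matches anywhere in the text.
    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(WITH_CLASS, MULTI_LINE));  // true
        System.out.println(found(WITH_DOT, MULTI_LINE));    // false
        System.out.println(found(WITH_CLASS, SINGLE_LINE)); // true
        System.out.println(found(WITH_DOT, SINGLE_LINE));   // true
    }
}
```

In this sketch the two expressions only disagree on the input that contains line breaks, so the difference seems to depend on how each construct treats them.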