
I want to match the case where a single mytag contains two mysubtag elements, like this:

...<mytag>.....<mysubtag>...</mysubtag>..<mysubtag>...</mysubtag>....</mytag>...

where ... stands for arbitrary characters, including newlines.

I don't want to match two mytags that each contain one mysubtag:

...<mytag>.....<mysubtag>...</mysubtag>....</mytag>...<mytag>.....<mysubtag>...</mysubtag>....</mytag>...

Is there a trick to it?

I do the matching in Eclipse, and what I have so far is:

<mytag[\s\S]*?<mysubtag[\s\S]*?</mysubtag>[\s\S]*?<mysubtag[\s\S]*?</mysubtag>[\s\S]*?</mytag>

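But this also matches the second example, because the lazy [\s\S]*? between the two mysubtag parts is free to run past the first </mytag> and into the next mytag.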
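To make the problem concrete, here is a minimal sketch in Python rather than Eclipse (the lookahead syntax used is also valid in Java regular expressions, which Eclipse's search is assumed to use). It reproduces the unwanted match and then tries a "tempered" variant in which every gap is forbidden from stepping over `</mytag>`. The names `NAIVE`, `INSIDE`, `TEMPERED` and the two sample strings are made up for illustration. The tempered pattern matches "at least two" mysubtags, not exactly two, and it still breaks on nested mytags or on tags hidden in comments or CDATA.

```python
import re

# The pattern from the question, unchanged.
NAIVE = re.compile(
    r"<mytag[\s\S]*?<mysubtag[\s\S]*?</mysubtag>[\s\S]*?"
    r"<mysubtag[\s\S]*?</mysubtag>[\s\S]*?</mytag>"
)

# Hypothetical building block: a lazy run of any characters (newlines
# included) that is not allowed to step over a closing </mytag>.
INSIDE = r"(?:(?!</mytag>)[\s\S])*?"

# Same shape as NAIVE, but every gap uses INSIDE, so the whole match
# has to stay within one mytag element.
TEMPERED = re.compile(
    r"<mytag\b" + INSIDE
    + r"<mysubtag\b" + INSIDE + r"</mysubtag>" + INSIDE
    + r"<mysubtag\b" + INSIDE + r"</mysubtag>" + INSIDE
    + r"</mytag>"
)

# Made-up inputs mirroring the two examples in the question.
one_tag_two_subtags = (
    "a<mytag>b\n<mysubtag>c</mysubtag>d<mysubtag>e</mysubtag>f</mytag>g"
)
two_tags_one_subtag_each = (
    "a<mytag>b<mysubtag>c</mysubtag>d</mytag>\n"
    "e<mytag>f<mysubtag>g</mysubtag>h</mytag>i"
)

for name, text in [
    ("one tag, two subtags", one_tag_two_subtags),
    ("two tags, one subtag each", two_tags_one_subtag_each),
]:
    print(name, "| naive:", bool(NAIVE.search(text)),
          "| tempered:", bool(TEMPERED.search(text)))
```

The naive pattern reports a match for both inputs, while the tempered one only matches the first. Even so, this is a workaround for simple, flat markup and not a substitute for an XML parser.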

EDIT: So the bottom line is: don't use regexp for these tasks. I started a new thread for the follow-up question: "Xpath find files for windows? xml parser to find files in windows".

Toskan
  • Search the site better dude, XML + regexp is poison. – ThomasRS Feb 22 '12 at 15:40
  • [\s\S]*? means "space or non-space", which means "everything". (use .*?) – Scott Weaver Feb 22 '12 at 17:52
  • You made a slight mistake, you wrote "regex" instead of "xpath" (a mistake that is quite common). A possible XPath expression is "//mytag[count(mysubtag)=2]". – Hauke Ingmar Schmidt Feb 22 '12 at 22:35
  • @sweaver2112 : no, the dot does not match newlines. – Toskan Mar 02 '12 at 13:09
  • @Toskan: it will if you set the regex options to multiline, which you should be doing if you have newlines in your data. – Scott Weaver Mar 02 '12 at 20:40
  • @sweaver2112 So you want me to turn on the multiline option in Eclipse? Well, there's no such option there... You can do it in the regexp, but then again you cannot turn it back off if you require. Or normally, in the programs I used you cannot turn it back on – Toskan Mar 06 '12 at 17:47

1 Answer

Using something built to parse XML, this is much easier. This XPath expression should match the XML constellation you're after.

//mytag[count(mysubtag)=2]
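As a rough illustration of what that expression selects, here is a minimal sketch using only Python's standard library. `xml.etree.ElementTree` implements just a subset of XPath and has no `count()`, so the predicate is spelled out by hand; a full XPath engine could evaluate the expression directly. The function name, the sample document and the `id` attributes are made up for the example.

```python
import xml.etree.ElementTree as ET


def mytags_with_two_subtags(xml_text):
    """Return every mytag element that has exactly two mysubtag children.

    Hand-written equivalent of //mytag[count(mysubtag)=2]:
    iter() visits mytag at any depth, findall() looks at direct children.
    """
    root = ET.fromstring(xml_text)
    return [el for el in root.iter("mytag")
            if len(el.findall("mysubtag")) == 2]


# Hypothetical sample document; the id attributes exist only to tell
# the elements apart in the output.
doc = """<root>
  <mytag id="a"><mysubtag/><mysubtag/></mytag>
  <mytag id="b"><mysubtag/></mytag>
  <mytag id="c"><mysubtag/></mytag>
</root>"""

hits = mytags_with_two_subtags(doc)
print([el.get("id") for el in hits])  # prints ['a']
```

Unlike the regex, this counts elements in the parsed tree, so two neighbouring mytags with one mysubtag each are never merged into one match.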
ohaal
  • Illegal regex syntax. The expression: `[^(?:<\/mysubtag>)]*` does not match what you think it does - hint: it does _not_ match: _"zero or more chars that are not '</mysubtag>'"_ Please go back and review the subject of [regular expression character classes](http://www.regular-expressions.info/charclass.html). (i.e. It is a syntax error to put a group inside of a char class.) – ridgerunner Feb 22 '12 at 16:44
  • @ridgerunner: +1 Thank you for the heads up! Corrected the answer. :) – ohaal Feb 22 '12 at 17:27
  • @ridgerunner: Out of curiosity, is there any way to do what I was thinking? – ohaal Feb 22 '12 at 17:57
  • you're assuming there's no other tags whatsoever in his data, but there assuredly are, and they would cause this to not match. – Scott Weaver Feb 22 '12 at 17:59
  • Remove the regexp parts from your answer and I'll accept it eventually. What is still unanswered for me is which tool I can use to, say, browse all subdirectories of /mfolder/ for occurrences of the XPath expression. – Toskan Mar 02 '12 at 13:08
  • @Toskan: You're trying to do this in eclipse, the editor, not in a specific programming language? Also, you might want to change your question title if you don't want regex answers anymore :) – ohaal Mar 02 '12 at 15:28
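On the open question in the comments about scanning all subdirectories of a folder: one possible approach, sketched here under assumptions, is a small script instead of an editor search. It assumes the files of interest end in `.xml`, silently skips files that are not well-formed, and uses only Python's standard library; the function name `files_with_matching_mytag` is made up for the example.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def files_with_matching_mytag(folder):
    """List the .xml files under folder (recursively) that contain at
    least one mytag element with exactly two mysubtag children."""
    matches = []
    for path in sorted(Path(folder).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            # Not well-formed XML: skip it instead of aborting the scan.
            continue
        if any(len(el.findall("mysubtag")) == 2
               for el in root.iter("mytag")):
            matches.append(path)
    return matches


if __name__ == "__main__":
    # "/mfolder/" is the folder named in the comment above.
    for hit in files_with_matching_mytag("/mfolder/"):
        print(hit)
```

A dedicated command-line XPath tool would be another option; the script is only meant to show that the directory walk and the element check fit in a few lines.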