
I am new to regex and I need to strip only the leading and trailing <br/> tags from the following line:

<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>

The regex I am using is this:

^[<br/>]+|[<br/>]+$

But this gives me the following result:

p>hello<br/>asdsadas</p

The result I need is this:

<p>hello<br/>asdsadas</p>

Can anyone tell me where I am going wrong? Thanks in advance.
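A minimal Java reproduction of the problem, assuming the pattern is applied with String.replaceAll (the question does not say which call is used; the class and method names are hypothetical). Because [<br/>] is a character class, each + run strips individual characters from the set <, b, r, /, > rather than whole <br/> sequences:

```java
public class CharClassTrim {
    // Hypothetical helper reproducing the question's pattern.
    // [<br/>] is a character class: it matches ONE character that is <, b, r, / or >,
    // so the + keeps consuming characters until it reaches one outside that set.
    static String trimWithCharClass(String s) {
        return s.replaceAll("^[<br/>]+|[<br/>]+$", "");
    }

    public static void main(String[] args) {
        String input = "<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>";
        // The leading run also swallows the '<' of <p>; the trailing run swallows the '>' of </p>.
        System.out.println(trimWithCharClass(input)); // prints p>hello<br/>asdsadas</p
    }
}
```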

dhein
Ikthiander
  • You went wrong the moment you started using regexp... see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – SergeS Sep 11 '13 at 08:25
  • How exactly are you using it? – Jon Sep 11 '13 at 08:27
  • @SergeS I don't understand that answer, and it has been locked. Let me know if you know how to do it; an example would be much appreciated. If you don't know, let's all wait for the wise guy to turn up. lol – Ikthiander Sep 11 '13 at 08:31
  • @Jon I just want to filter that line to get the required result. I will use it in a framework, but that is not relevant. – Ikthiander Sep 11 '13 at 08:33
  • @SergeS: I agree with not using regex to parse the DOM in general, but not here. Trimming a specific substring (the `<br/>`s are not really parsed) from the ends of the input should not require a DOM. – Jon Sep 11 '13 at 08:33
  • @Ikthiander: That's "where" and "why", not "how". I meant "which function are you calling and with what arguments?". – Jon Sep 11 '13 at 08:34
  • Thanks for all the effort, guys. I got the answer, now let's move on... – Ikthiander Sep 11 '13 at 08:42
  • @Jon: I have seen many such specific uses later grow into monsters because other requirements came along. An XML parser can handle this string. Also, it looks like some parsing or other manipulation has already happened before this step. – SergeS Sep 11 '13 at 08:42

3 Answers


Technically, your regexp searches for any of the characters <, >, b, r, / independently (square brackets make a character class). The correct regexp is:

^(<br/>)+|(<br/>)+$
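Since the asker mentions Java, here is a sketch of this pattern used with String.replaceAll (an assumed call; the class and method names are hypothetical). The <br/> inside the paragraph survives because neither alternative can match there: the first is anchored with ^, and the second needs $ right after its run:

```java
public class GroupTrim {
    // (<br/>) is a group, so + repeats the whole five-character sequence "<br/>"
    // instead of any single character from it.
    static String trimBr(String s) {
        return s.replaceAll("^(<br/>)+|(<br/>)+$", "");
    }

    public static void main(String[] args) {
        String input = "<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>";
        System.out.println(trimBr(input)); // prints <p>hello<br/>asdsadas</p>
    }
}
```

A non-capturing group, (?:<br/>)+, would behave the same here, since the captured text is never used.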

But as I mentioned in a comment, try to use a DOM/XML parser instead of a regexp (JavaScript has one, or you can work with the DOM directly).
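The answer mentions JavaScript, but for the Java case a comparable sketch can use the JDK's built-in javax.xml parser. It assumes the fragment becomes well-formed XML once wrapped in a single root element (true for this input, since every tag is closed or self-closed; real HTML such as a bare <br> would need an HTML parser instead). The helper names are hypothetical, and only <br/> elements that are direct children at either end are removed, so whitespace text between them would stop the trimming:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTrim {
    // True if the node is a <br> element (XML parsing, so the name is case-sensitive).
    static boolean isBr(Node n) {
        return n != null && n.getNodeType() == Node.ELEMENT_NODE && "br".equals(n.getNodeName());
    }

    // Hypothetical helper: parse the fragment, drop <br/> elements at both ends, serialize the rest.
    static String trimBr(String fragment) {
        try {
            // Wrap in a single root element so the fragment parses as one XML document.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            Element root = doc.getDocumentElement();

            // Remove <br> elements from the start and the end of the root's child list.
            while (isBr(root.getFirstChild())) {
                root.removeChild(root.getFirstChild());
            }
            while (isBr(root.getLastChild())) {
                root.removeChild(root.getLastChild());
            }

            // Identity-transform each remaining child, without an XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                t.transform(new DOMSource(n), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process fragment", e);
        }
    }

    public static void main(String[] args) {
        String input = "<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>";
        System.out.println(trimBr(input));
    }
}
```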

SergeS
  • I am using Java, hence regex would suffice. I am going to accept your answer; I just need to wait 3 more minutes. Looks like you were the wise guy who solved it. Well done. – Ikthiander Sep 11 '13 at 08:36

Regex isn't the preferred method for selecting HTML, but give this a try anyway:

\<p\>(.*)?\<\/p\>

Or can whatever is between the <br/> tags be something other than a paragraph?
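In Java (assumed here because the asker mentions it), this pattern can be applied with Pattern and Matcher.find(). Escaping <, > and / is harmless in Java, because a backslash before a non-alphabetic character simply matches that character. Since .* is greedy, the match runs from the first <p> to the last </p>; the helper name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphExtract {
    // Same pattern as in the answer, written as a Java string literal.
    private static final Pattern P = Pattern.compile("\\<p\\>(.*)?\\<\\/p\\>");

    // Hypothetical helper: returns the whole <p>...</p> match, or null if there is none.
    static String extractParagraph(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>";
        System.out.println(extractParagraph(input)); // prints <p>hello<br/>asdsadas</p>
    }
}
```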

Christophe

If you are using preg_match to isolate the middle (interesting) fragment of the input, the correct expression is

^(?:<br/>)*(.*?)(?:<br/>)*$

This treats the sequence <br/> as a single token, whereas wrapping it in square brackets, as in your example, means "any one of the characters <, b, r, /, >"; that is why you are losing angle brackets from your <p> tags.
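preg_match is PHP; since the asker is on Java, a rough equivalent (an assumption, with a hypothetical helper name) uses Matcher.matches(), which must consume the whole input, and then reads the lazy capture group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiddleFragment {
    private static final Pattern P = Pattern.compile("^(?:<br/>)*(.*?)(?:<br/>)*$");

    // Hypothetical helper: the lazy (.*?) grows only until the trailing (?:<br/>)*$ can
    // match the rest, so group 1 excludes the <br/> runs at both ends.
    static String middle(String s) {
        Matcher m = P.matcher(s);
        // Without DOTALL, . skips line terminators, so multi-line input falls back to s unchanged.
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        String input = "<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>";
        System.out.println(middle(input)); // prints <p>hello<br/>asdsadas</p>
    }
}
```

If the input can span several lines, compiling with Pattern.DOTALL lets . match line terminators as well.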

Jon