how to filter a word in regex?

Question

I am new to regex and I need to filter only the starting and the ending breaks from the following line:

<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p><br/><br/><br/><br/><br/>

the regex I am using is this:

^[<br/>]+|[<br/>]+$

but this gives me the following result:

p>hello<br/>asdsadas</p

my required result is this:

<p>hello<br/>asdsadas</p>

Can anyone tell me where I am going wrong? Thanks in advance.

You went wrong at the point when you started using regexp ... see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — SergeS, Sep 11 '13 at 08:25
@SergeS I dont understand that answer, also it has been locked. let me know if you know how to do it. an example would be well appreciated. if you dont know, let us all wait for the wise guy to turn up . lol — Ikthiander, Sep 11 '13 at 08:31
@Jon i just want to filter out that line to get the required result. i will use it in a framework, but that is not relevant. — Ikthiander, Sep 11 '13 at 08:33
@SergeS: I agree with not using regex for parsing DOM in general, but not here. Trimming a specific substring (the `<br/>`s are not really parsed) from the ends of the input should not require DOM. — Jon, Sep 11 '13 at 08:33
@Ikthiander: That's "where" and "why", not "how". I meant "which function are you calling and with what arguments?". — Jon, Sep 11 '13 at 08:34
Thanks for all the efforts guys, i got the answer, now lets move on... — Ikthiander, Sep 11 '13 at 08:42
@Jon: I have seen many specific uses that later grew into monsters because other requirements came up. An XML parser can handle this string. Also, it looks like there is some parsing or other manipulation happening before this. — SergeS, Sep 11 '13 at 08:42

score 2 · Accepted Answer · answered Sep 11 '13 at 08:28

Technically, your regexp matches any of the characters <, >, b, r, / independently, because square brackets define a character class. The correct regexp is

^(<br/>)+|(<br/>)+$

But as I mentioned in the comment, try to use a DOM/XML parser instead of regexp (JavaScript has one, or you can use the DOM directly).
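
Since the asker mentions using Java, here is a minimal sketch of how the grouped pattern could be applied there. The class and method names (`BreakTrimmer`, `trimBreaks`) are made up for illustration, and the sketch assumes the breaks are always written exactly as `<br/>`, as in the question.

```java
import java.util.regex.Pattern;

class BreakTrimmer {
    // Parentheses make "<br/>" a single token that may repeat.
    // [<br/>] would instead match any ONE of the characters <, b, r, /, >.
    private static final Pattern EDGE_BREAKS =
            Pattern.compile("^(<br/>)+|(<br/>)+$");

    // Removes runs of <br/> at the start and at the end of the input only.
    static String trimBreaks(String html) {
        return EDGE_BREAKS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<br/><br/><br/><br/><br/><p>hello<br/>asdsadas</p>"
                + "<br/><br/><br/><br/><br/>";

        // Grouped pattern: prints <p>hello<br/>asdsadas</p>
        System.out.println(trimBreaks(input));

        // Character class from the question: prints p>hello<br/>asdsadas</p
        System.out.println(input.replaceAll("^[<br/>]+|[<br/>]+$", ""));
    }
}
```

The `<br/>` inside the paragraph is left alone because it touches neither `^` nor `$`.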

answered Sep 11 '13 at 08:28

SergeS

i am using java, hence regex would suffice, i am going to accept your answer, i just need to wait 3 more minutes. but looks like you were the wise guy who solved it. well done. – Ikthiander Sep 11 '13 at 08:36

score 1 · Answer 2 · answered Sep 11 '13 at 08:29

Regex isn't the preferred method for selecting html. But anyway, give this a try:

\<p\>(.*)?\<\/p\>

Or can whatever is between the <br/>s be something other than a paragraph?

answered Sep 11 '13 at 08:29

Christophe

score 0 · Answer 3 · answered Sep 11 '13 at 08:30

If you are using preg_match to isolate the middle (interesting) fragment of the input, the correct expression is

^(?:<br/>)*(.*?)(?:<br/>)*$

This treats the sequence <br/> as a single token, while using square brackets as in your example means "any of the characters <, b, r, /, >", which is why you are losing the angle brackets from your <p> tags.
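
The answer above is written with PHP's preg_match in mind. As a rough Java equivalent (the language the asker says they use), the same expression could be run through `Pattern` and `Matcher` and the middle fragment read from group 1. The names `MiddleFragment` and `middle` are hypothetical, and `Pattern.DOTALL` is an added assumption so that input containing line breaks still matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleFragment {
    // (?:<br/>)* skips leading/trailing breaks without capturing them;
    // the lazy (.*?) captures everything in between as group 1.
    // DOTALL (an assumption, not part of the answer) lets . match newlines.
    private static final Pattern WRAPPED =
            Pattern.compile("^(?:<br/>)*(.*?)(?:<br/>)*$", Pattern.DOTALL);

    // Returns the input with leading and trailing <br/> runs left out.
    static String middle(String html) {
        Matcher m = WRAPPED.matcher(html);
        // Fall back to the unchanged input if the pattern does not match.
        return m.matches() ? m.group(1) : html;
    }

    public static void main(String[] args) {
        String input = "<br/><br/><p>hello<br/>asdsadas</p><br/><br/>";
        // Prints <p>hello<br/>asdsadas</p>
        System.out.println(middle(input));
    }
}
```

Compared with replacing the edges by an empty string, this variant extracts the part to keep, which may be handier when the fragment is needed as a capture group anyway.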
