Regex matching between curly brackets yields too many results

Question

I have a bunch of text, for example:

foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj

I'd like to extract the following:

{pdf}images/1.pdf,100%,500{/pdf}

So here's a regex I made:

#{pdf}(.*?){/pdf}#

When checking the results I get back:

Array
(
    [0] => {pdf}images/1.pdf,100%,500{/pdf}
    [1] => images/1.pdf,100%,500
)

I expected to get only the first item in the array, but instead there are two items. I'm using PHP, and for testing I use the following website: PHP Regex Tester

How can I only obtain the {pdf}...{/pdf} text?

score 3 · Accepted Answer · answered Nov 16 '12 at 11:43


You're using a capturing group in your regex. In your case the group is

(.*?)

This causes PHP to give you both the full match, {pdf}sometext{/pdf}, and the sometext captured by the first group.

Just try the following to get rid of the group:

#{pdf}.*?{/pdf}#
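To see the difference, here is a minimal sketch that runs both patterns against the sample text from the question. It assumes preg_match is the function in use (the question does not say), and the variable names are only illustrative.

```php
<?php
// Sample text from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// With a capturing group: index 0 holds the full match, index 1 holds the group.
preg_match('#{pdf}(.*?){/pdf}#', $text, $withGroup);

// Without the group: only index 0 (the full match) is filled.
preg_match('#{pdf}.*?{/pdf}#', $text, $withoutGroup);

print_r($withGroup);
print_r($withoutGroup);
```

The second array contains a single element, the {pdf}...{/pdf} text.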

dommel


Asad Saeeduddin · Answer 2 · 2012-11-16T11:48:15.877


Use a non-capturing group so the central text doesn't show up as a separate captured element in the array, and use zero-width assertions (a lookbehind and a lookahead) so the {pdf} and {/pdf} delimiters aren't part of the match:

#(?<={pdf})(?:.*?)(?={/pdf})#

If you want to keep the {pdf} delimiters:

#{pdf}(?:.*?){/pdf}#
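As a rough illustration, the two patterns can be compared on the sample text from the question. This sketch assumes preg_match is used; $inner and $outer are hypothetical variable names.

```php
<?php
// Sample text from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// Lookbehind and lookahead: the delimiters must be present,
// but they are not included in the matched text.
preg_match('#(?<={pdf})(?:.*?)(?={/pdf})#', $text, $inner);

// Non-capturing group only: the delimiters stay in the match,
// and no extra array element is created for the group.
preg_match('#{pdf}(?:.*?){/pdf}#', $text, $outer);

echo $inner[0], PHP_EOL; // text between the delimiters
echo $outer[0], PHP_EOL; // text including the delimiters
```

In both cases the matches array has exactly one element, since neither pattern contains a capturing group.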

edited Nov 16 '12 at 11:48

answered Nov 16 '12 at 11:40


score 1 · Answer 3 · answered Nov 16 '12 at 11:44

You do not have two results.

The problem (though it is not really a problem) is that a function such as preg_match is probably being used. It fills the matches array with both the whole match, that is {pdf}images/1.pdf,100%,500{/pdf}, and the text captured by the first group, that is images/1.pdf,100%,500.

So you only need $result[0] if you want the full {pdf}...{/pdf} text, or $result[1] if you want just the contents for further parsing.
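A minimal sketch of picking the elements out of the array, assuming preg_match was called with the original pattern and $result as the matches argument. The comma split at the end is only an illustration of further parsing and is not part of the question.

```php
<?php
// Sample text from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

$fullTag = null;
$contents = null;
$parts = [];

// preg_match returns 1 when the pattern matches and fills $result.
if (preg_match('#{pdf}(.*?){/pdf}#', $text, $result) === 1) {
    $fullTag  = $result[0]; // whole match, delimiters included
    $contents = $result[1]; // first capturing group only

    // Illustrative further parsing: split the contents on commas.
    $parts = explode(',', $contents);
}

echo $fullTag, PHP_EOL;
echo $contents, PHP_EOL;
print_r($parts);
```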
