I have a bunch of text, for example:

foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj

I'd like to extract the following:

{pdf}images/1.pdf,100%,500{/pdf}

So here's a regex I made:

#{pdf}(.*?){/pdf}#

When checking the results I get back:

Array
(
[0] => {pdf}images/1.pdf,100%,500{/pdf}
[1] => images/1.pdf,100%,500
)

I expected to get only the first item in the array, but instead there are two items. I'm using PHP, and for testing I use the following website: PHP Regex Tester

How can I only obtain the {pdf}...{/pdf} text?

Brian

3 Answers

You're using a capturing group in your regex. In your case the group is

(.*?)

This causes PHP to give you the full match, {pdf}sometext{/pdf}, at index 0 and the text captured by the first group, sometext, at index 1.

Just try the following to get rid of the group:

#{pdf}.*?{/pdf}#
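
As a minimal sketch of the difference, assuming the tester calls preg_match() under the hood (the variable names $text and $matches are only illustrative), the pattern without parentheses leaves a single entry in the result array:

```php
<?php
// Sample input taken from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// No parentheses in the pattern, so there is no capture group:
// $matches holds only index 0, the full match.
preg_match('#{pdf}.*?{/pdf}#', $text, $matches);

print_r($matches);
```

With the sample input, $matches contains only the full {pdf}...{/pdf} text at index 0.
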
dommel

Use a non-capturing group to ensure the central text doesn't show up as a separate captured entry in the array, and use zero-width assertions (a lookbehind and a lookahead) to ensure the {pdf} and {/pdf} delimiters aren't part of the match:

#(?<={pdf})(?:.*?)(?={/pdf})#

If you want to keep the {pdf} delimiters:

#{pdf}(?:.*?){/pdf}#
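
A short sketch comparing the two patterns above, assuming they are run through preg_match() on the sample text from the question (the variable names $text, $inner and $full are hypothetical):

```php
<?php
// Sample input taken from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// Lookbehind/lookahead: the delimiters must be present but are not consumed,
// so the match is only the text between them.
preg_match('#(?<={pdf})(?:.*?)(?={/pdf})#', $text, $inner);

// Non-capturing group only: the delimiters stay in the match,
// and no extra index 1 entry is created.
preg_match('#{pdf}(?:.*?){/pdf}#', $text, $full);

print_r($inner);
print_r($full);
```

Both arrays have a single element; the first holds the inner text and the second holds the text including the delimiters.
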
Asad Saeeduddin

You do not have two results.

The problem (it is not really a problem, though) is that the tester is probably using preg_match. This function returns both the whole match, that is {pdf}images/1.pdf,100%,500{/pdf}, and the text captured by the first group, that is images/1.pdf,100%,500.

So you only need to use $result[1] for further parsing of the inner text, or $result[0] if you want the full {pdf}...{/pdf} text.
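
To illustrate, here is a hedged sketch that keeps the original pattern and picks the wanted entry by index. It assumes preg_match() is the function in use; the name $result follows the text above, while $text, $parts and the comma split are illustrative assumptions about what "further parsing" might look like:

```php
<?php
// Sample input taken from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// Original pattern with a capturing group.
if (preg_match('#{pdf}(.*?){/pdf}#', $text, $result)) {
    // Index 0 is the whole match, index 1 is the first captured group.
    echo $result[0], PHP_EOL;
    echo $result[1], PHP_EOL;

    // Assumed follow-up step: split the inner text on commas.
    $parts = explode(',', $result[1]);
    print_r($parts);
}
```
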

shadyyx