I'm analyzing citation patterns and have one string per sentence, which usually looks like this:

Various chemogenomics [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>] methods have been proposed in the last decade [<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>] and also this one [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>].

While trying to find all citations in a sentence, I get matches that look like this:

>10<\/xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24<\/xref>

Ideally, I'd be able to identify (and replace) the full reference including its square brackets, i.e. the string

[<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>]

knowing only the string I have a match for. Is it possible to take the match string and include all text up to the nearest square bracket?

I can match up to the right-hand square bracket by appending a lazy `.*?]` to the match string, but I can't do the same for the left-hand bracket: the regex finds the first left-hand bracket in the sentence rather than the one closest to the match string.
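The behaviour described above can be reproduced with a short sketch. Python is an assumption here (the question does not name a language), and `known` is a hypothetical variable holding the match string with the `\/` escapes already turned back into plain `/`.

```python
import re

sentence = (
    'Various chemogenomics [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>] '
    'methods have been proposed in the last decade '
    '[<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–'
    '<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>] '
    'and also this one [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>].'
)

# Hypothetical name: the fragment we already have a match for.
known = '>10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>'

# A lazy ".*?" on the left does not help: the engine still starts at the
# leftmost "[" from which the whole pattern can succeed.
naive = re.search(r"\[.*?" + re.escape(known) + r".*?\]", sentence)

# The match begins at the first citation, not at the one containing `known`.
print(naive.group(0))
```

Laziness only controls how far a quantifier expands, not where the match starts, which is why the left side needs a different approach.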

jmaths
  • Since the XML stuff isn't really relevant here, I'd suggest boiling the question down to a minimal working example that only includes the actual problem (namely matching text inside square brackets). BTW: you should escape the square brackets in the regex, e.g. `\]` instead of just `]`, to match the literal character "]". – koks der drache Aug 21 '19 at 08:45

1 Answer

Is it possible to take the match string and include all text up to the nearest square bracket?

Yes, e.g. by not allowing a left square bracket between the opening and the closing bracket, like this:

\[[^\[]*?\]

See minimal example on regex101
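As a rough illustration, the pattern above can be applied in Python (an assumption, since the question does not say which language is in use). The sketch lists every bracketed citation in the sample sentence and replaces each with a placeholder; the `[CIT]` placeholder and the variable names are made up for the example.

```python
import re

sentence = (
    'Various chemogenomics [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>] '
    'methods have been proposed in the last decade '
    '[<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–'
    '<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>] '
    'and also this one [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>].'
)

# "[", then any characters except "[", then the nearest "]".
bracketed = re.compile(r"\[[^\[]*?\]")

# Every bracketed citation, each starting at its own "[".
citations = bracketed.findall(sentence)
for citation in citations:
    print(citation)

# Replace each citation with a hypothetical placeholder.
replaced = bracketed.sub("[CIT]", sentence)
print(replaced)
```

Because `[` is excluded from the characters between the brackets, a match can never begin at an earlier citation and run across into a later one.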

koks der drache
  • Many thanks for the reply. Agreed, this will find all citations in the sentence, as your example shows (thanks for that). But if I want to find only the match in question, i.e. the one with `>10<\/xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24<\/xref>` inside the square brackets, can this be done? – jmaths Aug 21 '19 at 09:00
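
One possible way to target only the citation that contains a known fragment is to put the escaped fragment between two bracket-free runs. This is a sketch rather than a confirmed answer from the thread. It assumes Python, assumes the fragment itself contains no square brackets, and assumes the `\/` escapes have been turned back into plain `/`. The names `known` and `pattern` are illustrative.

```python
import re

sentence = (
    'Various chemogenomics [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>] '
    'methods have been proposed in the last decade '
    '[<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–'
    '<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>] '
    'and also this one [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>].'
)

# Hypothetical name: the fragment we already have a match for.
known = '>10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>'

# "[", bracket-free text, the literal fragment, bracket-free text, "]".
# Excluding both "[" and "]" keeps the match inside a single pair of brackets.
pattern = re.compile(r"\[[^\[\]]*" + re.escape(known) + r"[^\[\]]*\]")

m = pattern.search(sentence)
print(m.group(0))

# Replace only that citation with a hypothetical placeholder.
replaced = pattern.sub("[CIT]", sentence, count=1)
print(replaced)
```

`re.escape` makes the fragment match literally, so characters such as `.` in the `rid` attribute are not treated as regex metacharacters.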