I'm analyzing citation patterns and have a string for each sentence that usually looks like this:
Various chemogenomics [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>] methods have been proposed in the last decade [<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>] and also this one [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>].
While trying to find all citations in a sentence, I get matches that look like this:
>10<\/xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24<\/xref>
Ideally, I'd be able to identify (and replace) the full reference including its square brackets, i.e. the string
[<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>]
knowing only the string I have a match for. Is it possible to take the match string and extend it to include all text up to the nearest square bracket on each side?
I can match up to the right-hand square bracket by adding a lazy .*?] term after the match string, but I can't do the same for the left-hand bracket: the regex finds the first left-hand bracket in the sentence rather than the one closest to the match string.
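To make the goal concrete, here is a minimal Python sketch of one approach that might work: instead of a lazy dot, use a negated character class that cannot cross a bracket, so the opening bracket it finds has to be the one nearest the match. The helper name expand_to_brackets and the [CITATION] placeholder are made up for illustration, and the sketch assumes the match string is literal text (with the "\/" in my matches unescaped to "/") and that brackets are never nested.

```python
import re

sentence = (
    'Various chemogenomics [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>] '
    'methods have been proposed in the last decade '
    '[<xref ref-type="bibr" rid="pone.0204999.ref010">10</xref>–'
    '<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>] '
    'and also this one [<xref ref-type="bibr" rid="pone.0204999.ref008">8</xref>].'
)

# The partial match already in hand (assumption: literal text, "\/" unescaped to "/").
partial = '>10</xref>–<xref ref-type="bibr" rid="pone.0204999.ref024">24</xref>'


def expand_to_brackets(text, fragment):
    """Return the innermost [...] span containing fragment, or None if absent."""
    # [^\[\]]* matches any run of non-bracket characters, so it can never
    # skip over an earlier "]" or "[" on the way to the fragment.
    pattern = r"\[[^\[\]]*" + re.escape(fragment) + r"[^\[\]]*\]"
    m = re.search(pattern, text)
    return m.group(0) if m else None


full = expand_to_brackets(sentence, partial)

# Replace the whole bracketed reference with a hypothetical placeholder.
replaced = sentence.replace(full, "[CITATION]")
```

re.escape is there so that characters such as "." in the rid attribute are treated literally rather than as regex syntax.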