
I tried the following call in R and expected 'CCC' to be matched, since the matching is supposed to be greedy:

str_view('ACCC','C{0,3}')

but nothing matched. However, the following call works fine (with 'A' removed, 'CCC' is matched):

str_view('CCC','C{0,3}')

Is this a bug in stringr::str_view? Or am I misunderstanding something?

Penguin Bear
  • What do you mean by "nothing matched"? There actually is a match and it's the beginning of the string (where you can find 0 `C` characters, as allowed by your pattern). Compare your line with `str_view_all('ACCC','C{0,3}')` and `str_view('CACCC','C{0,3}')`. – nicola Sep 11 '17 at 21:40
  • I cannot reproduce your issue. Also, if you allow the pattern to have 0 repetitions of C, it will match 'nothing' to any string... – Damiano Fantini Sep 11 '17 at 21:45
  • @d.b There is no inconsistency. The *first* pattern is matched (note that OP used `str_view` and not `str_view_all`). The engine looks at the beginning of the string and it matches the pattern (there are 0 `C`)! So no further look is needed. See my examples in the first comment. `str_view_all('ACCC','C{0,3}')` gets *two* matches: the beginning of the string and `CCC` (look at the beginning of the string which is highlighted). `str_view('CACCC','C{0,3}')` returns just `C`, since the pattern is found. Consider that `str_detect("","C{0,3}")` returns `TRUE`. – nicola Sep 11 '17 at 21:52
  • @d.b In `"ACCC"` there is only one match of `"C{1,3}"`... where do you see the first and the second? See the output of `str_count('ACCC','C{1,3}')` and `str_count('ACCC','C{0,3}')` (regarding the latter, there are actually *three* matches: the end of the string is also a match, which I forgot to mention in my previous comment). – nicola Sep 12 '17 at 04:42
  • @d.b Yes, my question is why `str_view('ACCC','C{0,3}')` and `str_view('ACCC','C{1,3}')` are not consistent, because they should both be greedy. – Penguin Bear Sep 12 '17 at 05:14
  • @PenguinBear, I guess it is not inconsistent after all. `str_view` will show the first match. Your pattern `"C{0,3}"` means that anything between no `C` (right at the beginning of the string) and 3 `C`s are a positive match. `str_view` will simply highlight the first positive match it can find. It may be greedy but only for the first match. It will not go beyond that. – d.b Sep 12 '17 at 13:40
  • @d.b Got it, thanks a lot! I just realized that is why we have `stringr::str_view_all`. – Penguin Bear Sep 12 '17 at 16:12
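
The behaviour described in the comments above can be checked without the viewer, using stringr's extraction and counting functions. This is only a minimal sketch, assuming stringr is installed; the variable names are chosen purely for illustration.

```r
library(stringr)

# str_extract() returns only the first match, like str_view() in the question.
first_zero <- str_extract("ACCC", "C{0,3}")   # zero-length match before the "A"
first_one  <- str_extract("ACCC", "C{1,3}")   # needs at least one C, so it skips the "A"
first_lead <- str_extract("CACCC", "C{0,3}")  # a C is available right at the start

# Number of matches per pattern, as suggested in the comments.
n_zero <- str_count("ACCC", "C{0,3}")
n_one  <- str_count("ACCC", "C{1,3}")

# An empty string still satisfies a pattern that allows zero repetitions.
empty_ok <- str_detect("", "C{0,3}")

print(list(first_zero, first_one, first_lead, n_zero, n_one, empty_ok))
```

The first match for `C{0,3}` is the empty string at the start, which is why nothing appeared to be highlighted, while `C{1,3}` cannot match there and moves on to `CCC`.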

1 Answer


In short: no, it's not a bug.

str_view('ACCC','C{0,3}') returns <>A<CCC><>, where each pair of angle brackets represents a match.

Because the pattern can match C zero times, it matches before the A and again at the end of the string. The middle match occurs because it finds three Cs and greedily matches all of them.
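
To inspect those three matches as plain values rather than in the viewer, something like the following should work. It is a rough sketch that assumes stringr is installed; `matches` and `locs` are illustrative names, not part of the original answer.

```r
library(stringr)

# All matches of the pattern, in order of appearance.
matches <- str_extract_all("ACCC", "C{0,3}")[[1]]

# One row per match, with "start" and "end" columns.
locs <- str_locate_all("ACCC", "C{0,3}")[[1]]

print(matches)
print(locs)
```

The two empty strings in `matches` correspond to the empty `<>` pairs, and the row of `locs` spanning positions 2 to 4 is the greedy `CCC` match.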

For more information on how str_view works, read the documentation.

Mark