
I am running `string-match` with the pattern `[ \[\]a-zA-Z0-9_:.,/-]+` to match the sample text `Text [a,b]`. Although the pattern works on regex101, it returns `#f` when I run it in Scheme. Here is the regex101 link.

This is the call I am running:

(string-match "[ \\[\\]a-zA-Z0-9_:.,/-]+" "Text [a,b]")

Why isn't it working in Scheme when it works elsewhere? Am I missing something?

xabush

2 Answers


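After discussing the issue on the GNU Guile mailing list, I found out that Guile's `(ice-9 regex)` module uses POSIX extended regular expressions, and this flavor does not support backslash escapes inside bracket expressions (`[...]`). Inside the brackets a backslash is an ordinary character, so `\]` is read as a literal backslash followed by the `]` that closes the bracket expression. The rest of the pattern, `a-zA-Z0-9_:.,/-]+`, then sits outside the brackets, where it requires text such as `a-zA-Z0-9_:` to appear literally, which is why it wasn't matching the strings.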
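To make that parsing concrete, here is a minimal sketch, assuming Guile with the `(ice-9 regex)` module and standard POSIX bracket-expression rules; the variable names are illustrative only. It runs the question's pattern unchanged against the sample text and against a string that spells out the leftover tail literally.

```scheme
;; Guile's POSIX regex interface (string-match, match:substring).
(use-modules (ice-9 regex))

;; The pattern from the question, exactly as written in the Scheme string.
;; Inside [...] a backslash is an ordinary character, so the bracket
;; expression is only "[ \[\]" = {space, backslash, [}: the first "]"
;; that is not in first position closes it.
(define question-pattern "[ \\[\\]a-zA-Z0-9_:.,/-]+")

;; Fails: outside the brackets the tail needs the literal text
;; "a-zA-Z0-9_:", then any character, then ",/-" and one or more "]".
(define on-sample (string-match question-pattern "Text [a,b]"))

;; Succeeds: a string that spells the tail out literally does match,
;; which shows the bracket expression closed earlier than intended.
(define on-literal (string-match question-pattern "[a-zA-Z0-9_:.,/-]"))

(display on-sample) (newline)                    ; prints #f
(display (match:substring on-literal)) (newline) ; prints [a-zA-Z0-9_:.,/-]
```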

As a workaround, I put the `]` first inside the brackets, where POSIX treats it as a literal character (and `[` is already literal there), and this works:

(string-match "[][a-zA-Z]+" "Text[ab]")
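Applying the same rule to the full character set from the question gives a pattern that needs no escapes at all. This is a sketch assuming Guile's `(ice-9 regex)` and POSIX bracket-expression rules; `name-pattern` and `m` are hypothetical names chosen for illustration.

```scheme
;; Guile's POSIX regex interface.
(use-modules (ice-9 regex))

;; POSIX bracket-expression rules relied on here (no backslashes needed):
;;   - "]" is literal when it is the first character after the opening "[";
;;   - "[" is literal unless it is followed by ":", "." or "=";
;;   - "-" is literal when it is the last character before the closing "]".
(define name-pattern "[][ a-zA-Z0-9_:.,/-]+")

(define m (string-match name-pattern "Text [a,b]"))
(display (match:substring m)) (newline) ; prints Text [a,b]
```

Only the positions of `]` (first) and `-` (last) are significant; the other characters in the set can appear in any order.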

xabush

I don't see anything wrong with your regular expression syntax, since it is quoted correctly, so I assume there must be a bug in Guile, or in the regexp library it uses, where `\]` isn't interpreted correctly inside brackets. I found a workaround by using the octal code point value instead:

(string-match "[A-Za-z\\[\\0135]+" "Text [a,b]")
; ==> #("Text [a,b]" (0 . 4))

Your regular expression isn't very good. It matches any combination of those characters, so `]/Te,3.xt[2` also matches. If you are expecting a string like `Something [something,something]`, I would rather use `/[A-Z][a-z0-9]+ \[[a-z0-9]+,[a-z0-9]+\]/` instead, e.g.:

(define pattern "[A-Z][a-z0-9]+ \\[[a-z0-9]+,[a-z0-9]+\\]") 
(string-match pattern "Test [q,w]")     ; ==> #("Test [q,w]" (0 . 10))
(string-match pattern "Be100 [sub,45]") ; ==> #("Be100 [sub,45]" (0 . 14))
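To see the difference described above, the following sketch runs both kinds of pattern on the scrambled string `]/Te,3.xt[2`. It assumes Guile's `(ice-9 regex)`; the loose pattern is the question's character set rewritten POSIX-style with `]` placed first so it is read literally, and the names `loose`, `structured` and `junk` are illustrative only.

```scheme
(use-modules (ice-9 regex))

;; Loose: any run of the question's characters, in any order.
(define loose "[][ a-zA-Z0-9_:.,/-]+")
;; Structured: the pattern from this answer ("Word [x,y]" shape).
(define structured "[A-Z][a-z0-9]+ \\[[a-z0-9]+,[a-z0-9]+\\]")

(define junk "]/Te,3.xt[2")

;; Every character of junk is in the loose set, so the whole string matches.
(display (match:substring (string-match loose junk))) (newline) ; prints ]/Te,3.xt[2
;; No capital letter in junk is followed by lowercase/digits and a space.
(display (string-match structured junk)) (newline)              ; prints #f
```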
Sylwester
Running `(match:substring (string-match "[ A-Za-z\\[\\0135]+" "Text [a,b]"))` returns `Text [a,b` and still doesn't match the `]`. Also, since I am trying to match long chemical names, the order of the characters doesn't matter. – xabush Jun 16 '19 at 15:59