I want to extract "some_token" from the "some text :some_token" text.

The code below returns the full match ' :some_token', not the captured part 'some_token' matched by the group ([a-z0-9_-]+).

import re

let expr = re("\\s:([a-z0-9_-]+)$", flags = {re_study, re_ignore_case})
for match in "some text :some_token".find_bounds(expr):
  echo "'" & match & "'"

How can it be modified to return only the captured part?

P.S.

Also, what's the difference between the re and nre modules?

pietroppeter
Alex Craft

1 Answer

The submitted code does not compile: find_bounds returns a tuple[first, last: int], which cannot be iterated with for. Still, it is true that find_bounds in that example gives the index bounds of the whole pattern match, not of the captured substring.
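To make the difference concrete, here is a minimal sketch (not from the original post) that uses two find_bounds overloads from re: one returning the bounds of the whole match, and one that additionally fills an array with the bounds of each capture group. The variable names text, whole and groups are only illustrative.

```nim
import re

let text = "some text :some_token"
let expr = re("\\s:([a-z0-9_-]+)$", flags = {re_study, re_ignore_case})

# Bounds of the whole match: includes the leading space and the colon.
let whole = text.find_bounds(expr)
echo "'", text[whole.first .. whole.last], "'"  # -> ' :some_token'

# Bounds of each capture group, written into `groups` (one entry per group).
var groups: array[1, tuple[first, last: int]]
if text.find_bounds(expr, groups).first >= 0:
  echo "'", text[groups[0].first .. groups[0].last], "'"  # -> 'some_token'
```

Slicing the input with the group bounds gives the captured text without copying it into a separate matches array first.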

The following (https://play.nim-lang.org/#ix=2yvs) returns the captured string:

import re

let expr = re("\\s:([a-z0-9_-]+)$", flags = {re_study, re_ignore_case})
var matches: array[1, string]
if "some text :some_token".find(expr, matches) >= 0:
  echo matches  # -> ["some_token"]

Note that in the code above, matches must have the correct length for the number of captured groups (a sequence will not work unless you give it the correct length up front). This is a known issue of re: https://github.com/nim-lang/Nim/issues/9472
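As a sketch of the sequence variant mentioned above, assuming the pattern has exactly one capture group, the sequence can be preallocated with new_seq so that find has a slot to write into:

```nim
import re

let expr = re("\\s:([a-z0-9_-]+)$", flags = {re_study, re_ignore_case})

# Preallocate one slot per capture group; an empty seq would have
# no room for the captured text.
var matches = new_seq[string](1)
if "some text :some_token".find(expr, matches) >= 0:
  echo matches[0]  # -> some_token
```

If the pattern gains more groups later, the length passed to new_seq has to grow with it.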

Regarding the dual existence of re and nre, summarizing from this discussion:

  • nre has a different, more ergonomic API than re, which stays closer to the C API
  • nre had fewer issues than re in the past, but the gap has closed recently (see also the open regex issues)
  • nre might eventually be moved out of the stdlib into a nimble package, but since this did not happen in v1, it probably will not happen before v2
  • note that there is also a pure Nim regex implementation (nim-regex), which has an ergonomic API as well.
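For comparison, here is a rough sketch of the same extraction with nre, compiled as a separate program because re and nre both export a re proc. It assumes that nre's find returns an Option[RegexMatch] and that case-insensitivity is requested inline with (?i) rather than through a flags set.

```nim
import std/[nre, options]

# find returns an Option, so a failed match is handled explicitly.
let m = "some text :some_token".find(re"(?i)\s:([a-z0-9_-]+)$")
if m.is_some:
  # captures is indexed from 0 for the first group.
  echo m.get.captures[0]  # -> some_token
```

Here no matches array has to be sized in advance; the capture groups are read from the returned match object.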
pietroppeter