3

I have a log that contains many paths mixed in with other text. I want to extract specific paths from the log, without the trailing slash.
How can I do this with a regex?

For example, the text is:

some text /dir1/dir2/ some text
some text /dir1/dir3 some text

I'd like to get these matches:

/dir1/dir2
/dir1/dir3

I've tried different methods using positive lookahead, like:

\/dir1[^\s]*(?=\/)

But they didn't work. I would appreciate any support.

tkmamedov

3 Answers

2

Use

\/dir1(?:\/[^\/\s]+)*

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \/                       '/'
--------------------------------------------------------------------------------
  dir1                     'dir1'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    [^\/\s]+                 any character except: '\/', whitespace
                             (\n, \r, \t, \f, and " ") (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
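
As a quick check, the pattern can be applied to the sample text from the question. This is a minimal sketch in Python (an assumption, since the question does not name a language); the names `log`, `pattern` and `paths` are illustrative, and the escaped `\/` is written as a plain `/` because Python does not require the escape.

```python
import re

# Sample log text from the question
log = """some text /dir1/dir2/ some text
some text /dir1/dir3 some text"""

# '/dir1' followed by zero or more '/segment' groups; a trailing '/'
# with no segment after it is left out of the match
pattern = re.compile(r"/dir1(?:/[^/\s]+)*")

paths = pattern.findall(log)
print(paths)
```

On the sample text this prints ['/dir1/dir2', '/dir1/dir3']. Because the group is non-capturing, `findall` returns the full matches rather than the last repeated segment.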
Ryszard Czech
1

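Based on your description, you are looking for any whitespace-separated token that starts with a slash. A regex is not strictly needed; you can split the string and strip the trailing slash:

s = 'some text /dir1/dir2/ some text'

# Keep tokens that start with '/', dropping any trailing slash
print([x.rstrip('/') for x in s.split() if x.startswith('/')])

Output:

['/dir1/dir2']

This works for any whitespace-separated input, but it returns every token that starts with a slash, not only the paths under /dir1.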

Synthase
-1
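\/.*\/\S*

\/ - matches a forward slash

.* - matches any character except a newline, zero or more times (greedy)

\/ - matches a forward slash

\S* - matches any non-whitespace character, zero or more times

This assumes there is always whitespace after /dir1/dir2/ or /dir1/dir2. Note that the match keeps the trailing slash when one is present, and because .* is greedy, it runs up to the last slash on the line.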
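
To see what this pattern captures on the question's sample, here is a small Python sketch (Python is an assumption; the names `log`, `raw` and `cleaned` are illustrative). It also strips the trailing slash afterwards with `str.rstrip`, which is an extra step and not part of the pattern itself.

```python
import re

# Sample log text from the question
log = """some text /dir1/dir2/ some text
some text /dir1/dir3 some text"""

# '.' does not match a newline by default, so each line is matched separately
raw = re.findall(r"/.*/\S*", log)

# The pattern keeps a trailing slash when present, so remove it afterwards
cleaned = [p.rstrip("/") for p in raw]

print(raw)
print(cleaned)
```

Here `raw` still contains '/dir1/dir2/' with its trailing slash, while `cleaned` holds the two paths in the form the question asks for.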

SimpleNiko