2

I need a regular expression that will return the last directory in a path.

E.g., from www.domain.com/shop/widgets/, return "widgets".

I have an expression that almost works.

[^/].*/([^/]+)/?$ 

It will return "widgets" from www.domain.com/shop/widgets/ but not from www.domain.com/widgets/

I also need to ignore any URLs that include a filename, so that www.domain.com/shop/widgets/blue_widget.html will not match.

This must be done using regular expressions as it is for the Zeus server request rewrite module.

brian d foy
Matt

4 Answers

2
/^www\.example\.com\/([^\/]+\/)*([^\/]+)\/$/

What does this do?

  • Matches normal text for the domain. Adjust this as required.
  • Matches any number of directories, each of which consists of non-slash characters followed by a slash.
  • Matches a string of non-slashes.
  • Matches a slash at the end of the input, thus eliminating files (since only directories end in a slash).

Implemented in Perl:

[ghoti@pc ~] cat perltest
#!/usr/local/bin/perl

@test = (
        'www.example.com/path/to/file.html',
        'www.example.com/match/',
        'www.example.com/pages/match/',
        'www.example.com/pages/widgets/thingy/',
        'www.example.com/foo/bar/baz/',
);

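foreach (@test) {
        # Capture variables keep their old values after a failed match,
        # so only read $2 when the match actually succeeds.
        my $dir = m/^www\.example\.com\/([^\/]+\/)*([^\/]+)\/$/i ? $2 : '';
        printf(">> %-50s\t%s\n", $_, $dir);
}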

[ghoti@pc ~] ./perltest
>> www.example.com/path/to/file.html                    
>> www.example.com/match/                               match
>> www.example.com/pages/match/                         match
>> www.example.com/pages/widgets/thingy/                thingy
>> www.example.com/foo/bar/baz/                         baz
[ghoti@pc ~] 
ghoti
1

This should generally work:

/([^/.]+)/$

It matches a set of non-slash, non-period characters after the second-to-last slash in a string that must end in a slash.

The "folder name" will be in the first capture group.
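As a rough illustration, here is a minimal Perl sketch of that pattern run against the URLs from the question. The helper name `last_dir` is made up for this example, and Perl is used only because it is easy to test with; Zeus's own rule syntax may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the last directory name, or undef when the
# URL does not end in a slash or the last segment contains a period.
sub last_dir {
    my ($url) = @_;
    return $url =~ m{/([^/.]+)/$} ? $1 : undef;
}

for my $url ('www.domain.com/shop/widgets/',
             'www.domain.com/widgets/',
             'www.domain.com/shop/widgets/blue_widget.html') {
    my $dir = last_dir($url);
    printf "%-50s %s\n", $url, defined $dir ? $dir : '(no match)';
}
```

Note that because the character class excludes periods, a directory whose name contains a dot (say, `v1.2/`) would not match either.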

Amber
  • 6
    What about `http://www.example.com/hier/archy?f=1&y=zz/qq#frag/ment`? The last "folder" here is likely `archy`. Or perhaps `qq` or even `ment`, depending on how the URL is used. – James Youngman Apr 01 '12 at 07:31
  • 2
    @JamesYoungman [a] The OP appears to be using trailing slashes for their URLs. [b] Most URL rewrite engines on the server level (what the OP was asking about) don't include query strings, and the server NEVER sees the fragment. [c] If they really want the trailing slash to be optional, they can just add a `?` after the last `/` in the pattern. – Amber Apr 01 '12 at 17:21
  • (There's also the fact that almost all of the other upvoted answers here do the same thing, and most of them less efficiently. Not sure why all the objections to this one specifically.) – Amber Apr 01 '12 at 17:26
  • I didn't downvote this, but it doesn't seem to work for www.domain.com/shop/widgets/blue_widget.html case. – BluesRockAddict Apr 02 '12 at 00:21
  • @BluesRockAddict - which is exactly what the OP requested. (Ignoring URLs that end in a filename, not a folder.) – Amber Apr 02 '12 at 00:39
  • @Amber, you're correct, I misread the question. My apologies. – BluesRockAddict Apr 02 '12 at 04:13
1
#!/usr/bin/perl

use strict;
use warnings;

$_ = 'www.domain.com/shop/widgets/';
print "$1\n" if (/\/([^\/]+)\/$/);

$_ = 'www.domain.com/shop/widgets/blue_widget.html';
print "$1\n" if (/\/([^\/]+)\/$/);
Ωmega
  • This actually works for all cases presented by OP (i.e. www.domain.com/shop/widgets/blue_widget.html and www.domain.com/shop/widgets/). – BluesRockAddict Apr 02 '12 at 00:22
  • @BluesRockAddict Except that the OP wanted it to NOT match URLs that ended in a filename - not match them and return the folder. *"I also need to ignore any URLs that include a filename"* – Amber Apr 02 '12 at 00:40
0

You don't want a Perl regular expression. You want a regular expression that Zeus will understand. Although they might call that PCRE, not even PCRE handles all Perl regular expressions.

Most of the answers here are wrong because they aren't thinking about the different sorts of URLs that you can get as input.

  • Get just the path portion of the URL
  • Match against the path portion to find what you need
  • Distinguish between paths that end in a filename and those that don't

There are some examples that you can use as a start. I don't use Zeus and don't want to, so the next part is up to you:

I've read that you can pass the request to a Perl program through Perl Extensions for ZWS, but I'd be surprised if you needed to do that. If you have to resort to that, I'd use the URI module to parse the URI and extract the path. Once you have that, split up the path into its components:

use URI;

my $uri = URI->new( ... ); # I don't know how Zeus passes data
my $path = $uri->path;

# undef to handle the leading /
my( undef, @parts ) = split m{/}, $path;

Once you are this far, you have to decide how you want to recognize something as a directory. If you're mapping directly onto a filesystem structure, that is just a matter of popping elements off @parts until you find the directories, then counting back the number you want to skip.
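A minimal sketch of that popping step, assuming the path maps directly onto a filesystem tree. The helper `last_directory`, the `$docroot` variable, and the temporary directory layout are all invented for this illustration; a real setup would use the server's actual document root.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: pops segments off the end of the path until what
# remains names an existing directory under $docroot, then returns the
# last segment of it. Returns nothing if no segment is a directory.
sub last_directory {
    my ($docroot, @parts) = @_;
    while (@parts) {
        return $parts[-1] if -d File::Spec->catdir($docroot, @parts);
        pop @parts;    # drop a trailing filename or a missing segment
    }
    return;
}

# Throwaway document root containing shop/widgets/, for demonstration only.
my $docroot = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($docroot, 'shop', 'widgets'));

# undef to handle the leading /
my (undef, @parts) = split m{/}, '/shop/widgets/blue_widget.html';
print last_directory($docroot, @parts), "\n";    # prints "widgets"
```

This version returns the deepest existing directory rather than rejecting URLs that end in a filename; to ignore those URLs instead, check whether anything had to be popped before a directory was found.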

However, I cringe at doing that, no matter what I put in the Perl program. I'd try really hard to get it done just in the Zeus rules first. Show us what you have so far.

Community
brian d foy