How do I get the last directory from a URL path using a Zeus rewrite rule?

Question

I need a regular expression that will return the last directory in a path.

For example, from www.domain.com/shop/widgets/ it should return "widgets".

I have an expression that almost works.

[^/].*/([^/]+)/?$

It returns "widgets" from www.domain.com/shop/widgets/ but not from www.domain.com/widgets/.
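For what it's worth, a quick Perl check suggests the expression as written does match `www.domain.com/widgets/` when the host is part of the string. One plausible explanation (an assumption; the question doesn't say what string Zeus hands the rule) is that the rule only sees the path. In `/widgets/`, `[^/]` has to consume the `w`, and the only slash left after it is the trailing one, so `([^/]+)` has nothing left to capture:

```perl
use strict;
use warnings;

# The OP's expression, unchanged.
my $op_re = qr{[^/].*/([^/]+)/?$};

# Assumption: the rewrite rule may see only the path, not host + path.
for my $input ( 'www.domain.com/widgets/', '/shop/widgets/', '/widgets/' ) {
    my $result = $input =~ $op_re ? $1 : 'no match';
    printf "%-25s => %s\n", $input, $result;
}
```

The two-level path still yields `widgets`, which would fit the symptom described above.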

I also need to ignore any URLs that include a filename, so that www.domain.com/shop/widgets/blue_widget.html does not match.

This must be done using regular expressions as it is for the Zeus server request rewrite module.

`$what_i_want = (split "/", $url)[-1]` would also get you the answer. — Unos, Apr 01 '12 at 07:32
@freespace Did you read the OP's post? "This must be done using perl regular expressions as it is for the Zeus server request rewrite module." — Amber, Apr 01 '12 at 17:24
Are you using the Perl Extensions for ZWS, and what have you tried so far for your rewrite rule? — brian d foy, Apr 01 '12 at 20:41

score 2 · Answer 1 · answered Apr 01 '12 at 15:42

/^www\.example\.com\/([^\/]+\/)*([^\/]+)\/$/

What does this do?

1. `^www\.example\.com\/` matches the domain as literal text. Adjust this as required.
2. `([^\/]+\/)*` matches any number of directories, each of which consists of non-slash characters followed by a slash.
3. `([^\/]+)` matches a string of non-slashes: the last directory name, captured in `$2`.
4. `\/$` matches a slash at the end of the input, thus eliminating files (since only directories end in a slash).

Implemented in Perl:

[ghoti@pc ~] cat perltest
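#!/usr/local/bin/perl

use strict;
use warnings;

my @test = (
        'www.example.com/path/to/file.html',
        'www.example.com/match/',
        'www.example.com/pages/match/',
        'www.example.com/pages/widgets/thingy/',
        'www.example.com/foo/bar/baz/',
);

foreach (@test) {
        # $2 is only meaningful when the match succeeds, so use it only in
        # that case (this also avoids an "uninitialized value" warning).
        my $dir = m/^www\.example\.com\/([^\/]+\/)*([^\/]+)\/$/i ? $2 : '';
        printf(">> %-50s\t%s\n", $_, $dir);
}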

[ghoti@pc ~] ./perltest
>> www.example.com/path/to/file.html                    
>> www.example.com/match/                               match
>> www.example.com/pages/match/                         match
>> www.example.com/pages/widgets/thingy/                thingy
>> www.example.com/foo/bar/baz/                         baz
[ghoti@pc ~]
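If the rewrite module matches against the path alone rather than host plus path (an assumption, not something the question confirms), the same idea should work with the domain prefix replaced by a leading slash. This sketch uses a non-capturing group `(?:...)` so the directory lands in `$1`; if the Zeus regex dialect doesn't accept that, a plain group with `$2`, as above, should behave the same. The name `$last_dir_re` is just for illustration.

```perl
use strict;
use warnings;

# Answer 1's pattern with the host removed; input is assumed to be a path
# such as "/shop/widgets/". (?:...) is non-capturing, so the last directory
# is in $1 rather than $2.
my $last_dir_re = qr{^/(?:[^/]+/)*([^/]+)/$};

for my $path ( '/widgets/', '/shop/widgets/', '/shop/widgets/blue_widget.html', '/' ) {
    my $dir = $path =~ $last_dir_re ? $1 : '(no match)';
    printf "%-32s %s\n", $path, $dir;
}
```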

I didn't downvote this, but it doesn't seem to work for www.domain.com/shop/widgets/blue_widget.html case. — BluesRockAddict, Apr 02 '12 at 00:21
Works for me. When I include the `blue_widget.html` line, it's treated the same as my `file.html` example -- that is, `$2` remains unset. How did you test? — ghoti, Apr 02 '12 at 01:35
Sorry ghoti, I've misread the original question. Your answer is correct. — BluesRockAddict, Apr 02 '12 at 04:14

score 1 · Answer 2 · answered Apr 01 '12 at 07:06


This should generally work:

/([^/.]+)/$

It matches a set of non-slash, non-period characters after the second-to-last slash in a string that must end in a slash.

The "folder name" will be in the first capture group.
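A small Perl harness for this pattern and for the optional-slash variant mentioned in the comments (a `?` after the final `/`). The sample URLs are made up. Excluding `.` is what keeps `blue_widget.html` from matching once the slash is optional, but it also rejects directory names that contain a dot:

```perl
use strict;
use warnings;

# Answer 2's pattern: trailing slash required.
my $strict_re = qr{/([^/.]+)/$};
# Variant from the comments: trailing slash optional.
my $loose_re  = qr{/([^/.]+)/?$};

for my $url (
    'www.domain.com/shop/widgets/',
    'www.domain.com/widgets/',
    'www.domain.com/shop/widgets',
    'www.domain.com/shop/widgets/blue_widget.html',
    'www.domain.com/shop/v1.2/',
) {
    my $s = $url =~ $strict_re ? $1 : '-';
    my $l = $url =~ $loose_re  ? $1 : '-';
    printf "%-46s strict=%-8s loose=%s\n", $url, $s, $l;
}
```

Both patterns report `-` for `www.domain.com/shop/v1.2/`. If such names can occur, the strict version could use `[^/]+` instead, since the required trailing slash already rules out files.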

answered Apr 01 '12 at 07:06 by Amber


What about `http://www.example.com/hier/archy?f=1&y=zz/qq#frag/ment`? The last "folder" here is likely `archy`. Or perhaps `qq` or even `ment`, depending on how the URL is used. – James Youngman Apr 01 '12 at 07:31

@JamesYoungman [a] The OP appears to be using trailing slashes for their URLs. [b] Most URL rewrite engines on the server level (what the OP was asking about) don't include query strings, and the server NEVER sees the fragment. [c] If they really want the trailing slash to be optional, they can just add a `?` after the last `/` in the pattern. – Amber Apr 01 '12 at 17:21
(There's also the fact that almost all of the other upvoted answers here do the same thing, and most of them less efficiently. Not sure why all the objections to this one specifically.) – Amber Apr 01 '12 at 17:26
I didn't downvote this, but it doesn't seem to work for www.domain.com/shop/widgets/blue_widget.html case. – BluesRockAddict Apr 02 '12 at 00:21
@BluesRockAddict - which is exactly what the OP requested. (Ignoring URLs that end in a filename, not a folder.) – Amber Apr 02 '12 at 00:39
@Amber, you're correct I've misread the question. My apologies. – BluesRockAddict Apr 02 '12 at 04:13

score 1 · Answer 3 · Ωmega · 2012-04-02T12:39:29.477

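#!/usr/bin/perl

use strict;
use warnings;

# Directory URL: ends in a slash, so $1 is "widgets" and it is printed.
$_ = 'www.domain.com/shop/widgets/';
print "$1\n" if (/\/([^\/]+)\/$/);

# File URL: no trailing slash, so the match fails and nothing is printed.
$_ = 'www.domain.com/shop/widgets/blue_widget.html';
print "$1\n" if (/\/([^\/]+)\/$/);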

edited Apr 02 '12 at 12:39 · answered Apr 01 '12 at 15:15 by Ωmega

This actually works for all cases presented by the OP (i.e., www.domain.com/shop/widgets/blue_widget.html and www.domain.com/shop/widgets/). – BluesRockAddict Apr 02 '12 at 00:22
@BluesRockAddict Except that the OP wanted it to NOT match URLs that ended in a filename - not match them and return the folder. *"I also need to ignore any URLs that include a filename"* – Amber Apr 02 '12 at 00:40

score 0 · Answer 4 · edited May 23 '17 at 12:30

You don't want a Perl regular expression. You want a regular expression that Zeus will understand. Although they might call that PCRE, not even PCRE handles all Perl regular expressions.

Most of the answers here are wrong because they aren't thinking about the different sorts of URLs that you can get as input.

1. Get just the path portion of the URL.
2. Match against the path portion to find what you need.
3. Distinguish between paths that end in a filename and those that don't.
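Those three steps could look something like this in plain Perl. It's a sketch: `last_directory` is a hypothetical name, the path is isolated with the generic URI-splitting regular expression from RFC 3986 Appendix B (trimmed here to capture only the path), and a trailing slash is taken to mean "directory", which is the convention the question uses. Query strings and fragments are cut off before the path is examined, so slashes inside them don't count:

```perl
use strict;
use warnings;

# Hypothetical helper following the three steps above.
sub last_directory {
    my ($url) = @_;

    # Step 1: isolate the path (scheme, authority, query and fragment are
    # skipped). Adapted from RFC 3986, Appendix B; every part is optional,
    # so this always matches.
    my ($path) = $url =~ m{^(?:[^:/?#]+:)?(?://[^/?#]*)?([^?#]*)};

    # Step 3: only paths ending in a slash are treated as directories.
    return unless $path =~ m{/$};

    # Step 2: split into components; empty fields from the leading slash
    # are filtered out. Returns undef for "/" (no components at all).
    my @parts = grep { length } split m{/}, $path;
    return $parts[-1];
}
```

With James Youngman's URL from the comments as written, `/hier/archy` has no trailing slash, so this returns `undef`; with `/hier/archy/?f=1&y=zz/qq#frag/ment` it returns `archy`.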

Those steps are a starting point. I don't use Zeus and don't want to, so the next part is up to you.

I've read that you can pass the request to a Perl program through Perl Extensions for ZWS, but I'd be surprised if you needed to do that. If you have to resort to that, I'd use the URI module to parse the URI and extract the path. Once you have that, split the path into its components:

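use strict;
use warnings;
use URI;

my $uri = URI->new( ... ); # I don't know how Zeus passes data
my $path = $uri->path;

# split takes the pattern first, then the string. The leading / produces an
# empty first field, which undef discards; trailing empty fields are dropped.
my( undef, @parts ) = split m{/}, $path;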

Once you are this far, you have to decide how you want to recognize something as a directory. If you're mapping directly onto a filesystem structure, that is just a matter of popping elements off @parts until you find the directories, then counting back the number you want to skip.
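A rough sketch of the filesystem approach, assuming the URL path maps directly under some document root. `last_existing_directory` and the root passed to it are hypothetical; the helper only does the popping part (dropping trailing components until what's left is a real directory) and leaves any counting back to you. It reports `widgets` for `.../widgets/blue_widget.html` too, so to honor the question's "ignore filenames" requirement you would first check whether the full path is a directory.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: drop trailing path components until the remainder
# names an existing directory under $docroot, and return that directory's
# name. Returns undef if no prefix of @parts is a directory.
sub last_existing_directory {
    my ( $docroot, @parts ) = @_;
    while (@parts) {
        my $candidate = File::Spec->catdir( $docroot, @parts );
        return $parts[-1] if -d $candidate;
        pop @parts;    # a file or a missing entry: discard it and retry
    }
    return;
}
```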

However, I cringe at doing that, no matter what I put in the Perl program. I'd try really hard to get it done just in the Zeus rules first. Show us what you have so far.
