Do not allow ".xml"/".html"/"index" in URI?

Question

I'm going through Lift's basics in Section 3.2 SiteMap of Simply Lift and one thing struck me.

Using the default SiteMap code, you can ask for, say, the info view in three ways:

GET /info,
GET /info.html,
GET /info.xml (why?).

What is more, you can request the index view in four different ways:

GET /,
GET /index,
GET /index.html,
GET /index.xml.

How can I limit this behaviour to GET / for directories and GET /info for files?

P.S. All of these return 200 OK.

Shouldn't one resource have one URL only?

What's the downside of this behavior? In what way is it bad? — VasiliNovikov, May 20 '13 at 08:01
@VasyaNovikov, in search-engine-robots way perhaps. :) I mean... no robot should come up with `.html`/`.xml` URL, but `/dir/index` and `/dir/` are pretty probable (e.g. autogenerated menu uses `/dir/index` links). — Michal Rus, May 21 '13 at 23:12

score 3 · Accepted Answer · answered May 20 '13 at 02:43

There are actually more than four ways that it can be parsed. The full list of known suffixes (any of which can be used to access the page) can be found here.

I think the reason is that Lift can be used to serve any kind of resource, so most suffixes are explicitly added by default.

I think you could disable Lift's processing of all extensions by adding this to Boot.scala:

// explicitlyParsedSuffixes is a Set[String], so Nil (a List) does not type-check here
LiftRules.explicitlyParsedSuffixes = Set.empty

However, I wouldn't recommend that as there may be some side-effects.

Using Req with RestHelper you can specify the suffix explicitly, but I don't know whether SiteMap has an equivalent construct.
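To make the effect of the suffix set concrete, here is a rough, framework-independent model of which URLs end up pointing at one view. It is not Lift's actual code: `SuffixAliases`, `aliases` and the two-element suffix set are made up for illustration, and the real default list of suffixes is longer.

```scala
// Simplified model, NOT Lift's implementation: list the URLs that would
// resolve to one view for a given set of parsed suffixes.
object SuffixAliases {
  // Hypothetical helper. `view` is a path without a leading slash,
  // e.g. "info", "index" or "dir/index".
  def aliases(view: String, parsedSuffixes: Set[String]): List[String] = {
    val bare = "/" + view
    // One extra URL per parsed suffix, sorted for a stable order.
    val withSuffix = parsedSuffixes.toList.sorted.map(s => s"$bare.$s")
    // An index view is also reachable through its directory URL.
    val dir =
      if (view == "index") List("/")
      else if (view.endsWith("/index")) List("/" + view.stripSuffix("index"))
      else Nil
    dir ::: bare :: withSuffix
  }

  def main(args: Array[String]): Unit = {
    val two = Set("html", "xml") // only the two suffixes from the question
    println(aliases("info", two))
    println(aliases("index", two))
    println(aliases("index", Set.empty))
  }
}
```

In this model an empty suffix set leaves exactly one URL for `info`, but `index` still has two (`/` and `/index`), so the suffix setting alone would not cover the index case.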

jcern


"Some side effects"? That sounds scary. – nafg May 20 '13 at 19:53
This many links to one view, that's interesting. Maybe `SiteMap` implementation should be changed then? As I understand it, it is only used for dynamic HTML pages. Your code works, though (as for suffixes, `/index` question is still open). `Nil` cannot be used for `Set`s, `Set()` or `Set.empty` does the trick (see http://stackoverflow.com/questions/10506226). Thanks! – Michal Rus May 21 '13 at 23:18
The index is a pretty standard convention. Most web servers have a construct for a default document (`DirectoryIndex` in Apache, `index` in nginx, etc.) which internally redirects the request for `/` to the default document, so both URLs respond. I don't think there is a way of disabling that. Lift uses that convention too, and I am pretty sure most search engines account for it. You can find many examples in the wild, like: http://www.nytimes.com and http://www.nytimes.com/index.html or http://www.facebook.com and http://www.facebook.com/index.php – jcern May 22 '13 at 13:59

score 2 · Answer 2 · answered May 20 '13 at 14:53

Actually, the code to determine whether Lift should handle the request or not is here. You can see the default extensions in the liftHandled method directly above, but they can all be overridden with LiftRules.liftRequest. Something like:

LiftRules.liftRequest append {
  // liftRequest expects a PartialFunction[Req, Boolean], so return the Boolean itself (no Full wrapper)
  case r => r.path.suffix.trim == ""
}

Should do the trick.
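To show what a test like `suffix.trim == ""` would accept, here is a small stand-alone sketch. It only models the idea and does not use Lift: `ParsedPath`, `parse` and `handled` are hypothetical names, and the assumptions that a suffix is split off only when it is in the known set, and that `/` maps to `index`, are mine.

```scala
// Stand-alone model of the suffix check; not Lift code.
object SuffixPredicate {
  // Hypothetical stand-in for a parsed request path:
  // the path segments plus the suffix split off the last segment.
  final case class ParsedPath(parts: List[String], suffix: String)

  // Assumption: a suffix is split off only when it is in `known`;
  // an empty path is treated as the "index" view.
  def parse(uri: String, known: Set[String]): ParsedPath = {
    val segs = uri.split('/').filter(_.nonEmpty).toList
    segs match {
      case Nil => ParsedPath(List("index"), "")
      case _ =>
        val last = segs.last
        val dot = last.lastIndexOf('.')
        if (dot > 0 && known.contains(last.substring(dot + 1)))
          ParsedPath(segs.init :+ last.substring(0, dot), last.substring(dot + 1))
        else
          ParsedPath(segs, "")
    }
  }

  // The same test as in the snippet above: accept only suffix-free paths.
  def handled(p: ParsedPath): Boolean = p.suffix.trim == ""
}
```

Under these assumptions `/info` passes and `/info.html` does not, while `/index` still passes because it carries no suffix, so a check like this says nothing about the `/` versus `/index` duplication.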

As for why it works that way, Jason is right that Lift is designed to handle multiple types of dynamic resources.

Dave Whittaker


1) +1 for the link to repo, I guess that's still the best way to get Lift. :) 2) Your code only compiles without the `Full()` wrapper, but it doesn't do the trick. :< 3) Shouldn't multiple types be handled with Accept: header? – Michal Rus May 21 '13 at 23:08
