
I'm archiving a phpBB forum into flat HTML files, without any PHP code anymore. I used wget (see How to: Archive a phpBB forum using Wget and preserve styling), and I now have these files:

[Screenshot: listing of the downloaded files]

How can I make Apache serve example.com/forum/viewforum.php?f=2&start=25 as a file, rather than as a request to viewforum.php with a query string? The latter obviously does not work and gives a 404.

I already tried this .htaccess, without success:

RemoveHandler .php .phtml .php3
RemoveType .php .phtml .php3
php_flag engine off

Note: this is how I archived the forum:

wget -m -p -np -R "*sid=*,ucp.php*,memberlist.php*,*mode=viewprofile*,*view=print*,viewonline.php*,search.php*,posting.php*" https://forums.example.com
Basj
  • Whoever voted to close this question, please read [this meta post](https://meta.stackoverflow.com/questions/283057/mod-rewrite-questions-getting-migrated-to-sf) and [this meta post](https://meta.stackoverflow.com/questions/283033/are-htaccess-questions-ever-on-topic-at-so). This question is not at all off-topic for SO, so the close vote is wrong. This has been discussed again and again, and it has already been settled that most rewrite rules are created and maintained by the developers of web applications/frameworks. – anubhava Dec 06 '21 at 13:28

1 Answer


Interesting problem indeed! It forced me to dig through many Apache docs. In the end the solution was simple: escape the ? so that Apache doesn't treat it, and the part after it, as a query string.

You may use this rewrite rule in your site root .htaccess:

RewriteEngine On

RewriteCond %{REQUEST_URI} ^/forum/viewforum\.php$ [NC]
RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI}\%3F%{QUERY_STRING} [L,NC]

PS: \%3F is an escaped ?, which makes Apache load /forum/viewforum.php?f=2&start=25 as a file.
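
To make the effect of the rule easier to follow, here is a small Python model of the substitution `%{REQUEST_URI}\%3F%{QUERY_STRING}`. It is only an illustration of the idea, not how Apache is implemented, and it ignores the `RewriteCond %{REQUEST_URI}` restriction to viewforum.php. The function names `rewritten_target` and `file_on_disk` are made up for this sketch.

```python
from urllib.parse import urlsplit, unquote


def rewritten_target(url: str) -> str:
    """Model of the substitution %{REQUEST_URI}\\%3F%{QUERY_STRING}."""
    parts = urlsplit(url)
    if not parts.query:
        # Mirrors `RewriteCond %{QUERY_STRING} .`: no query string, no rewrite.
        return parts.path
    # The "?" is written percent-encoded so that it stays part of the path.
    return f"{parts.path}%3F{parts.query}"


def file_on_disk(url: str) -> str:
    """Path (relative to the document root) that the rewritten URL decodes to."""
    return unquote(rewritten_target(url))


print(rewritten_target("https://example.com/forum/viewforum.php?f=2&start=25"))
# /forum/viewforum.php%3Ff=2&start=25
print(file_on_disk("https://example.com/forum/viewforum.php?f=2&start=25"))
# /forum/viewforum.php?f=2&start=25
```

The decoded path contains a literal `?`, which matches the file name that wget saved.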

anubhava
  • Thanks a lot! Interesting indeed! Out of curiosity, isn't there a simple way to tell Apache to treat *every request* as a filename request (and not parse the URL / query string)? This would then work for `viewforum.php`, `viewtopic.php`, `stylesheet.css?foo=bar`, etc. without hardcoding all these cases in the .htaccess. – Basj Dec 06 '21 at 13:21
  • If you remove `RewriteCond %{REQUEST_URI}` line then it will do this handling for every URL coming to your Apache. – anubhava Dec 06 '21 at 13:23
  • I tried this @anubhava, but then it led to an infinite redirection loop (IIRC). – Basj Dec 06 '21 at 13:42
  • For which original URL did you get IIRC ? – anubhava Dec 06 '21 at 14:00
  • Also @anubhava, the output is not the usual output of a web server serving HTML pages. I don't get `Content-Type: text/html; charset=UTF-8`; in fact there is no `Content-Type` at all, and I noticed that this prevents web crawlers from working correctly (I tested with the Ahrefs Site Audit tool). Is there a way to force a Content-Type? – Basj Dec 06 '21 at 14:48
  • Can you please show the output of `curl -I 'http:///forum/viewforum.php?f=2&start=25'` in your question so that I can see what's missing. – anubhava Dec 06 '21 at 14:52
  • Solved: this was caused by the `RemoveHandler .php .phtml .php3`, `RemoveType .php .phtml .php3` and `php_flag engine off` directives, which I have now removed. Fixed! – Basj Dec 06 '21 at 14:53
  • I finally renamed every file like `viewtopic.php?foo=bar` to `viewtopic.php?foo=bar.html` and modified the .htaccess accordingly. Without this, the Content-Type was never reliably set to `text/html`, which was bad for crawlers. – Basj Dec 06 '21 at 15:39
  • Yes that's a good idea to always have `.html` extension – anubhava Dec 06 '21 at 15:44
  • Very last thing @anubhava: do you see how to discard certain query string parameters with `RewriteRule`? I'd like to keep only the `t=` and `start=` query string parameters for `viewtopic.php`. For example, `viewtopic.php?f=2&t=323&p=942` should rewrite to `viewtopic.php?t=323.html` and discard `f=` and `p=`. Thanks! – Basj Dec 06 '21 at 16:08
  • Hmm removing specific query parameters is a bit complex. [Please check this answer](https://stackoverflow.com/a/44650416/548225) – anubhava Dec 06 '21 at 17:33
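
For the file side of the renaming discussed in the comments (appending `.html` and, optionally, keeping only some query parameters such as `t=` and `start=`), a script along these lines might do the job. This is a sketch under assumptions, not what was actually run in the thread: the names `archived_name` and `rename_archive` are hypothetical, the archive is assumed to live on a POSIX filesystem where `?` is allowed in file names, and files whose new name already exists are simply skipped.

```python
from pathlib import Path


def archived_name(name, keep=None):
    """Return the new file name: filtered query string plus a .html suffix.

    `keep` is an optional collection of parameter names to retain;
    None keeps every parameter. Names without "?" are returned unchanged.
    """
    base, sep, query = name.partition("?")
    if not sep:
        return name
    if query.endswith(".html"):
        # Already renamed once: strip the suffix so the function is idempotent.
        query = query[: -len(".html")]
    kept = [
        pair
        for pair in query.split("&")
        if pair and (keep is None or pair.partition("=")[0] in keep)
    ]
    if not kept:
        return f"{base}.html"
    return f"{base}?{'&'.join(kept)}.html"


def rename_archive(root, keep=None):
    """Rename every file under `root`; return the (old, new) name pairs."""
    renamed = []
    # Materialize the listing first so renames do not disturb the traversal.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        target = path.with_name(archived_name(path.name, keep))
        if target != path and not target.exists():
            path.rename(target)
            renamed.append((path.name, target.name))
    return renamed
```

For example, `archived_name("viewtopic.php?f=2&t=323&p=942", keep={"t", "start"})` gives `viewtopic.php?t=323.html`. Dropping parameters can map several files to one name, which is why existing targets are left alone rather than overwritten. The rewrite rule in .htaccess would need to produce matching names.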