I host a webpage that has 'project²
' in the URL, matching an on-disk directory project²
from where static files are hosted.
This page is used by a java-based client to load data from URLs (bioinformatics software IGV).
My page lists URLs in the form of http://localhost:60151/load?file=http://example.org/project²/some/data/file.bam
.
Clicking these links in the browser will cause the IGV client (running on localhost) to request GET http://example.org/project²/some/data/file.bam
from my server.
✅ IGV on Linux/Mac responds by requesting this URL as UTF-8 encoded ²
=%C2%B2
, and everything is happily working.
❌ My newly gained Win-10 user's client requests ²
=%B2
(windows-1252 encoded), resulting in a 404-not-found.
After trying dozens of things, I am at wits end how to help this user.
I have the impression that I should be able to dynamically rewrite the wrongly-encoded URLs on the server-side, so that they still end up serving the desired data, but I do not know the magic character combinations to make the rule-patterns match escaped characters.
Things I've already tried
- Doublechecking that the 404s are not network issues; I see the
GET %B2
in myssl_access_log
with404
as the returned statuscode, so it really is the server doing it. - 'Proper' way: UrlEncoding the URL before giving it into the client. Perl's
URI::Encode
encode_uri
turns the²
into%C3%82%C2%B2
(apparentlyò
?) which is even more wrong somehow? - triple-checked that the webpage providing the load-URLs is served as utf-8
- it provides header
Content-Type: text/html; charset=UTF-8
- Set
AddDefaultCharset UTF-8
inhttpd.conf
- It seems the encoding info is not transferred through from webbrowser API-link-click into the Java program
- it provides header
- 'doubled' the directory by symlinking
andprojectª -> project²
project%B2 -> project²
(edit: ª is in no way related; no idea where I got that fromª
is the UTF8-match for%B2
) - Tried to
mod_rewrite
'bad' URLs into good ones in several different ways, none of which seem to catch:
RewriteEngine on
# RewriteRule Pattern Substitution [flags]
RewriteRule (.*)project%B2/(.*) $1project²/$2 [NE] # encoded 'bad' request, unencoded redirect
RewriteRule (.*)²(.*) $1%C2%B2$2 [B,NE] # config file is utf-8 encoded, so this is senseless.
RewriteRule (.*)%B2(.*) $12$2 [B,NE] # doesn't match?
RewriteRule (.*)TZZT(.*) $1test$2 # works, so RewriteEngine is working
The RewriteRule and RewriteRuleFlags docs also do not help me understand how I should encode the Pattern
-part so that it'll work :-(
Similar questions on here
- Can Apache .htaccess convert the percent-encoding in encoded URIs from Win-1252 to UTF-8? -> an external encoding program
rewritemap
seems overkill, since it is literally only one folderproject²
, so my scope is smaller. - Rewriting ASCII-percent-encoded locations to their UTF-8 encoded equivalent same problem in NGinX, point to the above Apache-question.