If you want to include only Arabic characters you can add Unicode ranges to the permitted_uri_chars regexp. Using this wikipedia site we can try to construct the regexp:
a-z 0-9~%.:_\- \x0600-\x06FF
Unfortunately for our case, CodeIgniter doesn't use the u modifier (used for Unicode) in preg_match. So in order for this to work you would need to modify the source file system/core/URI.php, line 257, and change it to:
if ( ! preg_match("|^[".str_replace(array('\\-', '\-'), '-', preg_quote($this->config->item('permitted_uri_chars'), '-'))."]+$|iu", $str))
In the above code I've only added the u modifier to preg_match. Alternatively you can extend the URI class as described in the documentation, which is a better choice.
(I didn't test this)
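To see what the u modifier changes, here is a small standalone check in plain PHP (no CodeIgniter needed; the sample letter and variable names are only for illustration). Without u, PCRE reads the subject byte by byte, so a single Arabic letter encoded in UTF-8 counts as two characters:

```php
<?php
$letter = 'أ'; // U+0623, stored as two bytes in UTF-8 (0xD8 0xA3)

// Without "u" the dot matches a single byte, so a two-byte string fails ^.$
$without_u = preg_match('/^.$/', $letter);

// With "u" the subject is treated as UTF-8 and the dot matches one code point.
$with_u = preg_match('/^.$/u', $letter);

var_dump(strlen($letter), $without_u, $with_u);
```

This assumes the source file itself is saved as UTF-8.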
To answer the question why it is bad to allow all characters: I can only think of SQL Injection problems or other kinds of injections.
Edit: for example, if you use the URL index.php/main/get_pdf?filename=awesome.pdf to download PDF files from ./pdf/awesome.pdf and you don't treat (i.e. validate) your input correctly, a malicious user could request something like index.php/main/get_pdf?filename=../secure_files/nuclear_launch_codes.pdf ;).
Edit2: Well, the above example is not an example of bad use of permitted_uri_chars, because AFAIK CodeIgniter allows this kind of URL variable, so you need to validate this stuff yourself. I'll check all of this stuff when I get home.
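A minimal sketch of what "validate this stuff yourself" could look like for the get_pdf example. The helper name safe_pdf_name and the allowed character set are my own assumptions, not anything CodeIgniter provides:

```php
<?php
// Hypothetical helper: accept only a bare "name.pdf" and reject anything
// that carries a directory part such as "../secure_files/".
function safe_pdf_name($filename)
{
    // basename() strips any directory component, so a mismatch means
    // the caller supplied a path rather than a plain file name.
    if (!is_string($filename) || basename($filename) !== $filename) {
        return null;
    }
    // Whitelist a conservative character set and require the .pdf extension.
    // \z anchors at the very end, so a trailing newline is rejected too.
    if (preg_match('/^[A-Za-z0-9_\-]+\.pdf\z/', $filename) !== 1) {
        return null;
    }
    return $filename;
}
```

The controller would then serve the file from ./pdf/ only when the helper returns a name, and show a 404 otherwise.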
Edit3: I fixed regexp, but it seems that this is not the way to enable Arabic characters so I crossed out this part of the answer.
I played with CodeIgniter a little. I don't know if this stuff will work on other systems. It works on my Windows XP, PHP 5.3. This is what I found:
- In PHP you can use UTF-8 characters as function and class identifiers, but it is not officially supported (see this for further info).
In CodeIgniter the controller/method part of the URL is URL-encoded (e.g. ـج is converted to %D9%80%D8%AC%E2%80%8E). If you want to use Arabic in controller or method names you have two options:
- In application/config/routes.php add a URL-encoded route pointing to the real route, which could contain Arabic characters (as mentioned above, you can use UTF-8 characters in PHP identifiers). E.g.: $route['welcome/%D8%A3'] = 'welcome/أ'; will enable the user to go to example.com/index.php/welcome/أ which will call the أ method (defined as function أ() { ... }) in the welcome controller. Of course you can also map Arabic URL-encoded URLs to normal ASCII names.
- Extend the system/core/Router.php class so that fetch_method and fetch_class return URL-decoded names. I don't know what the security implications of doing this are. It is probably better to validate that the input characters are indeed Arabic (i.e. you can check the char ranges supplied here). Example of a modified fetch_class:
function fetch_class()
{
    // Return the URL-decoded controller name.
    return urldecode($this->class);
}
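If you go the Router route, the "validate that the input is indeed Arabic" idea could be sketched roughly like this. The function name decode_arabic_segment is hypothetical, and the allowed set (ASCII letters, digits, underscore, plus the Arabic block U+0600 to U+06FF) is my assumption about what you would want to permit:

```php
<?php
// Hypothetical helper, not part of CodeIgniter: URL-decode a segment and
// accept it only if every character is ASCII alphanumeric, an underscore,
// or falls in the Arabic block (U+0600 to U+06FF).
function decode_arabic_segment($segment)
{
    $decoded = urldecode($segment);
    // The "u" modifier makes PCRE read the subject as UTF-8; invalid UTF-8
    // makes preg_match() return false, which is rejected here as well.
    if (preg_match('/^[A-Za-z0-9_\x{0600}-\x{06FF}]+\z/u', $decoded) !== 1) {
        return false;
    }
    return $decoded;
}
```

An extended fetch_class or fetch_method could call such a helper and fall back to a 404 when it returns false, instead of returning whatever urldecode produces.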
If you need to use Arabic characters in parameters of controller methods you just need to urldecode these parameters. E.g.:
class Welcome extends CI_Controller
{
    public function index($param)
    {
        $this->output->set_content_type("text/plain; charset=utf-8");
        // The segment arrives URL-encoded, so decode it before use.
        echo urldecode($param);
    }
}
If you need to use these characters in the query string, it just works. E.g.:
class Welcome extends CI_Controller
{
    public function index()
    {
        $this->output->set_content_type("text/plain; charset=utf-8");
        // Query string values are already decoded by PHP.
        echo $this->input->get('arabic');
    }
}
Going to example.com/index.php/welcome/index?arabic=ابتثجحخدذرزسشصضطظعغفقكلمنهوي will print out ابتثجحخدذرزسشصضطظعغفقكلمنهوي.
Edit4: If you have $config['uri_protocol'] = 'PATH_INFO' then:
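- In config set (the Arabic range goes before the trailing \- so that the hyphen stays last in the character class and is not read as the start of a range):
$config['permitted_uri_chars'] = 'a-z 0-9~%.:_\x{0600}-\x{06FF}\-';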
- Extend the URI class in system/core/URI.php so that in the _filter_uri method the line with preg_match is:
if ( ! preg_match("|^[".str_replace(array('\\-', '\-', '\{', '\}', '\\\\x'), array('-', '-', '{', '}', '\x'), preg_quote($this->config->item('permitted_uri_chars')))."]+$|ui", $str))
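To check a permitted_uri_chars value outside CodeIgniter, the same pattern construction can be pulled into a standalone function. This is only a sketch: segment_is_permitted and the $permitted variable are hypothetical stand-ins for the _filter_uri logic and the config item, and the value used here places the Arabic range before the trailing \- (an ordering I chose so that the hyphen ends up last in the character class and stays literal):

```php
<?php
// Hypothetical stand-in for $this->config->item('permitted_uri_chars').
$permitted = 'a-z 0-9~%.:_\x{0600}-\x{06FF}\-';

// Hypothetical helper mirroring the preg_match line above.
function segment_is_permitted($permitted, $segment)
{
    // preg_quote() escapes "-", "{", "}" and "\"; undo that for the pieces
    // that must keep their meaning inside the character class.
    $class = str_replace(
        array('\\-', '\{', '\}', '\\\\x'),
        array('-', '{', '}', '\x'),
        preg_quote($permitted)
    );
    // "u" enables UTF-8 matching, "i" keeps the check case-insensitive.
    return preg_match('|^[' . $class . ']+$|ui', $segment) === 1;
}

var_dump(segment_is_permitted($permitted, 'أ'));
var_dump(segment_is_permitted($permitted, '<script>'));
```

Running a few Arabic and non-Arabic segments through it before touching the core class is a cheap way to confirm the character class does what you expect.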