I am developing a website using Django 1.4, and I use django-registration for the signup process. It turns out that Unicode characters are not allowed in usernames: whenever a user enters, for example, a Chinese character as part of the username, registration fails with:

This value may contain only letters, numbers and @/./+/-/_ characters.

Is it possible to change this so that Unicode characters are allowed in usernames? If so, how can I do it? Also, could it cause any problems?

dda
piokuc
  • While retrieving the username and other fields, you could try encoding them in UTF-8 before storing them: try `encode('utf-8')`. – Avichal Badaya Aug 29 '12 at 20:58
  • Thanks for the suggestion. A user from Singapore reports that he sees the above error message when he tries to sign up with a username consisting of Latin characters only. He has his system and keyboard configured for typing Chinese characters, so I suspect that whatever he types into the form is encoded as Unicode. Do you think converting to UTF-8 would fix this issue? – piokuc Aug 31 '12 at 15:06
  • Yes, it should do the trick. Check out our site, www.vialogues.com, and try to create an account using any foreign language. We are doing the same thing, converting to UTF-8. – Avichal Badaya Aug 31 '12 at 15:20
  • Thanks, do you use django-registration as well? May I ask where you do the conversion to UTF-8? – piokuc Aug 31 '12 at 15:25
  • Just created an account with some Polish diacritic characters. – piokuc Aug 31 '12 at 15:35
  • Great! Well, we convert the fields to UTF-8 after retrieving them from the form and before storing them in the database. We are basically using the CAS API for user creation and authentication. – Avichal Badaya Aug 31 '12 at 15:54

2 Answers

It is really not a problem: as far as I remember, this character restriction exists only in UserCreationForm (or RegistrationForm in django-registration), and you can easily write your own form, since the field in the database is just an ordinary CharField.
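As a rough illustration of what such a form-level check amounts to, here is a minimal sketch that uses only Python's standard `re` module rather than Django itself. The character class is assumed to match the one behind the error message in the question (letters, numbers and @/./+/-/_), and `is_valid_username` is a hypothetical helper, not part of Django or django-registration. It is written for Python 3, where `\w` matches Unicode letters unless `re.ASCII` is passed.

```python
import re

# Assumed to mirror the character set named in the error message:
# letters, digits, underscore (all via \w) plus . @ + -
USERNAME_CHARS = r"[\w.@+-]+"

# re.ASCII restricts \w to [a-zA-Z0-9_]; without it, Python 3 treats
# \w as "any Unicode word character".
ASCII_ONLY = re.compile(USERNAME_CHARS, re.ASCII)
UNICODE_AWARE = re.compile(USERNAME_CHARS)


def is_valid_username(name, allow_unicode=True):
    """Hypothetical helper: True if every character of name is allowed."""
    pattern = UNICODE_AWARE if allow_unicode else ASCII_ONLY
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["piokuc", "用户名", "bad name"]:
        print(candidate,
              is_valid_username(candidate, allow_unicode=False),
              is_valid_username(candidate, allow_unicode=True))
```

A custom form field would apply the Unicode-aware variant of the pattern in place of the ASCII-only one; the database column itself does not need to change.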

But the restriction is there for a reason. One possible problem I can think of is creating links: usernames are often used in URLs, and that may cause problems. There is also a greater risk of fake accounts whose usernames look the same but in fact consist of different characters.
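To make the look-alike risk concrete, the following sketch compares two usernames that render almost identically but mix scripts. It relies only on the standard `unicodedata` module; `scripts_used` is a hypothetical helper, and taking the first word of each character's Unicode name is just a crude stand-in for real script detection.

```python
import unicodedata

latin = "Anna"
# Same visual shape, but the first and last letters are Cyrillic
# (U+0410 and U+0430); the middle "nn" is Latin.
mixed = "\u0410nn\u0430"


def scripts_used(text):
    """Hypothetical helper: rough script labels taken from Unicode names."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text}


if __name__ == "__main__":
    print(latin == mixed)       # the two strings differ
    print(scripts_used(latin))
    print(scripts_used(mixed))
```

A site that allows Unicode usernames could use a check along these lines to flag names that mix scripts, although that is a policy decision rather than something Django does for you.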

jasisz
  • Your answer is entirely inaccurate. Well, not *entirely*: the bit about the restriction only being imposed by the form is correct. However, Django and every modern browser are fully Unicode-capable. Unicode is perfectly acceptable in URLs. Also, each Unicode character has a unique byte-code, so `blah` and `bláh` are *not* the same and could both be used as usernames simultaneously. – Chris Pratt Aug 29 '12 at 20:32
  • Did I say anywhere that browsers are incapable? I was not talking about byte-code at all, but about how characters look and how confusing that can be for users. For example, tell me the difference between "Аnnа" and "Anna", which are completely different strings in Unicode. It could very well be used for some kind of scam. – jasisz Aug 29 '12 at 20:42
  • @jasisz Could you please give me more details on what I should do to remove the restriction? Also, what's really the difference between "Аnnа" and "Anna" in your example? – piokuc Aug 31 '12 at 15:12
  • If you are using the plain Django UserCreationForm, just extend it and override the username field (it is originally a RegexField; you can make it a plain CharField if that's what you really need). In my example, both "a"s in one of the Annas are Cyrillic letters that look the same as the "normal" ones but have completely different code points. – jasisz Sep 01 '12 at 22:06
  • Thanks for that. Extending `UserCreationForm` seems easy enough, but how do I make `django-registration` work with the extended class instead of the original `UserCreationForm`? This is something I don't understand. – piokuc Sep 05 '12 at 16:03
  • @jasisz Thanks! I'm closing the question. I have just posted a followup question here: http://stackoverflow.com/q/12357168/300886 – piokuc Sep 10 '12 at 18:09

Django 1.10 and later officially support Unicode usernames. From the release notes:

The User model in django.contrib.auth originally only accepted ASCII letters in usernames. Although it wasn’t a deliberate choice, Unicode characters have always been accepted when using Python 3.

The username validator now explicitly accepts Unicode letters by default on Python 3 only. This default behavior can be overridden by changing the username_validator attribute of the User model, or to any proxy of that model, using either ASCIIUsernameValidator or UnicodeUsernameValidator. Custom user models may also use those validators.

Cesar Canassa