
Many web-based user authentication systems don't allow usernames that contain characters other than letters, numbers and underscores.

Could there be a technical reason for that?

Emanuil Rusev

10 Answers


A well-designed system doesn't necessarily need to prevent any special characters in usernames.

That said, underscores have traditionally been accepted because the underscore is typically treated as a "word" character, along with letters and numbers. It is usually the only other character given this distinction. This is true in regular expressions, and even at a basic level in most operating systems (type an underscore in a word and double-click the letters: the selection will extend past the underscore. Now try the same with a dash; it most likely will not.)
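A quick illustration of the regular-expression side of this, using Python's standard `re` module (most regex flavors treat `\w` the same way for ASCII input):

```python
import re

# \w matches letters, digits and the underscore, but not the dash.
WORD = re.compile(r"\w+")

underscored = WORD.findall("like_this")  # one token: ['like_this']
dashed = WORD.findall("like-this")       # split at the dash: ['like', 'this']
print(underscored, dashed)
```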

Nicole
  • Well, characters that are hard to read or that make the system messy are good candidates to disallow. A username like !@#$% would be bad. – Veger Jan 12 '10 at 23:50
  • Although Veger's comment is valid from a usability standpoint, it misses Renesis's point, which is a technical one: there are no inherent technical limitations on what can make up a "username". – G-Wiz Jan 12 '10 at 23:53
  • The "well-designed system" may someday run into real-world integration and security constraints: domain names, email addresses, file names and other systems all have their own restrictions. There are a lot of security issues with [Unicode usernames](http://unicode.org/reports/tr36/), as demonstrated by [Spotify](https://labs.spotify.com/2013/06/18/creative-usernames/), which might have thought the same way. To avoid all of these, one can simply follow the [POSIX.1-2008](http://serverfault.com/a/578264/226737) standard for a portable username. – saaj Aug 31 '16 at 15:26
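A minimal Python sketch of the POSIX.1-2008 rule mentioned in that comment, assuming my reading of the standard: a portable user name uses only the portable filename character set (A-Z, a-z, 0-9, `.`, `_`, `-`) and should not start with a hyphen. The function name is hypothetical, and length limits are left to the target system.

```python
import string

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")

def is_portable_username(name: str) -> bool:
    """Hypothetical check: non-empty, portable characters only, no leading hyphen."""
    if not name or name.startswith("-"):
        return False
    return all(ch in PORTABLE_CHARS for ch in name)

for candidate in ["john.doe", "john_doe-2", "-john", "jöhn", "john doe"]:
    print(candidate, is_portable_username(candidate))
```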

Yes: to avoid having to escape special characters. Lazy programmers will drop whatever the user types straight into code somewhere, and that is what leads to injection attacks.
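For contrast, a sketch of what avoids the escaping problem without restricting the username: a parameterized query, shown here with Python's built-in `sqlite3` module and a throwaway in-memory table (the table and function names are made up for the example).

```python
import sqlite3

def store_and_fetch(username: str) -> list:
    """Insert a username into a temporary table and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    # The ? placeholder passes the value separately from the SQL text,
    # so quotes, semicolons and % are stored literally, never executed.
    conn.execute("INSERT INTO users (name) VALUES (?)", (username,))
    rows = conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

tricky = "o'brien\"; DROP TABLE users; --"
print(store_and_fetch(tricky))  # the name comes back unchanged
```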

Even if it's not used maliciously, allowing the user to type characters that will conflict somewhere else can be more hassle than necessary. For example, if you decide to create a filesystem directory per user, to store their uploads in, then the username must conform to directory naming rules on that OS (e.g. no \/:*?"<>| on Windows).
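A sketch of the check a per-user-folder design would need on Windows. It rejects the characters listed above plus control characters, names ending in a dot or space, and reserved device names such as CON and NUL. The function name is hypothetical, and this is not an exhaustive list of Windows path rules.

```python
# Characters Windows does not allow in file or directory names.
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')
# Reserved device names (also reserved when followed by an extension, e.g. "CON.txt").
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{i}" for dev in ("COM", "LPT") for i in range(1, 10)
}

def is_safe_windows_dirname(name: str) -> bool:
    """Hypothetical check: can this username be used directly as a folder name?"""
    if not name or name.endswith((".", " ")):
        return False
    if any(ch in WINDOWS_FORBIDDEN or ord(ch) < 32 for ch in name):
        return False
    return name.split(".")[0].upper() not in WINDOWS_RESERVED

for candidate in ["alice", "a/b", "what?", "con", "bob."]:
    print(candidate, is_safe_windows_dirname(candidate))
```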

Once you've avoided clashes like the directory-naming one, and stripped out "';% and // to avoid injection attacks, you have removed most punctuation anyway. And why does anyone even need punctuation in their username?

It was far easier to write a quick regex that validates usernames against [a-zA-Z0-9_] and be done with it than to faff about figuring out which punctuation will not clash, or mapping it to other characters in some way.
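That "quick regex" looks roughly like this in Python (the function name is hypothetical). One detail worth knowing: `fullmatch()` is used rather than `^...$`, because in Python's `re` the `$` anchor also matches just before a trailing newline, so `^[a-zA-Z0-9_]+$` would accept `"bob\n"`.

```python
import re

# The classic whitelist: ASCII letters, digits and underscore, at least one character.
SIMPLE_USERNAME = re.compile(r"[a-zA-Z0-9_]+")

def is_simple_username(name: str) -> bool:
    # fullmatch() requires the whole string to match, including the end.
    return SIMPLE_USERNAME.fullmatch(name) is not None

for candidate in ["john_doe42", "john-doe", "bob\n", ""]:
    print(repr(candidate), is_simple_username(candidate))
```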

Then, as with many things in computing, once enough sites allowed only letters, numbers and underscores, and people started choosing usernames to that spec, it became the de facto standard and perpetuated itself!

Rikki
  • @CarlosMuñoz: the original question asked what the likely reasons were for such a strict limitation on many web systems. Obviously it is possible to build a system that accepts other characters, but the question was asking why that isn't always the case. – Rikki Sep 27 '13 at 14:17
  • Aside from those reasons, aren't special characters in usernames also a pain for the users themselves? Typing names with accents or other special characters can be a challenge for most people! – Ciaran Gallagher Mar 07 '19 at 15:20

When no format is specified, I use this:

(Updated regex, fixing the backtracking issue @abney317 mentioned:)

^\w(?:\w|[.-](?=\w)){3,31}$

(original regex)

^\w(?:\w*(?:[.-]\w+)?)*(?<=^.{4,32})$

This requires a length of at least 4 and at most 32 characters. The name must start with a word character and may contain dots and dashes, as long as each one is followed by a word character (so no consecutive dots or dashes). I use it simply because it's strict enough to integrate with almost anything :)

Valid:

test.tost

Invalid:

test..tost
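A Python sketch for checking the updated pattern against those examples. Two assumptions to note: `fullmatch()` supplies the `^...$` anchoring here, and in Python 3 `\w` on a `str` pattern is Unicode-aware, so a name like `tést` passes unless the `re.ASCII` flag is added. The original lookbehind version would not compile in Python at all, since `re` only supports fixed-width lookbehind.

```python
import re

# The updated pattern; fullmatch() provides the ^...$ anchoring.
USERNAME_RE = re.compile(r"\w(?:\w|[.-](?=\w)){3,31}")
# Same pattern restricted to ASCII word characters [A-Za-z0-9_].
USERNAME_ASCII_RE = re.compile(r"\w(?:\w|[.-](?=\w)){3,31}", re.ASCII)

def is_valid(name: str, ascii_only: bool = False) -> bool:
    pattern = USERNAME_ASCII_RE if ascii_only else USERNAME_RE
    return pattern.fullmatch(name) is not None

for name in ["test.tost", "test..tost", "test.", "abc", "tést"]:
    print(name, is_valid(name), is_valid(name, ascii_only=True))
```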

Diadistis
  • Hmm, so Unicode word characters are OK? I have often wondered whether people have Unicode characters in their username or whether systems outright block it for simplicity. – Hakanai Feb 13 '12 at 03:10
  • This regex has a [backtracking issue](https://www.regular-expressions.info/catastrophic.html) and will lock up browsers with too much recursion. – abney317 Dec 23 '21 at 02:32

Limiting usernames to these characters (or even to the ASCII subset of them) prevents usernames like from being accepted. By not accepting these characters, you can prevent a wide range of usernames-that-look-like-other-usernames.
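To make the look-alike problem concrete, here is a small Python sketch using a Cyrillic "а" (U+0430) in place of the Latin "a". The two names render almost identically but are different strings; a Unicode-aware `\w` accepts the spoof, while the ASCII-only subset rejects it. The example names are illustrative.

```python
import re

UNICODE_WORD = re.compile(r"\w+")           # any Unicode word character
ASCII_WORD = re.compile(r"\w+", re.ASCII)   # only [a-zA-Z0-9_]

latin = "paypal"
spoof = "p\u0430ypal"  # U+0430 CYRILLIC SMALL LETTER A instead of Latin "a"

same_string = latin == spoof                                # different code points
unicode_accepts = UNICODE_WORD.fullmatch(spoof) is not None  # Cyrillic letters are \w
ascii_accepts = ASCII_WORD.fullmatch(spoof) is not None      # rejected by the ASCII subset
print(same_string, unicode_accepts, ascii_accepts)
```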

medgno

I don't like the readability argument when it interferes with people's ability to use their native language in usernames.

I recommend experimenting with character classes built from the Unicode general categories (http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedUnicodeGeneralCategories) or named blocks (http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks). I haven't tried this, but

[\p{L}\p{N}\p{M}]

might be worth an experiment.
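Python's built-in `re` module does not support `\p{...}` classes (the third-party `regex` package does), so this sketch approximates `[\p{L}\p{N}\p{M}]` with the standard `unicodedata` module. Normalizing to NFC first and refusing a leading combining mark are my own additions, not part of the suggestion above; the function name is hypothetical.

```python
import unicodedata

# Major general categories allowed: L (letters), N (numbers), M (marks),
# mirroring the character class [\p{L}\p{N}\p{M}].
ALLOWED = {"L", "N", "M"}

def is_allowed(name: str) -> bool:
    # Compose characters first so "e" + combining acute becomes a single "é".
    name = unicodedata.normalize("NFC", name)
    if not name or unicodedata.category(name[0]).startswith("M"):
        return False  # empty, or starts with a combining mark
    return all(unicodedata.category(ch)[0] in ALLOWED for ch in name)

for candidate in ["José", "Влад", "日本語", "user_name", "a b"]:
    print(candidate, is_allowed(candidate))
```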

John Saunders

Because the underscore allows multiple words to be represented in a somewhat readable manner.

Personally, I really, really wish folks would expand things a bit to allow dashes and apostrophes. This would allow people to use non-English phonetic names (e.g. Native American tribal names like She-Ki and Ke`Xthsa-Tse).
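A hypothetical Python pattern along those lines: runs of letters joined by single separators, so names cannot start or end with a separator or contain two in a row. The backtick is included only because the example Ke`Xthsa-Tse uses it; which separators to allow is an assumption to adjust.

```python
import re

# [^\W\d_] = a word character that is not a digit or underscore (essentially a letter).
# Separators allowed between letter runs: dash, apostrophe, backtick.
NAME_RE = re.compile(r"[^\W\d_]+(?:[-'`][^\W\d_]+)*")

def is_name_like(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None

for candidate in ["She-Ki", "Ke`Xthsa-Tse", "O'Brien", "-Ki", "She--Ki"]:
    print(candidate, is_name_like(candidate))
```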

T.E.D.

The main reason websites enforce such rules is readability (because usernames like ~-|this<>one|-~ are annoying). It might also be because it's less work (underscores get matched by a \w+ regex, while dashes and other special characters don't), but I doubt that's a major reason.

There is no "standard", so if neither of the above reasons bothers you, do whatever you'd like. Personally, I'd like to see more websites accept dashes and periods, but it really comes down to preference: readability and consistency versus expression.

Sasha Chedygov
  • Another good reason for character restrictions is to prevent impersonation. When a community has a well-known user ("Example"), someone may come along and post as "Example." to trick those who don't look further into the profile. This has happened quite a bit on Wikipedia; look at the user list: http://en.wikipedia.org/w/index.php?title=Special:ListUsers&username=Jimbo%20Wales – Nicole Jan 13 '10 at 00:06
  • @Renesis: Yeah, but you could just add a number to the end of the name anyway. Sure, it's more obvious than a period, but it'll work no matter what character restrictions you add to the system. – Sasha Chedygov Jan 13 '10 at 00:09
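The "Example." impersonation case from the comments above can be partly caught at registration time by comparing a simplified form of each new name against existing ones. This is a hypothetical sketch: the name `skeleton` is borrowed loosely from Unicode Technical Standard #39, but this is a much cruder stand-in that only NFKC-normalizes, case-folds and drops non-alphanumerics. It does not catch cross-script look-alikes, and, as noted in the reply, "Example2" still gets through.

```python
import unicodedata
from typing import Iterable

def skeleton(name: str) -> str:
    # Crude canonical form: NFKC-normalize, case-fold, keep only letters and digits.
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(ch for ch in folded if ch.isalnum())

def looks_like_existing(candidate: str, existing: Iterable[str]) -> bool:
    target = skeleton(candidate)
    return any(skeleton(name) == target for name in existing)

taken = {"Example"}
print(looks_like_existing("Example.", taken))  # trailing period is ignored
print(looks_like_existing("Example2", taken))  # an added digit still passes
```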

It depends on how your usernames are used. Without knowing the context, there isn't a general rule.

Noon Silk

The underscore has traditionally been allowed in identifiers in most programming languages, and was generally the only "special" character allowed.
But many web logins still do not accept ANY special characters and are limited to upper- and lowercase letters and digits...
And others are fine with really special ones ;-)
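The identifier rule is easy to see with Python's own `str.isidentifier()`, which applies Python's identifier syntax (note that Python, unlike many older languages, also accepts non-ASCII letters in identifiers):

```python
candidates = ["like_this", "like-this", "like.this", "2fast", "_private"]
# Map each candidate to whether Python would accept it as an identifier.
results = {name: name.isidentifier() for name in candidates}
print(results)
```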

Francesca

People may want to write their usernames like_this rather than likethis or LikeThis.

John