73

Is there an RFC, official standard, or template for creating a User-Agent string? The iPhone's user-agent string seems strange...

Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1_2 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7D11 Safari/528.16

John Himmelman
  • 1
    The iPhone seriously puts `Mozilla/5.0` at the beginning of its user agent? – Tarka Apr 08 '10 at 15:59
  • 9
    @Slokun why the surprise? IE user-agent starts with `Mozilla/4.0`. Remember that Mozilla was one of the first browsers to be made, and all others include, to various degrees, parts of its foundation. – Paulo Santos Apr 08 '10 at 16:09
  • 1
    The explanation on http://www.useragentstring.com/ is that it should just mean Gecko-based browsers (Netscape and Firefox) but most other browsers include it to say they're Mozilla-compatible. – Rikki Nov 10 '12 at 11:19
  • 2
    Think of `Mozilla/` as "not Lynx". Generally text-only = not Mozilla-compatible. Some old WML/HDML feature-phone browsers also don't identify as Mozilla. (Fun fact: all the browsers before Lynx died of dysentery or were eaten by grues.) – Webveloper May 09 '13 at 03:09
  • 14
    [everyone pretended to be everyone else, and confusion abounded](http://webaim.org/blog/user-agent-string-history/) – TRiG Nov 30 '14 at 04:17
  • See https://github.com/WICG/ua-client-hints for interesting proposal from Mike West to clean the mess. – Piotr Dobrogost Mar 25 '19 at 13:17

3 Answers

90

Note: In June 2022 the IETF (Internet Engineering Task Force) published RFC 9110, which obsoleted RFC 7231, so I'm updating this answer with the new RFC information.

The User-Agent header is defined in RFC 9110, which describes HTTP Semantics, where it states:

The "User-Agent" header field contains information about the user agent originating the request, which is often used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use. A user agent SHOULD send a User-Agent header field in each request unless specifically configured not to do so.
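As a small illustration of a client sending the header, here is a sketch using Python's standard `urllib.request`. The product name `Foobar/1.0.0`, the comment text and the `example.com` URL are placeholders I made up; nothing here comes from the RFC itself.

```python
from urllib.request import Request

# Hypothetical product name, version and comment, chosen for illustration.
USER_AGENT = "Foobar/1.0.0 (example client)"

# Building the request object does not open a network connection.
req = Request("https://example.com/", headers={"User-Agent": USER_AGENT})

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # Foobar/1.0.0 (example client)
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header.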

ABNF Specification

User-Agent = product *( RWS ( product / comment ) )

Where product is defined as:

product         = token ["/" product-version]
product-version = token
token           = 1*tchar
tchar           = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                / DIGIT / ALPHA
                ; any VCHAR, except delimiters
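For readers who prefer code to grammar, the product rule can be approximated with a regular expression. This is only a sketch in Python; the names `TCHAR`, `PRODUCT_RE` and `is_product` are mine, not the RFC's.

```python
import re

# tchar from the grammar above: the listed punctuation plus DIGIT and ALPHA.
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"

# product = token ["/" product-version], where both parts are 1*tchar.
PRODUCT_RE = re.compile(rf"{TCHAR}+(?:/{TCHAR}+)?")


def is_product(text: str) -> bool:
    """Return True if the whole string matches the product rule."""
    return PRODUCT_RE.fullmatch(text) is not None


print(is_product("Foobar/1.0.0"))  # True
print(is_product("Foo bar"))       # False: a space is not a tchar
print(is_product("Foobar/"))       # False: product-version must not be empty
```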

And comment is defined as:

comment     = "(" *( ctext / quoted-pair / comment ) ")"
ctext       = HTAB / SP / %x21-27 / %x2A-5B / %x5D-7E / obs-text
quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
obs-text    = %x80-FF
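Because comment refers to itself, a single classic regular expression cannot match it; a small recursive scanner can. The following Python sketch is one way to do it, with function names of my own choosing. It assumes the header value was decoded as ISO-8859-1, so that each character corresponds to exactly one octet.

```python
def _is_ctext(ch: str) -> bool:
    """ctext: HTAB, SP, %x21-27, %x2A-5B, %x5D-7E or obs-text."""
    c = ord(ch)
    return (ch in "\t " or 0x21 <= c <= 0x27 or 0x2A <= c <= 0x5B
            or 0x5D <= c <= 0x7E or 0x80 <= c <= 0xFF)


def _is_quotable(ch: str) -> bool:
    """Characters allowed after the backslash of a quoted-pair."""
    c = ord(ch)
    return ch in "\t " or 0x21 <= c <= 0x7E or 0x80 <= c <= 0xFF


def scan_comment(text: str, start: int = 0) -> int:
    """Return the index just past the comment starting at `start`, or -1."""
    if start >= len(text) or text[start] != "(":
        return -1
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == ")":
            return i + 1
        if ch == "(":
            i = scan_comment(text, i)  # nested comment: recurse
            if i == -1:
                return -1
        elif ch == "\\":
            if i + 1 >= len(text) or not _is_quotable(text[i + 1]):
                return -1
            i += 2
        elif _is_ctext(ch):
            i += 1
        else:
            return -1
    return -1  # ran out of input before the closing parenthesis


def is_comment(text: str) -> bool:
    """Return True if the whole string is exactly one comment."""
    return scan_comment(text) == len(text)


print(is_comment("(Outer comment (Inner comment))"))  # True
print(is_comment("(unbalanced (comment)"))            # False
```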

Other rules for reference:

HTAB  = <ASCII horizontal tab %x09, aka '\t'>
SP    = <ASCII space, i.e. " ">
VCHAR = <any visible US-ASCII character>
DIGIT = <digits from 0 to 9>
ALPHA = <letters A-Z and a-z>
RWS   = 1*( SP / HTAB )
1*    = <One or more>

Note that this means that product cannot contain spaces, but comments can.


Examples:

Here are some valid examples of product strings (with and without product-version strings):

# Single `product` without product-version:
Foobar
Foobar-baz

# Single `product` with product-version:
Foobar/abc
Foobar/1.0.0
Foobar/2021.44.30.15-b917dc

Here are some valid examples of comment strings; note how all strings are enclosed in matched parentheses ( ):

# This was the default `comment` used by Internet Explorer 11:
(Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)

# You can put almost any text inside a comment:
(Why are you looking at HTTP headers? Go outside, find love, do some good in the world)

# Note that `comment` strings can also be nested, provided their delimiting parentheses are matched, for example:
(Outer comment (Inner comment))

As a User-Agent header's value consists of a product followed by any number of product and comment strings, these are all valid User-Agent headers:

User-Agent: Foobar
User-Agent: Foobar/2021.44.30.15-b917dc
User-Agent: MyProduct Foobar/2021.44.30.15-b917dc
User-Agent: Tsom/OfraHaza (Life is short and love is always over in the morning) AnotherProduct
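To tie this back to the question, here is a rough Python sketch that splits a User-Agent value into its product and comment parts and runs it on the iPhone string. The function name `split_user_agent` is my own. It only checks the overall structure (balanced parentheses, leading product); it does not verify that every character is allowed by the grammar.

```python
from typing import List, Tuple


def split_user_agent(value: str) -> List[Tuple[str, str]]:
    """Split a User-Agent value into ("product", text) / ("comment", text) pairs."""
    parts: List[Tuple[str, str]] = []
    i, n = 0, len(value)
    while i < n:
        if value[i] in " \t":          # RWS between elements
            i += 1
        elif value[i] == "(":          # comment: track nesting depth
            depth, j = 0, i
            while j < n:
                if value[j] == "\\":   # quoted-pair: skip the escaped character
                    j += 2
                    continue
                if value[j] == "(":
                    depth += 1
                elif value[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j >= n:
                raise ValueError("unbalanced parentheses in comment")
            parts.append(("comment", value[i:j + 1]))
            i = j + 1
        else:                          # product: runs until whitespace or "("
            j = i
            while j < n and value[j] not in " \t(":
                j += 1
            parts.append(("product", value[i:j]))
            i = j
    if not parts or parts[0][0] != "product":
        raise ValueError("a User-Agent value must start with a product")
    return parts


iphone = ("Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1_2 like Mac OS X; en-us) "
          "AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7D11 "
          "Safari/528.16")
for kind, text in split_user_agent(iphone):
    print(kind, text)
```

So the iPhone string, strange as it looks, is just five products with two comments mixed in.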
Paulo Santos
  • 3
    Thanks, this is exactly what I was looking for. There doesn't appear to be a standard format for the comment field. – John Himmelman Apr 08 '10 at 17:36
  • What is "quoted-pair"? – QED Sep 15 '13 at 00:40
  • quoted-pair = "\" CHAR – Aleš Kotnik Oct 30 '13 at 23:14
  • 32
    Some examples of this, for readers unfamiliar with EBNF, would be ideal. (= – ELLIOTTCABLE May 20 '14 at 11:38
  • 6
    The referenced RFC is now obsolete. http://tools.ietf.org/html/rfc7231 obviates it. – A.R. Sep 08 '15 at 19:14
  • 2
    Funnily enough, RFC 7231 specifically calls out "us[ing] the product tokens of other implementations in order to declare compatibility with them" as a Bad Idea. – Kevin Oct 28 '15 at 13:31
  • 1
    Can the User-Agent string be in any character set? Can it contain for example Russian or Chinese characters? – Liam Dec 23 '19 at 17:01
  • @Liam In the absence of any explicit language in the RFC, I would assume it best to stick to 7-bit ASCII. It's hard to imagine a situation where deviating from this would be required, or indeed even useful; the User-Agent: is not intended for human end-user consumption. https://datatracker.ietf.org/doc/html/rfc7231#section-8.3.1 contains some suggestions though I have not spent too much time figuring out if that's authoritatively applicable here. – tripleee Jun 12 '21 at 12:13
  • @tripleee I have seen some Russian characters which seemed to indicate a student edition of Windows, not sure if it was genuine though. Thanks. – Liam Jun 13 '21 at 22:04
  • 1
    In _So Long, and Thanks for All the Fish,_ Douglas Adams wrote "The CIA denied it, which means it must be true!" Similarly, if Microsoft is doing something, you can be pretty sure it's wrong and dumb. – tripleee Jun 14 '21 at 04:47
  • Found a Russian example: `Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; Студент; .NET CLR 1.1.4322)`. It's also XP and MSIE 8.0, so I'm not going to invest more time in this! – Liam Jun 14 '21 at 15:32
  • @ELLIOTTCABLE I added some examples now – Dai Feb 07 '22 at 07:58
  • 1
    What does `1*` mean? – darw Aug 30 '22 at 09:57
  • 1
    @darw: `1*` means "one or more". `*` is the repetition operator in ABNF notation. (The ABNF notation used by RFC 1945 is described in section 2.1 of the RFC.) – nishanthshanmugham Jun 08 '23 at 18:51
11

This is specified in RFC 1945 in the section on Request Headers. It is not a very standardized format, though, and user agents tend to put whatever they want in there.

Community
tloflin
3

Yes, see the Mozilla website, but as was mentioned before, you can basically put whatever you want there. For statistical/analytical purposes, the most important thing is that every browser/OS keeps its own format consistent.

wlk