Must the encoding definition be in the 1st/2nd line in Python?

Question

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

What if licensing information occupies the top-most lines of the file, e.g. in https://github.com/google/seq2seq/blob/master/seq2seq/training/utils.py:

# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# -*- coding: utf-8 -*-
"""Miscellaneous training utility functions.
"""

Would the encoding definition still be "magically" accepted by the Python interpreter? It would be great if the answer explained why it must be in the first two lines, and a pointer to the interpreter code would be awesome!

The wording you quoted is quite clear that the encoding must be in the first or second lines. — BrenBarn, Mar 14 '17 at 04:39
Have you tried it? What would you do to find out whether this encoding line was doing what it's intended to do? — Jon Kiparsky, Mar 14 '17 at 04:41
How could we find out whether the encoding line is doing what it's intended to do? Add some 'utf8' chars in the code? — alvas, Mar 14 '17 at 04:43
First and second lines do nothing for you if haven't idea "what are you doing now". Encoding got more variants(os, system,python modules, etc.), **to see the magic, need to be selected** restricted all magical powers on programming ! — dsgdfg, Mar 14 '17 at 07:22

Jonathan Eunice · Accepted Answer · 2017-03-14T04:53:32.243

Yes. In Python 2, the default source encoding is ASCII, so a file containing non-ASCII characters needs a coding declaration. If that declaration appears after the second line, it is ignored, and any non-ASCII character in the file raises an error like this:

File "encoded.py", line 5
SyntaxError: Non-ASCII character '\xe1' in file encoded.py on line 5, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

If the file contains only ASCII characters, it still works even when the UTF-8 coding declaration comes after line 2: the late declaration is simply ignored, and since ASCII is a subset of UTF-8, nothing is lost. (This appears to be the case for the specific utils.py you referenced.)

Many parsers and other file processors require such magic comments at the start of the file because the declaration must be known before the rest of the file can be interpreted correctly. Allowing it anywhere would be inefficient: the whole file would have to be scanned to find a few "magic" special cases.
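As a rough illustration of that point, Python 3's standard-library tokenize.detect_encoding reads at most two lines before it decides. It is a pure-Python counterpart of the interpreter's check, not the C tokenizer itself, and the two sample sources below are made up for the demonstration:

```python
import io
import tokenize

def detect(source: bytes):
    """Return (encoding, number_of_lines_consumed) for the given source bytes."""
    encoding, lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding, len(lines)

# Hypothetical sources: the declaration sits on line 2, then on line 3.
on_line_2 = b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nx = 1\n"
on_line_3 = b"#!/usr/bin/python\n# a comment\n# -*- coding: latin-1 -*-\nx = 1\n"

print(detect(on_line_2))  # ('iso-8859-1', 2)
print(detect(on_line_3))  # ('utf-8', 2)
```

In both cases only two lines are consumed. The declaration on line 3 is never looked at, so the default of UTF-8 is reported instead.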

Python 3 gives you some leeway because it assumes UTF-8 by default. If your file is encoded some other way, though, you still need the declaration, and it still has to be within the first two lines.
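A minimal sketch of what goes wrong in Python 3 when a file is not UTF-8 and the declaration comes too late. The helper decode_source and the sample bytes are hypothetical, and the decoding is done by hand with tokenize.detect_encoding rather than by running the interpreter on a real file:

```python
import io
import tokenize

def decode_source(source: bytes) -> str:
    """Decode source bytes using whatever encoding tokenize detects."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return source.decode(encoding)

# In Latin-1 the character U+00E1 is the single byte 0xE1, which is not valid UTF-8 here.
body = "name = '\xe1'\n".encode("latin-1")
early = b"# -*- coding: latin-1 -*-\n" + body
late = b"# license line 1\n# license line 2\n# -*- coding: latin-1 -*-\n" + body

# Declaration on line 1: the file decodes as Latin-1 without trouble.
print(ascii(decode_source(early).splitlines()[-1]))

# Declaration on line 3: it is ignored, UTF-8 is assumed, and decoding fails.
try:
    decode_source(late)
except UnicodeDecodeError as exc:
    print("late declaration ignored:", exc.reason)
```

The byte 0xE1 is the same one that shows up as '\xe1' in the Python 2 error message above.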

score 1 · Answer 2 · answered Mar 14 '17 at 04:56


The spec permits the declaration on either of the first two lines so that a shebang (#!...) can occupy the first line on Unix systems.

No, it is not allowed after the second line.

Here's the bit of code from CPython's tokenizer that checks for (and parses) the coding cookie: https://github.com/python/cpython/blob/9e52c907b5511393ab7e44321e9521fe0967e34d/Parser/tokenizer.c#L613-L616
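For readers who prefer Python to C, the rule can be approximated in a few lines. This is only a sketch, not a translation of the linked tokenizer code: the regular expression is the one given in PEP 263, while find_declared_encoding is a hypothetical helper that skips details such as BOM handling and validating the encoding name.

```python
import re

# Pattern given in PEP 263 for a coding declaration comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def find_declared_encoding(text: str):
    """Hypothetical helper: look for a declaration on lines 1 and 2 only."""
    for line in text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        if line.strip() and not line.lstrip().startswith("#"):
            break  # line 2 only counts if line 1 is blank or a comment
    return None

print(find_declared_encoding("# coding=utf-8\n"))  # utf-8
print(find_declared_encoding("#!/usr/bin/python\n# -*- coding: latin-1 -*-\n"))  # latin-1
print(find_declared_encoding("# Copyright\n#\n# -*- coding: utf-8 -*-\n"))  # None
```

The third call mirrors the license-header case from the question: the declaration is on line 3, so nothing is found.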

anthony sottile

Great link to the source =) – alvas Mar 16 '17 at 16:30
