
I can create a multi-line string using this syntax:

string = str("Some chars "
         "Some more chars")

This will produce the following string:

"Some chars Some more chars"

Is Python joining these two separate strings or is the editor/compiler treating them as a single string?

P.S.: I just want to understand the internals. I know there are other ways to declare or create multi-line strings.

Dimitris Fasarakis Hilliard
Ganesh Satpute
  • "Does python is joining these two separate strings" - Yes, it does. [This behaviour is even documented.](https://docs.python.org/2/reference/lexical_analysis.html#string-literal-concatenation) – vaultah Dec 09 '15 at 09:01
  • It is not a **multiline** string, though. This feature has nothing to do with line breaks. Python is ignoring the line break because of the parenthesis. BTW, a multiline string could be created using triple quotes (`"""` or `'''`). – zvone Dec 09 '15 at 09:09

1 Answer


Read the reference manual, it's in there. Specifically:

Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings,

(emphasis mine)

This is why:

string = str("Some chars "
         "Some more chars")

is exactly the same as: str("Some chars Some more chars").
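A quick way to confirm this equivalence is to compare the two spellings directly. The sketch below is a minimal check (the variable names are illustrative only) that also exercises the other points from the quoted passage: mixing quoting conventions and attaching a comment to each piece.

```python
# Adjacent literals inside parentheses: the line break is ignored
# because we are inside a bracket pair.
split = str("Some chars "
            "Some more chars")

# The same value written as a single literal.
joined = str("Some chars Some more chars")

# Different quoting conventions may be mixed freely.
mixed = "hello" 'world'

# Each piece may carry its own comment.
commented = ("Some chars "       # first part
             "Some more chars")  # second part

print(split == joined)        # True
print(mixed == "helloworld")  # True
print(commented == joined)    # True
```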

This happens wherever a string literal may appear: list initializations, function calls (as is the case with str above), et cetera.
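To illustrate those contexts, here is a small sketch. The `count_args` function is a hypothetical helper written only for this demonstration; it reports how many positional arguments it received, which makes it visible that two adjacent literals arrive as one argument.

```python
def count_args(*args):
    """Hypothetical helper: reports how many positional arguments it got."""
    return len(args)

# Function call: the two literals arrive as ONE argument.
n_args = count_args("Hello "
                    "World")

# List display: two adjacent literals form ONE element.
words = ["Hello "
         "World", "!"]

# Dict display: keys and values can be split the same way.
config = {"na" "me": "Hello " "World"}

print(n_args)  # 1
print(words)   # ['Hello World', '!']
print(config)  # {'name': 'Hello World'}
```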

The only caveat is when the literals are not enclosed in one of the grouping delimiters (), {} or [] but instead span two separate physical lines. In that case, we can use the backslash character to join the lines and get the same result:

string = "Some chars " \
         "Some more chars"

Of course, concatenating strings on the same physical line does not require the backslash (string = "Hello " "World" is just fine).
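The three cases can be compared side by side in a short sketch. The "no brackets, no backslash" case is run through `exec()` only so that the snippet itself remains a single valid piece of source; the names used are illustrative.

```python
# Backslash continuation: the two physical lines form one logical line,
# so the adjacent literals are concatenated.
with_backslash = "Some chars " \
                 "Some more chars"

# Same physical line: no backslash or brackets needed.
same_line = "Hello " "World"

# Without brackets or a backslash, the line break ends the statement.
# The second literal becomes a separate expression statement whose
# value is simply discarded.
namespace = {}
exec('without = "Some chars "\n"Some more chars"\n', namespace)

print(with_backslash)              # Some chars Some more chars
print(same_line)                   # Hello World
print(repr(namespace["without"]))  # 'Some chars '
```

This also answers the follow-up question in the comments: the "whitespace" between adjacent literals may include a newline only when the newline does not end the logical line.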


Is Python joining these two separate strings or is the editor/compiler treating them as a single string?

Python is. Now, exactly when Python does this is where things get interesting.

From what I could gather (take this with a pinch of salt, I'm not a parsing expert), this happens when Python transforms the parse tree produced by its LL(1) parser for a given expression into the corresponding AST (Abstract Syntax Tree).
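One thing that is easy to verify from the outside is that the joining happens at compile time, before any bytecode runs. The sketch below uses the standard `compile()` builtin and the `dis` module; note that it only shows that the compiled code object already holds the joined constant, not which compiler stage did the joining.

```python
import dis

source = 'greeting = "Hello " "World"'
code = compile(source, "<example>", "exec")

# The joined string is already a single constant in the code object;
# the separate pieces never reach the bytecode.
print("Hello World" in code.co_consts)  # True
print("Hello " in code.co_consts)       # False

# The disassembly shows one load of the already-joined constant.
dis.dis(code)
```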

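You can get a view of the parse tree via the parser module (it exists only up to Python 3.9; it was removed in Python 3.10):

import parser  # Python <= 3.9 only

# Start the expression at column 0 so it can also be passed to
# ast.parse() later, which rejects leading indentation.
expr = """str("Hello "
    "World")"""
pexpr = parser.expr(expr)
parser.st2list(pexpr)

This dumps a big and rather confusing list that represents the concrete syntax tree parsed from the expression in expr: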

-- rest snipped for brevity --

          [322,
             [323,
                [3, '"Hello "'],
                [3, '"World"']]]]]]]]]]]]]]]]]],

-- rest snipped for brevity --

The numbers correspond to either symbols or tokens in the parse tree and the mappings from symbol to grammar rule and token to constant are in Lib/symbol.py and Lib/token.py respectively.

As you can see in the snipped version I added, there are two separate STRING tokens (token number 3), one for each of the two literals in the parsed expression.
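Since the parser module is gone on Python 3.10 and later, a rough substitute on current versions is the standard `tokenize` module. It shows the token stream rather than the full concrete syntax tree, but it makes the same point in a minimal sketch: at this stage the two literals are still separate STRING tokens.

```python
import io
import token
import tokenize

# Same expression as above, starting at column 0.
expr = 'str("Hello "\n    "World")'

# Tokenize the source and keep only the STRING tokens.
readline = io.StringIO(expr).readline
strings = [tok.string
           for tok in tokenize.generate_tokens(readline)
           if tok.type == token.STRING]

print(strings)       # ['"Hello "', '"World"']
print(token.STRING)  # 3, the number seen in the parse-tree dump
```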

Next, we can view the AST produced for the same expression via the ast module from the Standard Library:

import ast

p = ast.parse(expr)
ast.dump(p)

# on Python 3.5 this returns the following:
"Module(body=[Expr(value=Call(func=Name(id='str', ctx=Load()), args=[Str(s='Hello World')], keywords=[]))])"

The output is more readable here: the args list of the function call holds a single, already concatenated string, 'Hello World'.
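On newer interpreters the node names differ (Python 3.8+ represents string literals as `ast.Constant` instead of `ast.Str`), so the dump above will not match verbatim. A version-independent way to check the same thing, sketched here assuming Python 3.8 or later, is to walk the tree and collect every string constant:

```python
import ast

expr = 'str("Hello "\n    "World")'
tree = ast.parse(expr)

# Collect every string constant node in the tree.
string_nodes = [
    node.value
    for node in ast.walk(tree)
    if isinstance(node, ast.Constant) and isinstance(node.value, str)
]

print(string_nodes)  # ['Hello World']
print(ast.dump(tree.body[0].value))
```

Only one string node exists, which matches the single joined `Str` in the Python 3.5 dump.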

I also stumbled upon a cool module that generates a visualization of the tree of AST nodes. Using it, the tree for expr looks like this:

[Figure: expression tree for the given expression, cropped to show only the part relevant to the expression.]

As you can see, the terminal leaf node holds a single str object: the joined string of "Hello " and "World", i.e. "Hello World".


If you are feeling brave enough, dig into the source. The code that turns source text into a parse tree lives in Parser/parser.c (driven by grammar tables generated with Parser/pgen.c), while the code that transforms the parse tree into an Abstract Syntax Tree is in Python/ast.c.

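This information is for Python 3.5, and I'm pretty sure that unless you're using a really old version (< 2.5), the functionality and file locations should be similar. (Python 3.9 later replaced the LL(1) parser with a PEG parser, so newer versions organize this code differently.)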

Additionally, if you are interested in the whole compilation pipeline Python follows, a good, gentle introduction is the talk From Source to Code: How CPython's Compiler Works by core developer Brett Cannon.

Dimitris Fasarakis Hilliard
  • Thanks for your answer, which clears my doubt. Additionally, I would like to ask: when I write "Some chars " and on the next line "Some more chars " without enclosing them in brackets, it does not produce the same output. So when they say whitespace, they don't mean newlines, do they? – Ganesh Satpute Dec 09 '15 at 09:10
  • @falsetru without brackets it won't work. That's just placeholder to satisfy python :) – Ganesh Satpute Dec 09 '15 at 09:15
  • @GaneshSatpute, I meant omitting only `str`, not parentheses: `("Some char " "Some more chars")` – falsetru Dec 09 '15 at 09:17
  • @GaneshSatpute, you can use line continuation, just put a `"\"` at the end of the line. Then you don't need parens – John La Rooy Dec 09 '15 at 09:22
  • @GaneshSatpute I updated my answer to include further information about when exactly this action is performed. When it comes to **how** it is performed, I cannot assist; sifting through and understanding the `c` source for `Python` is something I don't fully have the skills for yet (*soon!*). – Dimitris Fasarakis Hilliard Dec 09 '15 at 09:57
  • @Jim thanks for your elaborate answer. It took me a while to read and grasp. :) I'll probably go and check the source code as well. Thanks again for your inputs – Ganesh Satpute Dec 09 '15 at 10:54
  • Something to be aware of: this can be a gotcha when working with a list of string literals. `['foo' 'bar']` will concatenate the strings, whereas any other two list items without a comma between them would raise an error. Missing commas are especially tricky to spot when the list is split across many lines (e.g. one line per list entry for a long list). – Pathogen Jun 21 '21 at 23:29
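The missing-comma gotcha from the last comment can be reproduced in a few lines. This is a minimal sketch with made-up list contents; it contrasts the silent string merge with the `SyntaxError` that the same slip causes for non-string items.

```python
# Intended: one entry per line. The comma after "beta" is missing,
# so "beta" and "gamma" are silently joined into one element.
names = [
    "alpha",
    "beta"  # <- comma forgotten here
    "gamma",
    "delta",
]

print(len(names))  # 3
print(names)       # ['alpha', 'betagamma', 'delta']

# The same slip with non-string items is caught as a SyntaxError,
# which is why the string case is easy to overlook.
try:
    compile("[1\n 2]", "<example>", "eval")
    caught = False
except SyntaxError:
    caught = True

print(caught)  # True
```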