Why does a multiline string replacement with Python work with a hard coded string, but not when the string is read from a file?

Question

I am trying to replace the contents of a string with a placeholder for later substitutions. When I execute my replacement against a string literal, the code works as expected, but if I read the same string from a file (literally the same string literal pasted into the file), it doesn't work.

envelope_string = '''[
 {
  "name": "created_at",
  "type": "TIMESTAMP",
  "mode": "NULLABLE",
  "description": "Message creation time"
 },
 {
  "name": "payload",
  "type": "RECORD",
  "mode": "NULLABLE",
  "description": "Message payload",
  "fields": [
   {
    "name": "type_url",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "A URL/resource name that uniquely identifies the type of the serialized\n protocol buffer message. This string must contain at least\n one \"/\" character. The last segment of the URL's path must represent\n the fully qualified name of the type (as in\n `path/google.protobuf.Duration`). The name should be in a canonical form\n (e.g., leading \".\" is not accepted).\n\n In practice, teams usually precompile into the binary all types that they\n expect it to use in the context of Any. However, for URLs which use the\n scheme `http`, `https`, or no scheme, one can optionally set up a type\n server that maps type URLs to message definitions as follows:\n\n * If no scheme is provided, `https` is assumed.\n * An HTTP GET on the URL must yield a [google.protobuf.Type][]\n   value in binary format, or produce an error.\n * Applications are allowed to cache lookup results based on the\n   URL, or have them precompiled into a binary to avoid any\n   lookup. Therefore, binary compatibility needs to be preserved\n   on changes to types. (Use versioned type names to manage\n   breaking changes.)\n\n Note: this functionality is not currently available in the official\n protobuf release, and it is not used for type URLs beginning with\n type.googleapis.com.\n\n Schemes other than `http`, `https` (or the empty scheme) might be\n used with implementation specific semantics."
   },
   {
    "name": "value",
    "type": "BYTES",
    "mode": "NULLABLE",
    "description": "Must be a valid serialized protocol buffer of the above specified type."
   }
  ]
 }
]'''

payload_string = '''[
   {
    "name": "type_url",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "A URL/resource name that uniquely identifies the type of the serialized\n protocol buffer message. This string must contain at least\n one \"/\" character. The last segment of the URL's path must represent\n the fully qualified name of the type (as in\n `path/google.protobuf.Duration`). The name should be in a canonical form\n (e.g., leading \".\" is not accepted).\n\n In practice, teams usually precompile into the binary all types that they\n expect it to use in the context of Any. However, for URLs which use the\n scheme `http`, `https`, or no scheme, one can optionally set up a type\n server that maps type URLs to message definitions as follows:\n\n * If no scheme is provided, `https` is assumed.\n * An HTTP GET on the URL must yield a [google.protobuf.Type][]\n   value in binary format, or produce an error.\n * Applications are allowed to cache lookup results based on the\n   URL, or have them precompiled into a binary to avoid any\n   lookup. Therefore, binary compatibility needs to be preserved\n   on changes to types. (Use versioned type names to manage\n   breaking changes.)\n\n Note: this functionality is not currently available in the official\n protobuf release, and it is not used for type URLs beginning with\n type.googleapis.com.\n\n Schemes other than `http`, `https` (or the empty scheme) might be\n used with implementation specific semantics."
   },
   {
    "name": "value",
    "type": "BYTES",
    "mode": "NULLABLE",
    "description": "Must be a valid serialized protocol buffer of the above specified type."
   }
  ]'''

# This works and replaces the string as expected
my_string = envelope_string.replace(payload_string, "{<Payload>}")
print(my_string)

# But when I read exactly the same text from a file, it doesn't work
with open("C:\\Temp\\envelope.txt", "r", encoding='utf-8') as f:
    file_envelope = f.read()

my_file_string = file_envelope.replace(payload_string, "{<Payload>}")
print(my_file_string)

You can try this by simply copying the contents of the envelope_string variable into a text file. The encoding of my text file is UTF-8 without signature (no BOM).

Any suggestions are gratefully received.

What are the contents of `file_envelope` after reading from the file, and do they match what you think they should be? — MattDMo, Nov 08 '22 at 17:53
Your "description" values contain the character sequence `\n`. In a Python string literal, that's a single newline character. In a file, that's just two literal characters. — jasonharper, Nov 08 '22 at 17:56
Have you tried testing for `envelope_string == file_envelope`? — CryptoFool, Nov 08 '22 at 18:00
Is this a line endings problem? I see you're on Windows - are you aware it uses `\r\n` as a line ending? — Nick ODell, Nov 08 '22 at 18:00
To be clear - when run against the string literal it works. If the string is saved as a file and then read, it doesn't work. If I use the debugger and inspect the value of my_file_string, it exactly matches the value of envelope_string. — Nigel Ainscoe, Nov 08 '22 at 18:00
I am aware of the difference between line endings on different O/S yes. — Nigel Ainscoe, Nov 08 '22 at 18:02
"it exactly matches the value of envelope_string". By what measure are you determining this. If you compare the two in your program and conditionally print a value to the console only if the two strings match, does that value appear in your console? - Unless you've got a wildly unusual case of a corrupted Python environment, if two strings do not produce the same result, all other things being equal, the strings are not the same. — CryptoFool, Nov 08 '22 at 18:03
The code looks fine so the obvious conclusion is that the file doesn't contain that exact string. However, without the file, we're not going to be able to help. — Ouroborus, Nov 08 '22 at 18:06
It couldn't hurt to write a minimal reproducible test with a shorter set of strings. These overlong examples don't really motivate anyone to answer what would otherwise be an intriguing question. Additionally, one often finds root causes when "boiling it down". — JL Peyret, Nov 08 '22 at 18:08
I tried it on my MacBook and got the same result. I replaced the string with a shorter set of strings and it works fine on Windows and Mac. So why, if it works with the shorter strings from string literal and file, does it only work with the string literal with the longer strings? — Nigel Ainscoe, Nov 08 '22 at 19:21
OK, more testing is done and it's the `\n` characters that are making it fail in the real-world version. That's a bummer. Thanks to everyone for their input. — Nigel Ainscoe, Nov 08 '22 at 19:32
Fixed it - put the search text into a file and read that in as the value for `payload_string` — Nigel Ainscoe, Nov 08 '22 at 20:14

score 2 · Answer 1 · answered Nov 08 '22 at 20:20

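It seems that the `\n` sequences in the description values were preventing a like-for-like comparison between the file data and the string literal. In a regular Python string literal, `\n` is converted into a single newline character, whereas in the file it is stored as two literal characters (a backslash followed by `n`), so the search text built from the literal never matched the file contents. The solution was to place the search string into a file as well and then read the two files into memory. With that done, the search in `str.replace()` works and I successfully replace the long `fields` element with my `{<Payload>}` placeholder.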
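The mismatch can be reproduced without any file, as a minimal sketch. The short strings here are made up for illustration and are not the real schema. A raw string literal (`r'...'`) stands in for the text read from the file, on the assumption that the file holds a backslash followed by `n` rather than a real line break.

```python
# In a regular literal, \n is converted to ONE newline character.
literal = '"description": "first line\n second line"'

# A raw literal keeps the backslash and the letter n as TWO characters,
# which is what f.read() returns when the file contains a literal \n.
from_file = r'"description": "first line\n second line"'

# repr() makes the difference visible: '\n' versus '\\n'.
print(repr(literal))
print(repr(from_file))
print(literal == from_file)  # False

# Searching with the regular literal finds nothing in the file text.
unchanged = from_file.replace(literal, "{<Payload>}")
print(unchanged == from_file)  # True, nothing was replaced

# A raw-string search text matches the file text character for character.
search = r'"first line\n second line"'
result = from_file.replace(search, "{<Payload>}")
print(result)  # "description": {<Payload>}
```

Declaring `payload_string` as a raw triple-quoted literal (`r'''...'''`) would be an alternative to storing the search text in a second file, provided the file really does contain the escape sequences as plain characters.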
