What is the difference between 'content' and 'text'

Question

I am using the terrific Python Requests library. I notice that the fine documentation has many examples of how to do something without explaining the why. For instance, both r.text and r.content are shown as examples of how to get the server response. But where is it explained what these properties do? For instance, when would I choose one over the other? I see that r.text sometimes returns a unicode object, and I suppose that there would be a difference for a non-text response. But where is all this documented? Note that the linked document does state:

You can also access the response body as bytes, for non-text requests:

But then it goes on to show an example of a text response! I can only suppose that the quote above means to say non-text responses instead of non-text requests, as a non-text request does not make sense in HTTP.

In short, where is the proper documentation of the library, as opposed to the (excellent) tutorial on the Python Requests site?

Related: [Should I use .text or .content when parsing a Requests response?](https://stackoverflow.com/q/40163323/3357935) — Stevoisiak, Oct 31 '17 at 18:42
"In short, where is the proper documentation of the library, as opposed to the (excellent) tutorial on the Python Requests site?" The link on the sidebar that says "API Reference", perhaps? — Karl Knechtel, Nov 28 '21 at 22:54
@KarlKnechtel: Thank you. It is quite possible that the Python Requests site was organized differently when the question was asked over eight years ago! — dotancohen, Nov 29 '21 at 06:06

score 218 · Accepted Answer · edited Nov 12 '20 at 15:13

The requests.Response class documentation has more details:

r.text is the content of the response in Unicode, and r.content is the content of the response in bytes.
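
A minimal sketch of that distinction, using only the standard library so it runs without a network call. The byte string stands in for what `r.content` would hold and the decoded string stands in for `r.text`; the sample body and the `encoding` value are assumptions for illustration. In Requests the encoding comes from `r.encoding`, which is normally taken from the response headers, with a guess based on the body as a fallback.

```python
# Stand-in for the raw body bytes sent by a server (the role of r.content).
content = "Crème brûlée".encode("utf-8")

# Assumed encoding for this sketch; Requests would use r.encoding here.
encoding = "utf-8"

# Stand-in for r.text: the same bytes decoded into a str.
text = content.decode(encoding)

# Decoding the same bytes with the wrong encoding does not fail,
# it silently produces different (garbled) text.
mojibake = content.decode("latin-1")

print(type(content).__name__, len(content))  # bytes 15
print(type(text).__name__, len(text))        # str 12
print(text == mojibake)                      # False
```

The bytes are always exactly what was received; the text is only as correct as the encoding used to decode it.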

edited Nov 12 '20 at 15:13 by Shiplu Mokaddim
answered Jun 09 '13 at 15:57 by Gary Kerr

77

And when would you choose one or the other? – multigoodverse Dec 20 '15 at 10:01
37

@multigoodverse: Presumably `r.text` would be preferred for textual responses, such as an HTML or XML document, and `r.content` would be preferred for "binary" filetypes, such as an image or PDF file. – dotancohen Feb 07 '18 at 12:19
7

@dotancohen HTML and XML use declarations in the data to do their own decoding and so they should be fed the raw `r.content`, not the converted `r.text`. – tdelaney Mar 06 '18 at 19:47
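
The point in the comment above can be sketched with the standard library's XML parser. The document, its `iso-8859-1` declaration, and the variable names are made up for illustration; `raw` plays the role of `r.content`. Given the bytes, the parser reads the declaration and decodes correctly, whereas decoding the bytes first with a wrong guess damages the text before the parser ever sees it.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: an XML document that declares its own encoding.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><name>José</name>'.encode(
    "iso-8859-1"
)

# Feeding the raw bytes lets the parser honour the declaration.
root = ET.fromstring(raw)
print(root.text)  # José

# Decoding up front with a wrong guess (UTF-8 here) corrupts the data:
# the byte for "é" is not valid UTF-8 and becomes U+FFFD.
wrong_text = raw.decode("utf-8", errors="replace")
print("\ufffd" in wrong_text)  # True
```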
1

More generally, a single response might contain nested or multipart content (like email messages with attachments), and each part might be encoded in different ways. It's impossible to handle such responses without access to the byte stream, but it's a long way from the common case, where you just want correctly-decoded Unicode text. – holdenweb Nov 08 '18 at 12:12
2

Why does the Python interpreter show both `r.text` and `r.content` as text? Why not show `r.content` as text and `r.text` as bits (if that's what it inherently is)? – Arnb Jun 16 '19 at 10:18

score 14 · Answer 2 · answered Jun 09 '13 at 15:57

The documentation seems clear about what r.content is:

You can also access the response body as bytes, for non-text requests:

 >>> r.content

If you read further down the page, it covers, for example, an image file.

answered Jun 09 '13 at 15:57 by PyNEwbie

4

Thank you. I now see the small `b` preceding the first example with the text "for non-text requests", which means that the object is a bytes object. It is not clear why the bytes is being displayed as text, perhaps that is another Python 'nicety', but it is confusing in this context. Thanks. – dotancohen Jun 09 '13 at 16:05
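
On why a bytes object looks like text in the interpreter: a short sketch, with made-up sample values. The interpreter prints the `repr` of a bytes object, which shows printable ASCII bytes as characters after the `b` prefix and everything else as `\x` escapes. Underneath, the object is still a sequence of integers.

```python
# Hypothetical bodies; the first is pure ASCII, the second is not.
ascii_body = b"hello"
accented_body = "é".encode("utf-8")

print(repr(ascii_body))     # b'hello'
print(repr(accented_body))  # b'\xc3\xa9'

# Iterating over bytes yields integers, not characters.
print(list(ascii_body))     # [104, 101, 108, 108, 111]
```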
2

this seems to matter more with python 3.x than python 2.x; using `requests` in python 3 on page.content gives this error: `if 'rss' in page.content:` --> `TypeError: a bytes-like object is required, not 'str'` – Marc Maxmeister Aug 19 '18 at 02:53
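
The error in the comment above can be reproduced without Requests. In this sketch `content` is a made-up byte string standing in for `page.content`, and UTF-8 is an assumed encoding. In Python 3 a `str` cannot be searched for inside `bytes`, so either search with a bytes literal or decode first (which is roughly what `page.text` gives you).

```python
# Hypothetical body standing in for page.content.
content = b"<rss version='2.0'></rss>"

# Mixing str and bytes raises TypeError in Python 3.
try:
    "rss" in content
except TypeError as exc:
    error = str(exc)
    print(error)

# Option 1: search bytes with bytes.
found_in_bytes = b"rss" in content

# Option 2: decode to str first (UTF-8 is an assumption here).
found_in_text = "rss" in content.decode("utf-8")

print(found_in_bytes, found_in_text)  # True True
```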
