Why no character/codepoint string equivalent of `collections.abc.ByteString`?

Asked Jul 17 '18 at 03:26

Active Jul 17 '18 at 03:41


Python's collections.abc module offers Sequence and MutableSequence abstract base classes (ABCs) and these cover¹ the str, bytes, bytearray and similar types as expected.

collections.abc also offers a ByteString ABC, which covers bytes, bytearray and presumably similar types. But it offers no ABC for strings of characters or codepoints such as str. (Such an ABC might be named String, CharString or CodepointString.) Why does it offer the former but not the latter? (Put another way, what are the expected use cases that require the former but not the latter?)

¹ 'Cover' as in: instances of these types pass an isinstance() check against the ABC.
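The coverage described above can be checked by running isinstance() against the ABCs directly. This is a minimal sketch (the sample values are arbitrary). It also confirms that none of the names floated in the question exist in collections.abc:

```python
import collections.abc as cabc
from collections.abc import MutableSequence, Sequence

# Arbitrary sample values, one per built-in string-like type.
samples = {"str": "abc", "bytes": b"abc", "bytearray": bytearray(b"abc")}

for name, value in samples.items():
    # All three are Sequences; only the mutable bytearray is a MutableSequence.
    print(
        f"{name}: Sequence={isinstance(value, Sequence)}, "
        f"MutableSequence={isinstance(value, MutableSequence)}"
    )

# None of the names suggested in the question exist as text-string ABCs.
for candidate in ("String", "CharString", "CodepointString"):
    print(f"collections.abc.{candidate} exists: {hasattr(cabc, candidate)}")
```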

asked Jul 17 '18 at 03:26

cjs

Because the only one is `str`? – wim Jul 17 '18 at 03:30

`ByteString` was added to give you a way to test for the "bytes-like type" that appears all over the 3.x docs without having to write `(bytes, bytearray)`. In fact, the docstring for it is just "This unifies bytes and bytearray." There is no similar need for Unicode strings, because `str` is the only such type; there's nothing to unify it with. – abarnert Jul 17 '18 at 03:32

I'm sure there was some discussion of this on python-dev, python-ideas, or b.p.o. If you really want to read it, and can't figure out how to search it yourself, someone could dig it up for you and write an answer. But it doesn't seem likely to be very interesting to you, or to anyone else. – abarnert Jul 17 '18 at 03:32

You can easily jump to [the commit that added `ByteString`](https://github.com/python/cpython/commit/d05eb0043e597cf2d5c429d0e554fd39364e36b0) to see the checkin comment, though. – abarnert Jul 17 '18 at 03:33

1 Answer

`ByteString` was added to give you a way to test for the "bytes-like type" that appears all over the 3.x docs without having to write `(bytes, bytearray)`.

In fact, the docstring for it is just "This unifies bytes and bytearray."

There is no similar need for Unicode strings, because `str` is the only such type; there's nothing to unify it with.
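The check that `ByteString` abbreviates can be spelled out directly, while the text-string check needs only `str`. The sketch below compares the explicit tuple with the ABC. The helper names are hypothetical. Because `ByteString` has been deprecated since Python 3.12, the lookup is defensive and silences the `DeprecationWarning`:

```python
import collections.abc as cabc
import warnings

# The spelled-out check that ByteString was introduced to replace.
BYTES_LIKE = (bytes, bytearray)

def is_bytes_like(obj) -> bool:
    return isinstance(obj, BYTES_LIKE)

def is_text(obj) -> bool:
    # str is the only built-in Unicode string type, so one class suffices.
    return isinstance(obj, str)

def is_bytestring_abc(obj) -> bool:
    """isinstance() against collections.abc.ByteString, if this Python has it."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        abc = getattr(cabc, "ByteString", None)
        if abc is None:
            # Assumption: on versions without ByteString, use the tuple instead.
            return is_bytes_like(obj)
        return isinstance(obj, abc)

samples = ["text", b"raw", bytearray(b"buf"), memoryview(b"view"), [1, 2]]
for value in samples:
    print(type(value).__name__, is_text(value), is_bytes_like(value), is_bytestring_abc(value))
```

Note that `memoryview` fails both byte checks, which matches the commit message's "(but not memoryview)".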

You can click on the source link at the top of the docs, find ByteString, and git blame it right from the GitHub GUI to find the commit that added it. The checkin comment is:

> Add ABC ByteString which unifies bytes and bytearray (but not memoryview).
>
> There's no ABC for "PEP 3118 style buffer API objects" because there's no way to recognize these in Python (apart from trying to use memoryview() on them).
>
> Note that array.array really should be registered as a MutableSequence but that would require importing it whenever collections is imported.
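The workaround mentioned in that commit message, trying `memoryview()` on an object, looks like the following sketch. `supports_buffer` is a hypothetical helper name. Since Python 3.12, `collections.abc.Buffer` (PEP 688) makes an isinstance() check possible. The probe below also works on older versions:

```python
import array

def supports_buffer(obj) -> bool:
    """Return True if obj exposes the buffer protocol (PEP 3118).

    Hypothetical helper: it just tries memoryview(), which raises
    TypeError for objects that do not support the buffer protocol.
    """
    try:
        view = memoryview(obj)
    except TypeError:
        return False
    view.release()  # drop the buffer export immediately
    return True

for value in (b"raw", bytearray(b"buf"), array.array("i", [1, 2]), "text", [1, 2]):
    print(type(value).__name__, supports_buffer(value))
```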

There might be further discussion on b.p.o. (bugs.python.org) or in the python-dev or maybe python-ideas mailing list archives around 21 Nov 2007, if you really want to dig deeper. But I doubt there's much more of interest there, because there's really not much to discuss here.

Note that typing actually does have a type for this, Text, which is documented as:

> Text is an alias for str. It is provided to supply a forward compatible path for Python 2 code: in Python 2, Text is an alias for unicode.
>
> Use Text to indicate that a value must contain a unicode string in a manner that is compatible with both Python 2 and Python 3:

As the docs make clear, this wasn't added to unify multiple Unicode string types within the same language, but to unify Python 2 unicode and Python 3 str, at static type checking time.

At runtime, if you want this, you almost certainly want the actual str or unicode constructor, so you'd use something like six.text_type.
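The split between static and runtime use can be sketched as follows. `add_checkmark` is a hypothetical function. `six.text_type` (from the third-party six package) is `str` on Python 3, so this sketch uses plain `str` to avoid the dependency. `typing.Text` has been deprecated since Python 3.11, so it is looked up defensively:

```python
import typing

# typing.Text is documented as an alias for str on Python 3; fall back to
# str in case a future version drops the deprecated alias.
Text = getattr(typing, "Text", str)

def add_checkmark(text: Text) -> Text:
    """Hypothetical example: the Text annotation is for static checkers only."""
    # At runtime the check must name a real class, and on Python 3 that is str.
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    return text + " \u2713"

print(Text is str)
print(add_checkmark("done"))
```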

answered Jul 17 '18 at 03:41

abarnert
