40

I would like to write a custom list class in Python (let's call it MyCollection) where I can eventually call:

for x in myCollectionInstance:
    #do something here

How would I go about doing that? Is there some class I have to extend, or are there any functions I must override in order to do so?

K Mehta
  • Could you clarify better your requirements? If you subclass any iterable class (list, dict, etc...) it should work without problems. But maybe I am missing something? – mac Jul 03 '11 at 00:33
  • @mac: If I subclassed an iterable class, I'd also want a way to be able to access the underlying list object so that I can provide additional functions that act on it. I don't want a key-value pair (dict), so something that emulates an indexed collection (list) would suffice. – K Mehta Jul 03 '11 at 00:46

5 Answers

42

You can subclass list if your collection basically behaves like a list:

class MyCollection(list):
    def __init__(self, *args, **kwargs):
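        # Forward the positional arguments (an optional iterable) to list,
        # so that MyCollection() with no arguments also works.
        super().__init__(*args)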

However, if your main wish is that your collection supports the iterator protocol, you just have to provide an __iter__ method:

class MyCollection(object):
    def __init__(self):
        self._data = [4, 8, 15, 16, 23, 42]

    def __iter__(self):
        for elem in self._data:
            yield elem

This allows you to iterate over any instance of MyCollection.
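Regarding the follow-up in the comments (keeping access to the underlying list so that extra functions can act on it), here is a minimal sketch of the wrapper approach. The `items` parameter and the `total()` method are illustrative assumptions, not part of the original answer:

```python
class MyCollection:
    """Hypothetical wrapper that keeps its items in a private list."""

    def __init__(self, items=()):
        self._data = list(items)  # underlying list, available to every method

    def __iter__(self):
        return iter(self._data)  # delegate iteration to the list

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def total(self):
        # Example of an extra function acting on the underlying list.
        return sum(self._data)


numbers = MyCollection([4, 8, 15, 16, 23, 42])
for x in numbers:
    print(x)
print(len(numbers), numbers[0], numbers.total())  # 6 4 108
```

Only `__iter__` is needed for the `for` loop; `__len__` and `__getitem__` are added here just to make the object feel more like an indexed collection.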

jena
22

I like to subclass MutableSequence, as recommended by Alex Martelli.

If you want a fairly comprehensive implementation of a MutableSequence() list, you can take a look at the CPython collections.UserList() source.

I also added a note about using an ACL: if you want to restrict the list to holding only certain kinds of objects, you can add an ACL-style check method (see the commented-out _acl_check call in __setitem__) so that the MutableSequence() subclass only ever stores the object types you allow.

from collections.abc import MutableSequence

class MyList(MutableSequence):
    """
    An extensive user-defined wrapper around list objects.

    Inspiration:
        https://github.com/python/cpython/blob/208a7e957b812ad3b3733791845447677a704f3e/Lib/collections/__init__.py#L1174
    """

    def __init__(self, initlist=None):
        self.data = []
        if initlist is not None:
            if isinstance(initlist, list):
                self.data[:] = initlist

            elif isinstance(initlist, MyList):
                self.data[:] = initlist.data[:]

            else:
                self.data = list(initlist)

    def __repr__(self):
        return """<{} data: {}>""".format(self.__class__.__name__, repr(self.data))

    def __lt__(self, other):
        return self.data < self.__cast(other)

    def __le__(self, other):
        return self.data <= self.__cast(other)

    def __eq__(self, other):
        return self.data == self.__cast(other)

    def __gt__(self, other):
        return self.data > self.__cast(other)

    def __ge__(self, other):
        return self.data >= self.__cast(other)

    def __cast(self, other):
        return other.data if isinstance(other, MyList) else other

    def __contains__(self, value):
        return value in self.data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            return self.__class__(self.data[idx])
        else:
            return self.data[idx]

    def __setitem__(self, idx, value):
        # optional: self._acl_check(value)
        self.data[idx] = value

    def __delitem__(self, idx):
        del self.data[idx]

    def __add__(self, other):
        if isinstance(other, MyList):
            return self.__class__(self.data + other.data)

        elif isinstance(other, type(self.data)):
            return self.__class__(self.data + other)

        return self.__class__(self.data + list(other))

    def __radd__(self, other):
        if isinstance(other, MyList):
            return self.__class__(other.data + self.data)

        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)

        return self.__class__(list(other) + self.data)

    def __iadd__(self, other):
        if isinstance(other, MyList):
            self.data += other.data

        elif isinstance(other, type(self.data)):
            self.data += other

        else:
            self.data += list(other)

        return self

    def __mul__(self, nn):
        return self.__class__(self.data * nn)

    __rmul__ = __mul__

    def __imul__(self, nn):
        self.data *= nn
        return self

    def __copy__(self):
        inst = self.__class__.__new__(self.__class__)
        inst.__dict__.update(self.__dict__)

        # Create a copy and avoid triggering descriptors
        inst.__dict__["data"] = self.__dict__["data"][:]

        return inst

    def append(self, value):
        self.data.append(value)

    def insert(self, idx, value):
        self.data.insert(idx, value)

    def pop(self, idx=-1):
        return self.data.pop(idx)

    def remove(self, value):
        self.data.remove(value)

    def clear(self):
        self.data.clear()

    def copy(self):
        return self.__class__(self)

    def count(self, value):
        return self.data.count(value)

    def index(self, value, *args):
        # Returns the position of the first occurrence of value.
        return self.data.index(value, *args)

    def reverse(self):
        self.data.reverse()

    def sort(self, /, *args, **kwds):
        self.data.sort(*args, **kwds)

    def extend(self, other):
        if isinstance(other, MyList):
            self.data.extend(other.data)

        else:
            self.data.extend(other)

if __name__=='__main__':
    foo = MyList([1,2,3,4,5])
    foo.append(6)
    print(foo)  # <MyList data: [1, 2, 3, 4, 5, 6]>

    for idx, ii in enumerate(foo):
        print("MyList[%s] = %s" % (idx, ii))
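The optional `_acl_check` hook mentioned in the `__setitem__` comment is not spelled out above. One way it might look is sketched below, using only the five abstract methods that MutableSequence requires. The class name `TypedList` and the `allowed_types` parameter are assumptions made for illustration:

```python
from collections.abc import MutableSequence


class TypedList(MutableSequence):
    """Hypothetical list wrapper that only accepts items of the allowed types."""

    def __init__(self, allowed_types, initlist=()):
        self._allowed = allowed_types
        self.data = []
        # extend() is a MutableSequence mixin; it ends up calling insert(),
        # so the initial items pass through the check as well.
        self.extend(initlist)

    def _acl_check(self, value):
        if not isinstance(value, self._allowed):
            raise TypeError(f"{type(value).__name__} is not allowed in this list")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

    def __setitem__(self, idx, value):
        if isinstance(idx, slice):
            value = list(value)
            for item in value:
                self._acl_check(item)
        else:
            self._acl_check(value)
        self.data[idx] = value

    def __delitem__(self, idx):
        del self.data[idx]

    def insert(self, idx, value):
        self._acl_check(value)
        self.data.insert(idx, value)


ints = TypedList(int, [1, 2, 3])
ints.append(4)  # append() is a mixin built on insert()
try:
    ints.append("five")
except TypeError as exc:
    print(exc)  # str is not allowed in this list
print(ints.data)  # [1, 2, 3, 4]
```

Because every mutating mixin (append, extend, `+=`) funnels through `insert()` or `__setitem__()`, checking in those two places should be enough for this sketch.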
Mike Pennington
  • `if not (data is None):` that's not how you write your python code, until a Grand Jedi Master you are. `if data is not None` -- this looks good. – byashimov Feb 05 '16 at 13:56
  • True, in addition, I guess only `if data:` should work fine. – 20-roso Sep 17 '19 at 18:44
7

In Python 3 we have beautiful collections.UserList([list]):

Class that simulates a list. The instance’s contents are kept in a regular list, which is accessible via the data attribute of UserList instances. The instance’s contents are initially set to a copy of list, defaulting to the empty list []. list can be any iterable, for example a real Python list or a UserList object.

In addition to supporting the methods and operations of mutable sequences, UserList instances provide the following attribute:

data: A real list object used to store the contents of the UserList class.

https://docs.python.org/3/library/collections.html#userlist-objects
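As a small sketch of how that might be used for the question's `MyCollection` (the `evens()` method is a made-up example of an extra function acting on the underlying list):

```python
from collections import UserList


class MyCollection(UserList):
    """Hypothetical UserList subclass with one extra method."""

    def evens(self):
        # self.data is the real list in which UserList stores the items.
        return [x for x in self.data if x % 2 == 0]


c = MyCollection([1, 2, 3, 4])
c.append(6)
for x in c:
    print(x)
print(c)          # [1, 2, 3, 4, 6]
print(c.evens())  # [2, 4, 6]
```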

ramusus
4

You could extend the list class:

class MyList(list):

    def __init__(self, *args):
        super(MyList, self).__init__(args[0])
        # Do something with the other args (and potentially kwargs)

Example usage:

a = MyList((1,2,3), 35, 22)
print(a)
for x in a:
    print(x)

Expected output:

[1, 2, 3]
1
2
3
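One thing to keep in mind with this approach: in Python 3, the built-in list operations that create a new list return a plain list, not the subclass. The short sketch below illustrates this (the `first()` method is only there to make the subclass non-trivial):

```python
class MyList(list):
    def first(self):
        # Illustrative extra method; not part of the original answer.
        return self[0]


a = MyList([1, 2, 3])
print(type(a).__name__)         # MyList
print(type(a[1:]).__name__)     # list
print(type(a + a).__name__)     # list
print(type(a.copy()).__name__)  # list
```

If slices and sums need to stay instances of your class, you would have to override `__getitem__`, `__add__` and friends yourself.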
mac
3

Implementing a list from scratch requires you to implement the full container protocol:

__len__()

__iter__()    __reversed__()

__getitem__() __contains__()
__setitem__() __delitem__()

__eq__()      __ne__()       __gt__()
__lt__()      __ge__()       __le__()

__add__()     __radd__()     __iadd__()
__mul__()     __rmul__()     __imul__()

__str__()     __repr__()     __hash__

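But the crux of the list is its read-only protocol, as captured by three methods of collections.abc.Sequence (only the first two are abstract; __iter__() has a default mixin implementation built on __getitem__(), though it is often overridden):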

  • __len__()
  • __getitem__()
  • __iter__()
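As a quick illustration of how much Sequence derives from `__len__()` and `__getitem__()` alone, here is a minimal sketch. The `Squares` class is a made-up example, and it deliberately leaves out slicing:

```python
from collections.abc import Sequence


class Squares(Sequence):
    """Hypothetical read-only sequence of the first n square numbers."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported in this sketch")
        if index < 0:
            index += self._n  # negative indices count from the end
        if not 0 <= index < self._n:
            raise IndexError("index out of range")
        return index * index


sq = Squares(5)
print(list(sq))            # [0, 1, 4, 9, 16]
print(9 in sq)             # True
print(list(reversed(sq)))  # [16, 9, 4, 1, 0]
print(sq.index(4))         # 2
```

Iteration, `in`, `reversed()`, `index()` and `count()` all come from the Sequence mixins here.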

To see that in action, here is a lazy read-only list backed by a range instance (super handy because it knows how to do slicing gymnastics), where any materialized values are stored in a cache (e.g. a dictionary):

import copy
from collections.abc import Sequence
from typing import Any, Dict, Iterator, Union

Value = Any  # placeholder for the element type, which the original leaves undefined

class LazyListView(Sequence):
    def __init__(self, length):
        self._range = range(length)
        self._cache: Dict[int, Value] = {}

    def __len__(self) -> int:
        return len(self._range)

    def __getitem__(self, ix: Union[int, slice]) -> Value:
        length = len(self)

        if isinstance(ix, slice):
            clone = copy.copy(self)
            clone._range = self._range[slice(*ix.indices(length))]  # slicing
            return clone
        else:
            if ix < 0:
                ix += length  # negative indices count from the end
            if not (0 <= ix < length):
                raise IndexError(f"list index {ix} out of range [0, {length})")
            if ix not in self._cache:
                ...  # update cache
            return self._cache[ix]

    def __iter__(self) -> Iterator[Value]:
        for i, _row_ix in enumerate(self._range):
            yield self[i]

Although the above class is still missing the write protocol and the remaining methods such as __eq__() and __add__(), it is already quite functional.

>>> alist = LazyListView(12)
>>> type(alist[3:])
LazyListView

A nice thing is that slices retain the class, so slicing neither breaks laziness nor materializes elements (as long as the rest of the class cooperates, e.g. by coding an appropriate __repr__() method).

Yet the class still fails miserably in simple tests:

>>> alist == alist[:]
False

You have to implement __eq__() to fix this, and use facilities like functools.total_ordering() to implement __gt__() etc:

from functools import total_ordering

@total_ordering
class LazyListView(Sequence):
    # ... methods shown earlier ...

    def __eq__(self, other):
        if self is other:
            return True
        if len(self) != len(other):
            return False
        return all(a == b for a, b in zip(self, other))

    def __lt__(self, other):
        if self is other:
            return False
        # Lexicographic comparison, as built-in lists do: the first
        # differing pair decides; otherwise the shorter sequence is smaller.
        for a, b in zip(self, other):
            if a != b:
                return a < b
        return len(self) < len(other)

But that is indeed considerable effort.
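For reference, the pieces above can be assembled into one runnable sketch. The `compute` callable (mapping a row index to its value) is an assumption standing in for the elided "update cache" step, and the cache is keyed by the underlying row index so that slices can share it:

```python
import copy
from collections.abc import Sequence
from functools import total_ordering


@total_ordering
class LazyListView(Sequence):
    def __init__(self, length, compute):
        self._range = range(length)
        self._compute = compute  # hypothetical callable: row index -> value
        self._cache = {}         # row index -> materialized value

    def __len__(self):
        return len(self._range)

    def __getitem__(self, ix):
        if isinstance(ix, slice):
            clone = copy.copy(self)         # shallow copy: the cache is shared
            clone._range = self._range[ix]  # range objects support slicing
            return clone
        row = self._range[ix]  # handles negatives, raises IndexError if needed
        if row not in self._cache:
            self._cache[row] = self._compute(row)
        return self._cache[row]

    def __iter__(self):
        return (self[i] for i in range(len(self)))

    def __eq__(self, other):
        if self is other:
            return True
        if not isinstance(other, Sequence):
            return NotImplemented
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))

    def __lt__(self, other):
        if not isinstance(other, Sequence):
            return NotImplemented
        for a, b in zip(self, other):
            if a != b:
                return a < b  # first differing pair decides
        return len(self) < len(other)


calls = []

def square(row):
    calls.append(row)  # record which rows were actually materialized
    return row * row

alist = LazyListView(12, square)
tail = alist[3:]
print(type(tail).__name__, len(tail), calls)  # LazyListView 9 []
print(tail[0], calls)                         # 9 [3]
print(alist == alist[:])                      # True
print(alist[:3] < alist[1:4])                 # True
```

Note that the equality check necessarily materializes every element it compares, so laziness only survives until the first full comparison.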

NOTE: if you try to bypass the effort by inheriting from list (instead of Sequence), more modifications are needed because, e.g., copy.copy() would now also try to copy the underlying list and end up calling __iter__(), destroying laziness; furthermore, the __add__() method fills in the list internally, which breaks adding slices.

ankostis