
I have a custom Python class that essentially encapsulates a list of some kind of object, and I'm wondering how I should implement its __repr__ method. I'm tempted to go with the following:

class MyCollection:
   def __init__(self, objects = []):
      self._objects = []
      self._objects.extend(objects)

   def __repr__(self):
      return f"MyCollection({self._objects})"

This has the advantage of producing valid Python output that fully describes the class instance. However, in my real-world case, the object list can be rather large, and each object may have a large repr by itself (they are arrays themselves).

What are the best practices in such situations? Should I accept that the repr might often be a very long string? Are there potential issues related to this (debugger UI, etc.)? Should I implement some kind of shortening scheme using an ellipsis (...)? If so, is there a good/standard way to achieve this? Or should I skip listing the collection's content altogether?

abey
  • Have you ever seen a built-in container type that does any of the things you're contemplating changing your container to do? – user2357112 Jul 13 '20 at 21:29
  • @user2357112supportsMonica Well, `list` does seem to aggressively print everything. I haven't tested other built-in containers, but numpy's `ndarray` does shorten its repr (I know, not standard lib, but still...) – abey Jul 13 '20 at 21:36
  • The point of `repr` isn't to make something that looks nice: it's to aid in debugging. Save the shortening algorithm for `__str__` or a custom method. – chepner Jul 13 '20 at 21:37
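A minimal sketch of the split described in chepner's comment: __repr__ stays complete for debugging, and the abbreviation lives in __str__. The six-item cutoff and the summary format are arbitrary choices for illustration, not a convention.

```python
class MyCollection:
    def __init__(self, objects=None):
        # Copy the input so the collection doesn't share the caller's list
        self._objects = [] if objects is None else list(objects)

    def __repr__(self):
        # Complete and unambiguous: meant for debugging
        return f"MyCollection({self._objects!r})"

    def __str__(self):
        # Abbreviated, human-friendly summary (arbitrary format)
        if len(self._objects) <= 6:
            return repr(self)
        head = ", ".join(repr(item) for item in self._objects[:3])
        return f"MyCollection([{head}, ...] ({len(self._objects)} items))"


big = MyCollection(range(10))
print(repr(big))  # MyCollection([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(big)        # MyCollection([0, 1, 2, ...] (10 items))
```

print() and f-strings use __str__, while the interactive prompt and most debuggers show __repr__, so each audience gets the form it needs.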



The official documentation describes how __repr__ should behave:

Called by the repr() built-in function to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an “informal” string representation of instances of that class is required.

This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.

Python 3 __repr__ Docs

Lists, strings, sets, tuples and dictionaries all print out the entirety of their collection in their __repr__ method.
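The "recreate an object with the same value" part of that guidance can be checked directly with eval(). This is a minimal sketch; the __eq__ method is a hypothetical addition (the question's class doesn't define one) so that the recreated object can be compared with the original.

```python
class MyCollection:
    def __init__(self, objects=None):
        self._objects = [] if objects is None else list(objects)

    def __repr__(self):
        # Delegates to list's repr, which lists every element
        return f"MyCollection({self._objects!r})"

    def __eq__(self, other):
        # Hypothetical: compare by contents so eval(repr(x)) == x can be checked
        if not isinstance(other, MyCollection):
            return NotImplemented
        return self._objects == other._objects


original = MyCollection([1, "two", (3, 4)])
text = repr(original)
print(text)  # MyCollection([1, 'two', (3, 4)])

# eval() only works because the name MyCollection is defined here
# (the "appropriate environment" the documentation mentions)
recreated = eval(text)
print(recreated == original)  # True
```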

Your current code follows what the documentation suggests. I would, however, suggest changing your __init__ method so it looks more like this:

class MyCollection:
    def __init__(self, objects=None):
        if objects is None:
            objects = []
        # Copy the input so the collection doesn't share the caller's list
        self._objects = list(objects)

    def __repr__(self):
        return f"MyCollection({self._objects})"

You generally want to avoid using mutable objects as default arguments. Technically, your version still works fine, because extend copies the items into a fresh list and the shared default list is never modified, but Python's documentation still recommends avoiding the pattern:

It is good programming practice to not use mutable objects as default values. Instead, use None as the default value and inside the function, check if the parameter is None and create a new list/dictionary/whatever if it is.

https://docs.python.org/3/faq/programming.html#why-are-default-values-shared-between-objects
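A short demonstration of why the FAQ warns about this. Both function names are hypothetical; the point is that a default value is evaluated once, when the def statement runs, so a mutable default is shared by every call that omits the argument.

```python
def add_item_shared(item, bucket=[]):
    # The default list is created once, at definition time,
    # so every call that omits bucket appends to the same list
    bucket.append(item)
    return bucket


def add_item_fresh(item, bucket=None):
    # None is immutable; a new list is created on each call that needs one
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(add_item_shared(1))  # [1]
print(add_item_shared(2))  # [1, 2]  <- state leaked from the first call
print(add_item_fresh(1))   # [1]
print(add_item_fresh(2))   # [2]
```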

If you're interested in how another library handles it differently, the repr of NumPy arrays shows only the first three and the last three items when the array has more than 1,000 elements. It also pads the items so they all take up the same width (in the example below, 1000 takes up four characters, so 0 is padded with three spaces to match).

>>> import numpy as np
>>> repr(np.array([i for i in range(1001)]))
'array([   0,    1,    2, ...,  998,  999, 1000])'

To mimic this NumPy array style, you could implement a __repr__ method like this in your class:

class MyCollection:
    def __init__(self, objects=None):
        if objects is None:
            objects = []
        # Copy the input so the collection doesn't share the caller's list
        self._objects = list(objects)

    def __repr__(self):
        # Like NumPy, show the full list unless there are more than 1,000 items
        if len(self._objects) <= 1000:
            return f"MyCollection({self._objects})"
        # Get the first and last three items
        items_to_display = self._objects[:3] + self._objects[-3:]
        # Compute the repr of each item once
        reprs = [repr(item) for item in items_to_display]
        # Pad every repr to the width of the longest one
        padding = max(len(r) for r in reprs)
        values = [r.rjust(padding) for r in reprs]
        # Insert the '...' between the 3rd and 4th items
        values.insert(3, '...')
        # Convert the list to a string joined by commas
        array_as_string = ', '.join(values)
        return f"MyCollection([{array_as_string}])"

>>> repr(MyCollection([1,2,3,4]))
'MyCollection([1, 2, 3, 4])'

>>> repr(MyCollection([i for i in range(1001)]))
'MyCollection([   0,    1,    2, ...,  998,  999, 1000])'
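If you would rather not hand-roll the abbreviation, the standard library's reprlib module provides one. The sketch below assumes the limits shown (6 list elements, 30 characters per "other" object) are acceptable; they are illustrative, not prescribed. Objects that reprlib has no dedicated handler for, such as NumPy arrays, are cut to maxother characters with "..." in the middle. Note that the abbreviated output no longer recreates the object.

```python
import reprlib


class MyCollection:
    # One shared Repr instance; the limits below are illustrative choices
    _short_repr = reprlib.Repr()
    _short_repr.maxlist = 6    # show at most 6 list elements, then "..."
    _short_repr.maxother = 30  # cut reprs of unhandled types (e.g. arrays) to 30 chars

    def __init__(self, objects=None):
        self._objects = [] if objects is None else list(objects)

    def __repr__(self):
        return f"MyCollection({self._short_repr.repr(self._objects)})"


print(repr(MyCollection([1, 2, 3, 4])))  # MyCollection([1, 2, 3, 4])
print(repr(MyCollection(range(1001))))   # MyCollection([0, 1, 2, 3, 4, 5, ...])
```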
          
Andrew Vasylchuk
Nala Nkadi