
I'm still somewhat perplexed by Python and its magic functional programming, so I tend to write code that follows the Java paradigm more than idiomatic Python.

My question is somewhat related to "How do I make a custom class a collection in Python?"

The only difference is that I have nested objects (using composition): a VirtualPage object is composed of a list of PhysicalPage objects. I have a function that takes a list of PhysicalPage objects and coalesces all of their details into a single named tuple I call PageBoundary. Essentially it's a serialization function: it produces a tuple holding an integer range, where each integer encodes a physical page number together with a line number in that page. From this I can easily sort and order VirtualPages relative to one another (that's the idea, at least):

import collections

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')

I also have a function that takes a PageBoundary namedtuple and deserializes, or expands, it back into a list of PhysicalPages. I'd prefer that these two data storage classes not change, because changing them would break downstream code.
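To make the "integer range" idea concrete, here is a minimal sketch of how one position (page number plus line number) could be packed into a single sortable integer and unpacked again with divmod. It assumes non-negative page numbers and line numbers below 10**12 (matching _LINE_PAD = 12); encode_position and decode_position are hypothetical helper names, not part of the original code.

```python
LINE_PAD = 12  # digits reserved for the line number, as in PhysicalPage._LINE_PAD


def encode_position(page_num, line_num):
    """Pack (page, line) into one integer that sorts by page, then by line.

    Hypothetical helper. Assumes 0 <= line_num < 10**LINE_PAD; otherwise the
    line number would spill into the page digits.
    """
    if not 0 <= line_num < 10 ** LINE_PAD:
        raise ValueError("line_num out of range: %r" % (line_num,))
    return page_num * 10 ** LINE_PAD + line_num


def decode_position(value):
    """Hypothetical inverse of encode_position: returns (page_num, line_num)."""
    return divmod(value, 10 ** LINE_PAD)


begin = encode_position(3, 17)
end = encode_position(4, 2)
print(begin < end)             # True: page 3 comes before page 4
print(decode_position(begin))  # (3, 17)
```

For non-negative inputs this yields the same integer as the zero-padded string concatenation in get_cannonical_begin(), but the arithmetic form makes the inverse (decoding) a one-liner.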

Here is a snippet of my custom Python 2.7 classes. VirtualPage is composed of several things; one of them is a list of PhysicalPage objects:

class VirtualPage(object):
    def __init__(self, _physical_pages=list()):
        self.physical_pages = _physical_pages


class PhysicalPage(object):
    # class variables: number of digits each attribute gets
    _PAGE_PAD, _LINE_PAD = 10, 12

    def __init__(self, _page_num=-1):
        self.page_num = _page_num
        self.begin_line_num = -1
        self.end_line_num = -1

    def get_cannonical_begin(self):
        # zero-pad the page and line numbers, then glue them into one integer
        # (assumes both are non-negative)
        return int(''.join([str(self.page_num).zfill(PhysicalPage._PAGE_PAD),
                            str(self.begin_line_num).zfill(PhysicalPage._LINE_PAD)]))

    def get_cannonical_end(self):
        # same scheme as get_cannonical_begin(), using the end line number
        return int(''.join([str(self.page_num).zfill(PhysicalPage._PAGE_PAD),
                            str(self.end_line_num).zfill(PhysicalPage._LINE_PAD)]))

    def get_cannonical_page_boundaries(self):
        return PageBoundary(self.get_cannonical_begin(), self.get_cannonical_end())

I would like to leverage some generic collection (from Python's collections module) to easily sort and compare a list or set of VirtualPage objects. I would also like some advice on the layout of my data storage classes, VirtualPage and PhysicalPage.
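Before writing a custom collection, note that the built-in sorted() and list.sort() already accept a key function, so VirtualPage objects can be ordered by their PageBoundary without defining any comparison methods at all. A minimal sketch, using a simplified stand-in VirtualPage that stores (begin, end) integer pairs instead of PhysicalPage objects, plus a hypothetical boundary property:

```python
import collections
import operator

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')


class VirtualPage(object):
    """Simplified stand-in: physical pages are (begin, end) integer pairs."""

    def __init__(self, physical_pages=None):
        self.physical_pages = sorted(physical_pages) if physical_pages else []

    @property
    def boundary(self):
        # Hypothetical helper: first page's begin through last page's end.
        first, last = self.physical_pages[0], self.physical_pages[-1]
        return PageBoundary(first[0], last[1])


pages = [VirtualPage([(30, 39)]), VirtualPage([(20, 29), (10, 19)])]
# No __lt__ needed: sorted() compares the PageBoundary keys, which are tuples.
ordered = sorted(pages, key=operator.attrgetter('boundary'))
print([vp.boundary for vp in ordered])
# [PageBoundary(begin=10, end=29), PageBoundary(begin=30, end=39)]
```

A key function keeps the ordering logic outside the class; defining __lt__ is still worthwhile if you want expressions like vp_2 < vp_1 to work directly.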

Given a sequence of VirtualPages, or individual pages as in the example below:

vp_1 = VirtualPage(list_of_physical_pages)
vp_1_copy = VirtualPage(list_of_physical_pages)
vp_2 = VirtualPage(list_of_other_physical_pages)

I want to easily answer questions like this:

>>> vp_2 in vp_1 
False
>>> vp_2 < vp_1
True
>>> vp_1 == vp_1_copy
True

Right off the bat, it seems obvious that the VirtualPage class needs to call get_cannonical_page_boundaries or even implement that function itself. At a minimum, it should loop over its PhysicalPage list to implement the required methods (__lt__() and __eq__()) so I can compare VirtualPages with each other.
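One way to put the comparisons in VirtualPage itself is sketched below, under stated assumptions: PhysicalPage is a simplified stand-in with a hypothetical boundary() method, VirtualPage.boundary() loops over its pages to find the earliest begin and latest end, and functools.total_ordering fills in <=, > and >= from __lt__ and __eq__. Equality here is by boundary only; compare the physical page lists instead if that is too loose. The page and line values are made up.

```python
import collections
import functools

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')
LINE_PAD = 12  # digits reserved for the line number


class PhysicalPage(object):
    """Simplified stand-in for the question's PhysicalPage."""

    def __init__(self, page_num, begin_line_num, end_line_num):
        self.page_num = page_num
        self.begin_line_num = begin_line_num
        self.end_line_num = end_line_num

    def boundary(self):
        base = self.page_num * 10 ** LINE_PAD
        return PageBoundary(base + self.begin_line_num, base + self.end_line_num)


@functools.total_ordering  # derives __le__, __gt__, __ge__ from __lt__ and __eq__
class VirtualPage(object):
    def __init__(self, physical_pages=None):
        self.physical_pages = list(physical_pages) if physical_pages else []

    def boundary(self):
        # Loop over the physical pages: earliest begin, latest end.
        # min()/max() raise ValueError if there are no pages.
        bounds = [pp.boundary() for pp in self.physical_pages]
        return PageBoundary(min(b.begin for b in bounds), max(b.end for b in bounds))

    def __eq__(self, other):
        if not isinstance(other, VirtualPage):
            return NotImplemented
        return self.boundary() == other.boundary()

    def __ne__(self, other):  # Python 2 does not derive != from ==
        return not self == other

    def __lt__(self, other):
        if not isinstance(other, VirtualPage):
            return NotImplemented
        # Lexicographic tuple comparison; for non-overlapping pages this
        # agrees with "self comes first".
        return self.boundary() < other.boundary()


vp_1 = VirtualPage([PhysicalPage(1, 0, 40), PhysicalPage(2, 0, 10)])
vp_1_copy = VirtualPage([PhysicalPage(1, 0, 40), PhysicalPage(2, 0, 10)])
vp_2 = VirtualPage([PhysicalPage(2, 11, 50)])
print(vp_1 == vp_1_copy)  # True
print(vp_1 < vp_2)        # True
print(vp_2 >= vp_1)       # True (derived by total_ordering)
```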

1.) Currently I'm struggling with implementing some of the comparison methods. One big obstacle: how do I compare tuples? Do I write my own __lt__() by creating a custom class that extends some type of collection, like this:

import collections as col
import functools

@functools.total_ordering
class AbstractVirtualPageContainer(col.MutableSet):  # collections.abc.MutableSet on Python 3

    def __lt__(self, other):
        '''What type would other be?
        Make comparison by first normalizing to a comparable type: PageBoundary
        '''
        pass
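On the "how do I compare a tuple" question: Python tuples, and therefore namedtuples such as PageBoundary, already compare lexicographically (first by begin, then by end), so no custom class is needed just to sort them. A small demonstration with made-up values:

```python
import collections

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')

a = PageBoundary(begin=100, end=200)
b = PageBoundary(begin=100, end=250)
c = PageBoundary(begin=300, end=400)

print(a < b)                        # True: begins are equal, so the ends decide
print(b < c)                        # True: 100 < 300
print(a == PageBoundary(100, 200))  # True: field-by-field equality
print(sorted([c, b, a]) == [a, b, c])  # True
```

A custom __lt__ is only needed if you want different semantics, for example "strictly before" (self.end < other.begin), which is not a total order once ranges can overlap.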

2.) Should the comparison function implementation exist in the VirtualPage class instead?

I was leaning towards some type of set data structure, because the data I'm modeling has a notion of uniqueness: physical page ranges cannot overlap, and to some extent they act like a linked list. Also, would getter or setter functions implemented via decorators (e.g. @property) be of any use here?
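The "cannot overlap" property can be checked without any custom collection: sort the boundaries and compare each one with its neighbour. A minimal sketch, assuming that two ranges may touch (earlier.end == later.begin, mirroring the "end line equals next begin line" rule) but not overlap; is_non_overlapping is a hypothetical helper name.

```python
import collections

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')


def is_non_overlapping(boundaries):
    """True if no two PageBoundary ranges overlap (touching is allowed).

    After sorting by (begin, end), any overlap implies an overlap between
    some adjacent pair, so checking neighbours is sufficient.
    """
    ordered = sorted(boundaries)
    return all(earlier.end <= later.begin
               for earlier, later in zip(ordered, ordered[1:]))


print(is_non_overlapping([PageBoundary(30, 40), PageBoundary(10, 30)]))  # True (they touch at 30)
print(is_non_overlapping([PageBoundary(10, 35), PageBoundary(30, 40)]))  # False
```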

Edited by Community; asked by Dave
  • What does it mean to say `vp_2 in vp_1`? That they have at least one physical page in common, or that all pages in vp_2 are also in vp_1? Similarly for `<`, `==`, etc. Are these operations defined in terms of physical pages or PageBoundaries? I think `get_cannonical_begin()` can be simplified to `return self.page_num*10**PhysicalPage._LINE_PAD + self.begin_line_num` – RootTwo Mar 19 '16 at 07:13
  • Say I start with an initial set of PhysicalPages in a volume: **pp_init_set**. My goal is to break **pp_init_set** into distinct **VirtualPage** objects by defining their page+line breaks in the form of **PhysicalPages**. When I'm done I should have a set/list of unique **VirtualPage** objects. When I combine this set back together I should get the original **pp_init_set**. This implies there are neither missing pages (holes) nor overlapping pages. This identity also holds: VirtualPage[i].physical_pages[-1].end_line_num == VirtualPage[i + 1].physical_pages[0].begin_line_num – Dave Mar 19 '16 at 07:45
  • Maybe vp_2 in vp_1 isn't that useful. I would really like to define enough of the methods in https://docs.python.org/2/library/collections.html?highlight=collections#collections-abstract-base-classes to have my own MutableSet of **VirtualPages**, so that at a high level I can do vp_2 - vp_1 to get the difference between two touching **VirtualPages**. In this case I would expect to implicitly get a new **VirtualPage** for which the two rules I stated above still hold. **PageBoundaries** is just a simple translation function to represent a list of **PhysicalPages**. Thanks for the response btw! – Dave Mar 19 '16 at 07:58
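As a rough illustration of the MutableSet route: subclassing the abstract base class only requires the five abstract methods (__contains__, __iter__, __len__, add, discard); the mixin then supplies -, |, &, ^, <=, ==, remove(), pop() and friends. The sketch below is a hypothetical BoundarySet that stores PageBoundary values in sorted order and rejects overlaps on add(), assuming touching ranges are allowed. Note that its - is ordinary set difference, not the "split two touching VirtualPages" semantics described in the comment above; that would need a custom method.

```python
try:
    from collections.abc import MutableSet  # Python 3.3+
except ImportError:
    from collections import MutableSet  # Python 2.7
import bisect
import collections

PageBoundary = collections.namedtuple('PageBoundary', 'begin end')


class BoundarySet(MutableSet):
    """Hypothetical sorted set of non-overlapping PageBoundary values."""

    def __init__(self, iterable=()):
        self._items = []  # kept sorted by (begin, end)
        for item in iterable:
            self.add(item)

    def __contains__(self, item):
        i = bisect.bisect_left(self._items, item)
        return i < len(self._items) and self._items[i] == item

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        item = PageBoundary(*item)
        if item in self:
            return
        i = bisect.bisect_left(self._items, item)
        # Reject overlaps with either neighbour (touching ends are allowed).
        if i > 0 and self._items[i - 1].end > item.begin:
            raise ValueError("overlaps %r" % (self._items[i - 1],))
        if i < len(self._items) and item.end > self._items[i].begin:
            raise ValueError("overlaps %r" % (self._items[i],))
        self._items.insert(i, item)

    def discard(self, item):
        i = bisect.bisect_left(self._items, item)
        if i < len(self._items) and self._items[i] == item:
            del self._items[i]


s1 = BoundarySet([PageBoundary(10, 20), PageBoundary(20, 30)])
s2 = BoundarySet([PageBoundary(20, 30)])
diff = s1 - s2  # provided by the MutableSet mixin
print(type(diff).__name__, list(diff))  # BoundarySet [PageBoundary(begin=10, end=20)]
try:
    s1.add(PageBoundary(25, 35))
except ValueError as exc:
    print("rejected:", exc)
```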

1 Answer


I think you want something like the code below. Not tested; certainly not tested for your application or with your data, YMMV, etc.

from collections import namedtuple

# PageBoundary is a subclass of namedtuple with special relational
# operators. __le__ and __ge__ are not overridden because they don't
# make sense for this class (note: they are still inherited from tuple,
# which compares lexicographically).
class PageBoundary(namedtuple('PageBoundary', 'begin end')):
    # to prevent making an instance dict (see namedtuple docs)
    __slots__ = ()

    def __lt__(self, other):
        # "strictly before": this range ends before the other one begins
        return self.end < other.begin

    def __eq__(self, other):
        # you can put in an assertion if you are concerned the
        # method might be called with the wrong type object
        assert isinstance(other, PageBoundary), "Wrong type for other"

        return self.begin == other.begin and self.end == other.end

    def __ne__(self, other):
        return not self == other

    def __gt__(self, other):
        return other < self

    # defining __eq__ removes the inherited hash on Python 3; restore it
    __hash__ = tuple.__hash__

class PhysicalPage(object):
    # class variables: number of digits each attribute gets. A single
    # leading underscore marks a name 'private' by convention; it is
    # not enforced by the language.
    _PAGE_PAD, _LINE_PAD = 10, 12

    def __init__(self, page_num, begin_line_num=-1, end_line_num=-1):
        self.page_num = page_num
        self.begin_line_num = begin_line_num
        self.end_line_num = end_line_num

    # this serves the purpose of a `getter`, but looks just like
    # a normal attribute access: used like x = page.begin.
    # The value is recomputed on each access, so it always
    # reflects the current line numbers.
    @property
    def begin(self):
        return self.page_num * 10**PhysicalPage._LINE_PAD + self.begin_line_num

    @property
    def end(self):
        return self.page_num * 10**PhysicalPage._LINE_PAD + self.end_line_num

    def __lt__(self, other):
        assert isinstance(other, PhysicalPage)
        return self.end < other.begin

    def __eq__(self, other):
        assert isinstance(other, PhysicalPage)
        # compare two tuples; `a, b == c, d` would build a 3-tuple instead
        return (self.begin, self.end) == (other.begin, other.end)

    def __ne__(self, other):
        return not self == other

    def __gt__(self, other):
        return other < self

class VirtualPage(object):
    def __init__(self, physical_pages=None):
        # None as the default avoids sharing one list between instances
        self.physical_pages = sorted(physical_pages) if physical_pages else []

    def __lt__(self, other):
        if self.physical_pages and other.physical_pages:
            return self.physical_pages[-1].end < other.physical_pages[0].begin

        else:
            raise ValueError("cannot compare an empty VirtualPage")

    def __eq__(self, other):
        if self.physical_pages and other.physical_pages:
            return self.physical_pages == other.physical_pages

        else:
            raise ValueError("cannot compare an empty VirtualPage")

    def __ne__(self, other):
        return not self == other

    def __gt__(self, other):
        return other < self

And a few observations:

Although there are no truly "private" members in Python classes, it is a convention to begin a name with a single underscore, _, to indicate that it is not part of the public interface of the class, module, etc. So naming the parameters of public methods with a leading underscore doesn't seem correct, e.g., def __init__(self, _page_num=-1).

Python generally doesn't use setters / getters; just use the attributes directly. If attribute values need to be calculated, or some other processing is needed, use the @property decorator (as shown for PhysicalPage.begin above).
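A @property can also have a setter, which is the idiomatic place for validation while callers keep plain attribute syntax. A minimal sketch on a cut-down PhysicalPage; the range rule is a hypothetical example (it would reject the question's -1 sentinel, so adjust it to your data):

```python
class PhysicalPage(object):
    _LINE_PAD = 12

    def __init__(self, page_num, begin_line_num=0):
        self.page_num = page_num
        self.begin_line_num = begin_line_num  # assignment goes through the setter

    @property
    def begin_line_num(self):
        return self._begin_line_num

    @begin_line_num.setter
    def begin_line_num(self, value):
        # Hypothetical rule: the line number must fit in _LINE_PAD digits.
        if not 0 <= value < 10 ** PhysicalPage._LINE_PAD:
            raise ValueError("line number out of range: %r" % (value,))
        self._begin_line_num = value

    @property
    def begin(self):
        # Read-only (no setter), recomputed on every access.
        return self.page_num * 10 ** PhysicalPage._LINE_PAD + self._begin_line_num


page = PhysicalPage(3, 17)
print(page.begin)         # 3000000000017
page.begin_line_num = 18  # plain attribute syntax, validated by the setter
print(page.begin)         # 3000000000018
```

Assigning to page.begin raises AttributeError because that property has no setter.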

It's generally not a good idea to use a mutable object as a default function argument. The default is evaluated only once, when the def statement runs, so def __init__(self, physical_pages=list()) does not initialize physical_pages with a new empty list on each call; rather, it uses the same list every time. If that list is modified, the next call will start with the modified list. See VirtualPage's initializer for an alternative.
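The pitfall is easy to reproduce. In this small demonstration (append_bad and append_good are hypothetical names), the first function shares one default list across calls, while the second uses the None-sentinel pattern from VirtualPage's initializer:

```python
def append_bad(item, items=[]):
    # The default list is created once, when the def statement runs.
    items.append(item)
    return items


def append_good(item, items=None):
    # None sentinel: build a fresh list on every call that omits `items`.
    if items is None:
        items = []
    items.append(item)
    return items


print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  <- the same list as the first call
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```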

Answered by RootTwo
  • Thanks! I can definitely use a lot of these ideas. And also thanks for the default argument and mutable objects. Sounds like a really good way to shoot yourself in the foot and not know it. Appreciate your time! – Dave Mar 21 '16 at 02:11