0

I have an object that represents a User and a variable measured 20 times. The object will be something like this:

class User:
    user_id: str 
    measures: List[float] #this is a list(or an array) of size 20

Given that I have many users that I need to represent I would like to use __slots__ to store the variable (so I can save space). Although I don't know if it's possible to implement this directly will save memory, because it probably will store the memory for the pointer to the list, but not the list floats. The following code runs, but not sure how is memory-wise compared to the latter:

class User:
    __slots__ =['user_id', 'measures'] # this implementation runs, but no idea if its using slots "properly"

    user_id: str 
    measures: List[float] 

    def __init__(self, user_id:str, measures:List[float]):
         #...

or maybe the only alternative is to declare the 20 variables independently? (this is very cumbersome but I know it will work)

class User:
    __slots__ =['user_id', 'm1', 'm2', ...] #very cumbersome 

    user_id: str 
    m1:float 
    m2:float 
    ...

    def __init__(self, user_id:str, measures:List[float]):
         #...

or maybe I should use another class that contains the measures.

halfer
  • 19,824
  • 17
  • 99
  • 186
Pablo
  • 3,135
  • 4
  • 27
  • 43
  • 2
    "*will this implementation work ?*" - have you already tried it? what complexities have you encountered when testing that? – RomanPerekhrest Feb 20 '23 at 12:52
  • it is what I have implemented, it runs, but I wondering if a non-fixed list stored in __slots__ might not save the memory efficiently as if we use the latter (saving each value separately). Let me edit that in the question! – Pablo Feb 20 '23 at 12:56
  • 3
    If you operate on fixed length sequence with static values (not intended to be changed) use `tuple` – RomanPerekhrest Feb 20 '23 at 12:59
  • using `measures:Tuple[(float,)*20]` should work ?, and yes is a fixed length array (that's what gave me the feeling that there was a memory efficiently way of implementing it) – Pablo Feb 20 '23 at 13:20
  • `tuple[(float,)*20]` is not a legal type hint, at least as far as `mypy` is concerned. (It does not expand the expression `(float,)*20` to produce `tuple[float,float,...,float]`.) – chepner Feb 21 '23 at 15:05

1 Answers1

1

If memory is a concern, you should first think of keeping floats in an array (either numpy or plain Python array), as floats either "stand alone" or as elements in a list will be a full Python object with tens of bytes .

The simpler way to do it is to simply have the array to be created along with each instance of your object, and then you can use plain indexing on it:

from array import array

class User:
    __slots__ =['user_id', 'measures'] 
    user_id: str 
    measures: array

    def __init__(self, user_id:str, measures:List[float]):
         #...
         self.measures = array("d")
         self.measures.fromlist(measures)

Python array objects have no support in typing, so the best approach is simply forget about type-annotating it - unless you are willing to spend several hours fighting uphill for an optional annotation until you get it passing (but not necessarily right).

But most important in this case: when you want to use or set values in your measures list, Python will take the 8 bytes in which your value are stored (and you can even store 4 bytes 32bit fp values, just use "f" for the array data type), and "unbox" it into a Python float, ready to be used: user.measures[0] * 3

If you want to constrain the size of the measures list to less, or at most 20, you can do it with a plain Python if statement inside __init__:


    def __init__(self, user_id:str, measures:List[float]):
         #...
         if len(measures) != 20:
              raise ValueError(...)
         ...

moreover, and maybe more interesting for you - let's say you want to be able to get to each measure by as a dotted attribute with a hardcoded name, and still store each measure as a 4byte only value - you can write custom code in the __getattr__ and __setattr__ methods, and some metadata allowing the mapping. But I digress - feel free to comment along if you need such a feature.

jsbueno
  • 99,910
  • 10
  • 151
  • 209
  • thanks a lot @jsbueno ! that's was exactly what I was looking for!, and yes my concern was memory allocation optimization. – Pablo Feb 24 '23 at 00:00