I'm working on a Python project that requires me to compile certain attributes of some objects into a dataset. The code I'm currently using is something like the following:
class VectorBuilder(object):
    SIZE = 5

    def __init__(self, player, frame_data):
        self.player = player
        self.fd = frame_data

    def build(self):
        self._vector = []
        self._add(self.player)
        self._add(self.fd.getSomeData())
        self._add(self.fd.getSomeOtherData())
        char = self.fd.getCharacter()
        self._add(char.getCharacterData())
        self._add(char.getMoreCharacterData())
        assert len(self._vector) == self.SIZE
        return self._vector

    def _add(self, element):
        self._vector.append(element)
However, this code is slightly unclean because adding attributes to, or removing them from, the dataset also requires correctly adjusting the SIZE variable. The reason I even have the SIZE variable is that the size of the dataset needs to be known at runtime before the dataset itself is created.
I've thought of instead keeping a list of all the functions used to construct the dataset as strings (as in attributes = ['getPlayer', 'fd.getSomeData', ...]) and then defining the build function as something like:
def build(self):
    self._vector = []
    for att in attributes:
        self._vector.append(getattr(self, att)())
    return self._vector
This would let me access the size as simply len(attributes), and I would only ever need to edit attributes, but I don't know how to make this approach work with the chained function calls, such as self.fd.getCharacter().getCharacterData().
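To make this concrete, here is a rough sketch of the kind of thing I have in mind, except with callables instead of strings, since a lambda can express the chained calls. The getters are just the made-up ones from the example above, and I'm not convinced this is actually any cleaner:

ATTRIBUTE_GETTERS = [
    lambda self: self.player,
    lambda self: self.fd.getSomeData(),
    lambda self: self.fd.getSomeOtherData(),
    lambda self: self.fd.getCharacter().getCharacterData(),
    lambda self: self.fd.getCharacter().getMoreCharacterData(),
]

class VectorBuilder(object):
    # the size is known before any vector is built
    SIZE = len(ATTRIBUTE_GETTERS)

    def __init__(self, player, frame_data):
        self.player = player
        self.fd = frame_data

    def build(self):
        return [getter(self) for getter in ATTRIBUTE_GETTERS]

Being able to write SIZE = len(ATTRIBUTE_GETTERS) is the appeal, but a list of lambdas feels like it just moves the ugliness around.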
Is there a cleaner way to accomplish what I'm trying to do?
EDIT:
Some additional information and clarification is necessary.
- I was using __ due to some bad advice I read online (essentially saying I should use _ for module-private members and __ for class-private members). I've edited them to _ attributes now.
- The getters are a part of the framework I'm using.
- The vector is stored as a private class member so I don't have to pass it around the construction methods, which are in actuality more numerous than the simple _add, doing some other stuff like normalisation and bool -> int conversion on the elements before adding them to the vector.
- SIZE, as it currently stands, is a true constant. It is only ever given a value in the first line of VectorBuilder and never changed at runtime. I realise that I did not clarify this properly in the main post, but new attributes never get added at runtime. The adjustment I was talking about would take place at programming time. For example, if I wanted to add a new attribute, I would need to add it in the build function, e.g. self._add(self.fd.getCharacter().getAction().getActionData().getSpeed()), as well as change the SIZE definition to SIZE = 6.
- The attributes are compiled into what is currently a simple Python list (but will probably be replaced with a numpy array), then passed into a neural network as an input vector. However, the neural network itself needs to be built first, and this happens before any data is made available (i.e. before any input vectors are created). To be built successfully, the neural network needs to know the size of the input vectors it will be receiving. This is why SIZE is necessary, and also the reason for the assert statement - to ascertain that the vectors I'm passing to the network are in fact the size I claimed I would be passing to it (see the sketch after this list).
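To make the timing constraint concrete, the overall flow is roughly the following. build_network is a hypothetical stand-in for the real network construction, and the extra _add_* methods are simplified versions of the normalisation and bool -> int helpers I mentioned (the real ones look different, and 'maximum' is made up for the sketch):

class VectorBuilder(object):
    SIZE = 5

    # __init__ and build as in the main post; the extra construction helpers
    # are along these lines (simplified)
    def _add(self, element):
        self._vector.append(element)

    def _add_bool(self, element):
        self._vector.append(int(element))              # bool -> int conversion

    def _add_normalised(self, element, maximum):
        self._vector.append(element / float(maximum))  # normalisation


def build_network(input_size):
    # hypothetical stand-in for the real network construction; the only point
    # is that it needs the input vector size before any data exists
    ...


# the network is built first, from SIZE alone, before any frame data arrives
net = build_network(VectorBuilder.SIZE)

# later, per frame, vectors of exactly that size are built and fed to it:
#   vector = VectorBuilder(player, frame_data).build()
#   assert len(vector) == VectorBuilder.SIZE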
I'm aware the code is unpythonic; that is why I'm here - the code works, it's just ugly.