I tried to search for an answer to my question; it probably exists, but I couldn't find it, probably because I don't know what to call the problem.
So let's get to the point.
I am working on a statistical model. My dataset contains information about 45 public transport stations: static information about each station (45 rows and around 1000 feature columns) and regularly spaced temporal measurements (2,000,000 rows but only 10 columns). So the "real" amount of information is small enough (around 500 MB) to be processed easily.
The problem is that most statistical modules in Python require plain 2-D arrays, like NumPy arrays. So I have to combine my data: to each of the 2,000,000 measurement rows I have to attach the 1000+ feature columns of the station where the measurement took place, and I end up with a 17 GB table... but most of it is redundant, and it feels like a waste of resources.
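For concreteness, the explicit merge I describe looks roughly like this (the array names and sizes here are illustrative stand-ins, not my real data):

```python
import numpy as np

# Illustrative stand-ins for the real data (names and sizes are made up):
rng = np.random.default_rng(0)
n_measurements, n_stations = 1000, 45
measurements = rng.random((n_measurements, 10))            # temporal measurements
station_features = rng.random((n_stations, 1000))          # static per-station features
station_ids = rng.integers(0, n_stations, n_measurements)  # station index of each row

# The explicit merge: every measurement row gets a full copy of its
# station's 1000 features, so memory scales with n_measurements * 1010.
X = np.hstack([measurements, station_features[station_ids]])
print(X.shape)  # (1000, 1010)
```

With the real 2,000,000 rows, that copied feature block is exactly where the 17 GB comes from.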
Here is my idea, though I have no idea how to implement it: ultimately an array is just a function that, for a given i and j, returns a value. So is it possible to "emulate" an array, to come up with a fake array, pseudo-array, or array interface that the modules would accept like a NumPy array? I don't see why not, because ultimately an array IS a function (i, j) -> a[i][j]. Element access might be a little slower than the memory access of a real array, but that's my problem, and it would still be fast. I just don't know how to do it...
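To show what I mean by a pseudo-array, here is a minimal sketch (the class name and layout are mine, and I don't know whether scikit-learn would actually accept such an object in place of a real ndarray):

```python
import numpy as np

class MergedView:
    """Emulates the merged 2-D array without materializing it:
    columns below 10 read the measurement table, the remaining
    columns read the per-station feature table through the
    station index of the requested row."""

    def __init__(self, measurements, station_features, station_ids):
        self.m = measurements      # shape (N, 10)
        self.f = station_features  # shape (45, 1000)
        self.ids = station_ids     # shape (N,), station index per measurement
        self.shape = (measurements.shape[0],
                      measurements.shape[1] + station_features.shape[1])

    def __getitem__(self, idx):
        i, j = idx
        if j < self.m.shape[1]:
            return self.m[i, j]
        # Redirect feature columns to the small per-station table.
        return self.f[self.ids[i], j - self.m.shape[1]]
```

This stores only the two small tables plus the index, yet answers `(i, j)` lookups as if the 17 GB merge existed.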
Thank you in advance, and tell me if I need to provide more information or change the way I ask my question!
EDIT:
OK, maybe I can clarify my problem a little:
My data can be represented in a relational-database or object-database fashion. It is pretty small (500 MB) in that form, but when I merge my tables so that scikit-learn can process them, it becomes way too big (17 GB+), which seems like a waste of memory! I could use big-data techniques, and that may be a solution, but can't I avoid it altogether? Can I use my data in scikit-learn without explicitly merging the tables? Can I do an implicit merge that emulates the merged data structure without taking additional memory?
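One workaround I could imagine (a sketch only, all names here are hypothetical) is to merge in small batches and feed them to an estimator that supports incremental fitting, so the full 17 GB table never exists at once:

```python
import numpy as np

def merged_batches(measurements, station_features, station_ids, y, batch_size=100_000):
    """Yield (X, y) batches, merging measurement rows with their
    station's static features on the fly via integer indexing."""
    for start in range(0, measurements.shape[0], batch_size):
        sl = slice(start, start + batch_size)
        X = np.hstack([measurements[sl], station_features[station_ids[sl]]])
        yield X, y[sl]

# This would pair with any scikit-learn estimator exposing partial_fit,
# e.g. sklearn.linear_model.SGDRegressor:
#   model = SGDRegressor()
#   for X_batch, y_batch in merged_batches(meas, feats, ids, targets):
#       model.partial_fit(X_batch, y_batch)
```

Peak memory is then one batch of the merged table (batch_size × ~1010 floats) instead of the whole thing, though I don't know if this is the idiomatic way to do it.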