What data structure can I use for N-dimensional data with category names instead of integer indices?

Question

After iterating through multiple nested loops to train multiple DL models, I want to poke around in the results to compare how different changes to hyperparameters, different datasets, and different model architectures performed. The data feels like a 3D array, but I want to preserve the data categories in lieu of row, column, depth indices. Currently, my approach is to use nested dictionaries. Ultimately, I would like to be able to access it similarly to a 3D array (slicing elements from any given dimension or dimensions in no particular order using colon operators). Because every category exists in every other category, I have to assume there is a more efficient way of dealing with this data than iterating through nested dicts, but that is all I have managed to come up with. What data structure (if any) would do this better?

(I am currently dealing with this in 3D, but I imagine a solution exists in N-dimensional space, so I'd be most interested in that general solution.)

If it helps, here are more specifics from my use case: the data comes from 3 different training hyperparameter sets (e.g. X, Y, Z), 6 different datasets (e.g. A-F), and 6 different model architectures (e.g. 1-6). While gathering this data, I stored it all in dicts as follows: RESULTS = {<training style>: {<dataset_name>: {<model_arch>: ...}}}. Accessing any individual result is easy: RESULTS['Y']['A']['3']. For that purpose, the nested dicts work fine. If I want all of the results for a specific style and dataset, that is also easy: RESULTS['X']['B']. Where they fall flat is when I want all of the results along a different dimension, such as model architecture. For example, to see how model 1 performed across every training style and dataset, I have to iterate through the upper-level dicts and pull the model 1 entry from each one. Sure, I could have built the dicts with model as the top-most layer, and that would solve this problem, but then what if I want all of the results for 'A'?
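To make the problem concrete, here is a minimal sketch of the layout described above. The scores are placeholder numbers (hypothetical), since the actual entries are elided in the question; only the key names X-Z, A-F, and 1-6 come from the description.

```python
from itertools import product

styles = ["X", "Y", "Z"]
datasets = ["A", "B", "C", "D", "E", "F"]
models = ["1", "2", "3", "4", "5", "6"]

# Placeholder scores (hypothetical); real values would come from training runs.
RESULTS = {
    s: {
        d: {m: float(i * 100 + j * 10 + k) for k, m in enumerate(models)}
        for j, d in enumerate(datasets)
    }
    for i, s in enumerate(styles)
}

# Easy: a single result, or everything for a fixed style and dataset.
single = RESULTS["Y"]["A"]["3"]
style_dataset = RESULTS["X"]["B"]

# Awkward: everything for model "1" means walking both outer levels by hand.
model_1 = {(s, d): RESULTS[s][d]["1"] for s, d in product(styles, datasets)}
```

Every "inner" dimension needs its own hand-written loop or comprehension like the last line, which is the part that does not scale to N dimensions.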

Please provide enough code so others can better understand or reproduce the problem. — Community, Jul 18 '23 at 17:54

score 0 · Answer 1 · answered Jul 18 '23 at 17:52

I don't think there is any special data structure for your use case. If your data is numeric, you can transpose the array so that the dimension you care about comes first. Your base (main) array stays intact either way: NumPy's transpose returns a view and does not reorder the original, so you only need a copy if you plan to modify the transposed data independently. The NumPy documentation for the transpose function, including its example of transposing a 3D array, may be helpful here.
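As a rough sketch of the transpose approach, assuming the results have already been packed into a numeric array with axes ordered (style, dataset, model); the array contents and variable names below are placeholders, not part of the question:

```python
import numpy as np

# Hypothetical results array with axes (style, dataset, model): 3 x 6 x 6.
results = np.arange(3 * 6 * 6, dtype=float).reshape(3, 6, 6)

# Move the model axis to the front: new axis order is (model, style, dataset).
by_model = np.transpose(results, (2, 0, 1))

# First model across all styles and datasets, shape (3, 6).
model_0 = by_model[0]

# Plain slicing on the original array yields the same values without transposing.
same = results[:, :, 0]
```

Note that `by_model` shares memory with `results`, so writing into one changes the other; call `.copy()` on it if that is not wanted.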

On the other hand, if your data is categorical, you can map each category to an integer index and then use the approach mentioned above.
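One way to sketch that mapping, assuming the category names from the question and placeholder data: keep one label-to-position dict per axis and translate labels into an index tuple. The helper name `select` and the axis names are made up for illustration.

```python
import numpy as np

# Category labels per axis (names taken from the question's example).
labels = {
    "style": ["X", "Y", "Z"],
    "dataset": ["A", "B", "C", "D", "E", "F"],
    "model": ["1", "2", "3", "4", "5", "6"],
}
axes = list(labels)  # axis order: style, dataset, model

# Label -> integer position lookup for each axis.
index = {ax: {name: i for i, name in enumerate(names)} for ax, names in labels.items()}

# Placeholder data (hypothetical); shape follows the label counts: 3 x 6 x 6.
results = np.arange(3 * 6 * 6, dtype=float).reshape(3, 6, 6)


def select(**wanted):
    """Hypothetical helper: index by label; axes not mentioned are kept whole."""
    key = tuple(
        index[ax][wanted[ax]] if ax in wanted else slice(None)
        for ax in axes
    )
    return results[key]


one = select(style="Y", dataset="A", model="3")  # single value
model_1 = select(model="1")                      # shape (3, 6): style x dataset
dataset_a = select(dataset="A")                  # shape (3, 6): style x model
```

Because the key is built per axis, the same helper works for any number of dimensions, and any axis can be fixed or left open in any combination.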

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). — Community, Jul 21 '23 at 20:49
