My real data has 10,000+ items. I have a complicated NumPy record array with a format roughly like this:
a = (((1., 2., 3.), 4., 'metadata1'),
((1., 3., 5.), 5., 'metadata1'),
((1., 2., 4.), 5., 'metadata2'),
((1., 2., 5.), 5., 'metadata2'),
((1., 3., 8.), 5., 'metadata3'))
My columns are defined by dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'S10')]. I get a list of all possible meta values by doing set(a['meta']).
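For anyone who wants to reproduce this, here is a minimal sketch of how the sample data above can be built. It assumes Python 3, so the 'S10' values are written as bytes literals, and `meta_values` is just an illustrative name:

```python
import numpy as np

# Field layout from the question: a 3-float vector, a scalar, and a byte string.
dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'S10')]

# Sample rows from the question (bytes literals are an assumption for Python 3).
a = np.array(
    [((1., 2., 3.), 4., b'metadata1'),
     ((1., 3., 5.), 5., b'metadata1'),
     ((1., 2., 4.), 5., b'metadata2'),
     ((1., 2., 5.), 5., b'metadata2'),
     ((1., 3., 8.), 5., b'metadata3')],
    dtype=dtype,
)

# All distinct values found in the 'meta' column.
meta_values = set(a['meta'])
```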
I'd like to split the array into smaller arrays based on the 'meta' column. Ideally, I'd like results like:
a['metadata1'] == (((1., 2., 3.), 4.), ((1., 3., 5.), 5.))
a['metadata2'] == (((1., 2., 4.), 5.), ((1., 2., 5.), 5.))
a['metadata3'] == (((1., 3., 8.), 5.))
or
a[0] = (((1., 2., 3.), 4., 'metadata1'), ((1., 3., 5.), 5., 'metadata1'))
a[1] = (((1., 2., 4.), 5., 'metadata2'), ((1., 2., 5.), 5., 'metadata2'))
a[2] = (((1., 3., 8.), 5., 'metadata3'))
or any other conveniently split format, although for a large dataset the former would be easier on memory.
Any ideas on how to do this split? I've seen some other questions here, but they all test for numerical values.
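To make the goal concrete, here is a naive sketch of the kind of result I'm after, using one boolean mask per meta value. The name `split` is only illustrative, the bytes literals assume Python 3, and I don't know whether this is the idiomatic or most memory-efficient way to do it (each mask selection makes a copy):

```python
import numpy as np

dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'S10')]
a = np.array(
    [((1., 2., 3.), 4., b'metadata1'),
     ((1., 3., 5.), 5., b'metadata1'),
     ((1., 2., 4.), 5., b'metadata2'),
     ((1., 2., 5.), 5., b'metadata2'),
     ((1., 3., 8.), 5., b'metadata3')],
    dtype=dtype,
)

# Map each distinct meta value to the matching rows,
# keeping only the 'coords' and 'values' fields (the first desired format).
split = {
    key: a[a['meta'] == key][['coords', 'values']]
    for key in np.unique(a['meta'])
}

# Rows that were tagged 'metadata1' in the original array.
group = split[b'metadata1']
```

String comparison with `==` works element-wise on the 'meta' column just as it does for numerical columns, so the same masking approach used in the numerical questions seems to carry over.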