Say I have a list of coordinates (tuples of a constant length n), where n is determined at runtime. I would like to build what is essentially an n-dimensional histogram, except that each bin holds all the coordinate tuples that fall into it rather than just a count.

Example of what I'd like:

Input:

list: [(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)]
bin_width: 1

Output:

[[[(-0.058, -0.279), (-0.543, -0.093)], [(-0.308, 0.414)]], [[], [(0.860, 0.118)]]]
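
To make the intended mapping concrete, here is a minimal sketch of how each point's bin index could be computed for the example above. The helper name `bin_index` and the `origin` argument are illustrative assumptions, not part of any library; `origin` is taken to be the floor of the smallest value in each dimension.

```python
from math import floor


def bin_index(point, bin_width, origin):
    """Return the per-dimension bin indices of one point (hypothetical helper)."""
    return tuple(floor((x - o) / bin_width) for x, o in zip(point, origin))


points = [(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)]
origin = (-1, -1)  # assumed: floor of the minimum in each dimension
for p in points:
    # The index tuple addresses the nested output list: output[i][j] holds p.
    print(p, "->", bin_index(p, 1, origin))
```

Each index tuple gives the position of the point in the nested output, so a point with index `(0, 1)` belongs in `output[0][1]`.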

Update: I have a solution now (see my answer below), but if you have a better idea, please share. In particular, it would be nice to convert this method to generators instead of lists: my example here is short, but in practice my input list might be very large, and I only need to use the output once.

zondo
kram1032

1 Answer

Hopefully I did this right.

Functions:

from math import ceil, floor


def minmax(coordinate_list):                                        # returns a list of the minimum and maximum
    return map(lambda x: (min(x), max(x)), zip(*coordinate_list))   # occurring value of each coordinate of input lists


def find_range(min_max_list):                                           # for each dimension finds the necessary
    return map(lambda x, y: ceil(y) - floor(x), *zip(*min_max_list))    # range for the nested list


def find_bin_range(ranges, bin_width):     # turns the ranges in coordinate units into bin counts; note that bin_width
    return [max(r * bin_width, 1) for r in ranges]    # acts as a multiplier here (bins per coordinate unit)


def build_bins(bin_ranges):     # given a list of ranges, recursively builds a nested list structure to be filled --
    if not bin_ranges:          # the histogram bins
        return []
    return [build_bins(bin_ranges[1:]) for _ in range(ceil(bin_ranges[0]))]


def access_bin(coordinates, key, bins, bin_width, min_max_list):    # recursively accesses each bin
    if not key:                                                     # and fills it with coordinate
        bins.append(coordinates)
    else:
        minimum, _ = min_max_list[0]
        i = int((key[0] - floor(minimum)) * bin_width)
        return access_bin(coordinates, key[1:], bins[i], bin_width, min_max_list[1:])


def fill_bins(coordinate_list, bins, bin_width, min_max_list):    # fills each bin with appropriate coordinates
    for coordinates in coordinate_list:
        access_bin(coordinates, coordinates, bins, bin_width, min_max_list)
    return bins


def coordinate_list_to_bins(coordinate_list, bin_width):    # the complete procedure
    min_max_list = list(minmax(coordinate_list))
    ranges = find_range(min_max_list)
    bin_ranges = find_bin_range(ranges, bin_width)
    bins = build_bins(bin_ranges)
    return fill_bins(coordinate_list, bins, bin_width, min_max_list)

Usage:

import random


coordinate_list = [(random.uniform(-1, 1), random.uniform(-.5, .5)) for _ in range(4)]
bin_width = 1
print(coordinate_list)
print(coordinate_list_to_bins(coordinate_list, bin_width))

Output:

[(0.197, 0.278), (0.333, -0.030), (0.363, -0.298), (0.553, -0.286)]
[[[(0.333, -0.030), (0.363, -0.298), (0.553, -0.286)], [(0.197, 0.278)]]]
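
A possibly lazier variant, offered only as a sketch: if the bins are stored in a dictionary keyed by integer index tuples instead of a dense nested list, the minimum and maximum never have to be computed, so the input can be any iterable (including a generator) and is consumed in a single pass. The name `sparse_bins` is hypothetical, and here `bin_width` is assumed to be a true width (values are divided by it), which differs from the multiplication used in the functions above.

```python
from collections import defaultdict
from math import floor


def sparse_bins(points, bin_width):
    """Group points by integer bin index in a single pass (hypothetical helper).

    `points` may be any iterable of equal-length tuples, e.g. a generator.
    Only bins that actually receive a point are stored.
    """
    bins = defaultdict(list)
    for point in points:
        key = tuple(floor(x / bin_width) for x in point)
        bins[key].append(point)
    return dict(bins)


# An iterator stands in for a large stream of coordinates.
points = iter([(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)])
for key, members in sorted(sparse_bins(points, 1).items()):
    print(key, members)
```

The trade-off is that empty bins are simply absent from the result, so code that expects the dense nested-list layout would need to treat missing keys as empty bins.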
kram1032
  • As you are talking about possibly large lists, have a look at numpy and its multidimensional histogram support: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.histogramdd.html#numpy.histogramdd – Markus May 15 '16 at 20:05
  • Doesn't that just tell me how many points land in each bin? That's not quite what I want here. I'm not quite asking for a histogram. Just something that's conceptually close to one. - The bins should contain the points, not the number of points. – kram1032 May 15 '16 at 20:25
  • Sorry, you're right. When I read 'n-dimensional histogram' I was immediately thinking numpy, without closely reading the rest. :-/ – Markus May 15 '16 at 20:34
  • There is digitize in numpy, which could help you here, but it only works on one dimensional bins. Perhaps http://stackoverflow.com/questions/24643229/extending-numpy-digitize-to-multi-dimensional-data can help you. – Markus May 15 '16 at 20:46
  • Hmm, I'll have to play around with numpy and/or pandas to get the hang of those. However, from what I can tell, all digitize does is figure out which particular bin a value would land in. It could replace this line in my code: `i = int((key[0] - floor(minimum)) * bin_width)` but otherwise I'm not sure that it would be particularly useful here. I have no experience with them, though, so I might be missing something important. – kram1032 May 15 '16 at 21:27
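
For reference, the per-value lookup discussed in these comments can be sketched in plain Python with the standard `bisect` module. As far as I understand, for increasing edges this mirrors what digitize returns with its default settings, but treat that equivalence as an assumption. The names `edge_index` and `point_index` are hypothetical, and the edges do not have to be evenly spaced.

```python
from bisect import bisect_right


def edge_index(value, edges):
    """Return i such that edges[i - 1] <= value < edges[i] for sorted edges.

    Gives 0 for values below edges[0] and len(edges) for values at or
    above edges[-1].
    """
    return bisect_right(edges, value)


def point_index(point, edges_per_dim):
    """Apply the one-dimensional lookup to each coordinate of a point."""
    return tuple(edge_index(x, edges) for x, edges in zip(point, edges_per_dim))


edges = [[-1, 0, 1], [-1, 0, 1]]  # assumed bin edges for two dimensions
print(point_index((-0.308, 0.414), edges))
```

Applying the lookup once per dimension yields an index tuple per point, which is the piece that would replace the index computation quoted above; collecting the points into bins still has to be done separately.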