I am trying to come up with a very fast algorithm for calculating Hoeffding's D, a very interesting dependence statistic, that takes full advantage of the capabilities of a powerful GPU. Ideally I will do this in Matlab using Jacket, but other ideas in CUDA or OpenCL would also be much appreciated. Basically, I want to crowd-source lots of clever ideas that I can try to put together, and then I will attempt to open-source the result so others can use it.
Despite the power of this dependence coefficient (it can detect even "one-to-many" dependence relationships), there is almost nothing about it online, except for two sources: SAS statistical software and Frank Harrell's excellent R package Hmisc. You can read a description of the algorithm here:
And here is Harrell's code in Fortran (which is surprisingly easy to follow if you understand the calculation already):
http://hmisc.sourcearchive.com/documentation/3.7-0/hoeffd_8f-source.html
(Also, page 128 of the PDF documentation for Hmisc has more details.)
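To make the target concrete, here is a minimal, untuned Matlab reference implementation of the calculation as I understand it from the Hmisc source and Hollander & Wolfe. It assumes no ties (the Fortran handles ties via midranks and extra correction terms), and the function name `hoeffd_ref` is just mine; treat it as a correctness reference, not as something fast:

```matlab
function D = hoeffd_ref(x, y)
% Reference O(n^2) Hoeffding's D for column vectors x, y.
% Assumes no ties and requires n >= 5.
n = numel(x);
[~, ix] = sort(x);  R = zeros(n, 1);  R(ix) = (1:n).';   % ranks of x
[~, iy] = sort(y);  S = zeros(n, 1);  S(iy) = (1:n).';   % ranks of y
% Q(i) = 1 + number of points strictly below (x(i), y(i)) in both coordinates
Q = zeros(n, 1);
for i = 1:n
    Q(i) = 1 + sum(x < x(i) & y < y(i));
end
D1 = sum((Q - 1) .* (Q - 2));
D2 = sum((R - 1) .* (R - 2) .* (S - 1) .* (S - 2));
D3 = sum((R - 2) .* (S - 2) .* (Q - 1));
D  = 30 * ((n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3) / ...
     (n * (n - 1) * (n - 2) * (n - 3) * (n - 4));
end
```

As a sanity check, `hoeffd_ref(randn(100,1), randn(100,1))` should hover near zero, while for `x = randn(100,1)` the value `hoeffd_ref(x, x.^2)` should come out clearly positive even though Pearson and Spearman correlations stay near zero.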
This is a very computationally demanding calculation: the joint count Q costs O(n^2) per column pair in a straightforward implementation, and the number of pairs grows with the square of the number of columns. If you wanted to apply it to a data set with thousands of rows and a few thousand columns, even with the fast Fortran implementation you would be waiting many days for the result, even on a new machine. My hope is that an Nvidia GTX 580-class card, or better yet a Tesla, would bring that down to a couple of hours. If it did, the combination would be an analytical force to be reckoned with, whether identifying genes or finding correlations in experimental data.
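A back-of-the-envelope count: with n = 5,000 rows and m = 2,000 columns that is roughly 2×10^6 column pairs times 2.5×10^7 comparisons each, on the order of 5×10^13 operations. The good news is that those comparisons are all independent, which is exactly what GPUs like. As a first step, the Q loop in the sketch above collapses into a single n-by-n comparison that a GPU array library could run almost unchanged, at the cost of O(n^2) memory:

```matlab
% Vectorized replacement for the Q loop: one n-by-n comparison matrix.
% bsxfun(@lt, x.', x) has element (i,j) = (x(j) < x(i)), so summing along
% dim 2 counts the points strictly below (x(i), y(i)) in both coordinates.
Q = 1 + sum(bsxfun(@lt, x.', x) & bsxfun(@lt, y.', y), 2);
```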
Anyway, I look forward to people's responses and hope we can make a fast, GPU-based algorithm for Hoeffding's D a reality.
Thanks in advance for any ideas-- and please don't hesitate to give partial or half-baked ideas!
Update: Jascha has generously provided a working implementation of Hoeffding's D in Matlab, which fulfills one of my objectives. The other is to radically speed it up using a GPU, preferably via Jacket. Does anyone see a clever path or strategy to do this on the GPU?
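One path I can see, offered strictly as a sketch: the per-pair work is dominated by the O(n^2) joint count Q, so keep the pair loop on the host and push just that count to the card. Below is what that might look like with Jacket. The `gsingle` cast and the `double(...)` pull-back are how I recall Jacket moving data on and off the GPU, and whether Jacket overloads `bsxfun` for its types is an assumption on my part, so treat this as pseudocode to validate against the Jacket docs:

```matlab
function Dmat = hoeffd_all_pairs(X)
% Sketch of a Jacket-flavored driver: all pairwise D's for an n-by-m data
% matrix X (rows = observations, columns = variables, no ties, n >= 5).
[n, m] = size(X);
[~, ord] = sort(X);                    % column-wise sort for ranks (on CPU)
Rk = zeros(n, m);
for c = 1:m
    Rk(ord(:, c), c) = (1:n).';        % Rk(:,c) = ranks of column c
end
Xg = gsingle(X);                       % push raw data to the GPU (Jacket cast; assumption)
Dmat = zeros(m, m);
for a = 1:m
    for b = (a + 1):m
        x = Xg(:, a);  y = Xg(:, b);
        % O(n^2) joint count on the card; only an n-vector comes back
        Q = double(1 + sum(bsxfun(@lt, x.', x) & bsxfun(@lt, y.', y), 2));
        R = Rk(:, a);  S = Rk(:, b);
        D1 = sum((Q - 1) .* (Q - 2));
        D2 = sum((R - 1) .* (R - 2) .* (S - 1) .* (S - 2));
        D3 = sum((R - 2) .* (S - 2) .* (Q - 1));
        Dmat(a, b) = 30 * ((n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3) / ...
                     (n * (n - 1) * (n - 2) * (n - 3) * (n - 4));
        Dmat(b, a) = Dmat(a, b);
    end
end
end
```

Each pair ships only two n-vectors to the card and one n-vector back, so PCIe traffic should be negligible next to the n^2 work per pair; if Jacket's gfor construct can batch several pairs per launch, that would be the next refinement to try.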