I am trying to learn the Torch library for machine learning.
I know that the focus of Torch is neural networks, but just for the sake of it I was trying to run k-means on it. If nothing else, Torch implements fast contiguous storage, which should be analogous to numpy arrays, and the Torch cheatsheet cites the unsup library for unsupervised learning, so why not?
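To check the storage claim, a quick look in the Torch REPL (just an illustrative snippet, not part of my benchmark) suggests a Tensor really is a view over one flat, contiguous Storage:

require 'torch'
local t = torch.Tensor(3, 4):zero()  -- 3x4 tensor of doubles
print(t:storage():size())            -- 12: one flat block of memory
print(t:isContiguous())              -- true

so the data layout itself should not be the problem.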
I already have a benchmark that I use for K-means implementations. Even though all the implementations there intentionally use an unoptimized algorithm (the README explains why), LuaJIT is able to cluster 100,000 points in 611 ms. An optimized (or shall I say, not intentionally slowed down) implementation in Nim (not in the repository) runs in 68 ms, so I was expecting something in between.
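For reference, the kind of naive Lloyd's iteration the benchmark uses looks roughly like this (a simplified sketch assuming 2D points stored as plain Lua tables, not the actual benchmark code):

local function closest(centroids, p)
  -- index of the centroid nearest to point p (squared Euclidean distance)
  local best, bestDist = 1, math.huge
  for i, c in ipairs(centroids) do
    local dx, dy = p[1] - c[1], p[2] - c[2]
    local d = dx * dx + dy * dy
    if d < bestDist then best, bestDist = i, d end
  end
  return best
end

local function lloydStep(points, centroids)
  local sums, counts = {}, {}
  for i = 1, #centroids do sums[i], counts[i] = {0, 0}, 0 end
  -- assignment step: attach each point to its nearest centroid
  for _, p in ipairs(points) do
    local i = closest(centroids, p)
    sums[i][1], sums[i][2] = sums[i][1] + p[1], sums[i][2] + p[2]
    counts[i] = counts[i] + 1
  end
  -- update step: move each centroid to the mean of its assigned points
  for i = 1, #centroids do
    if counts[i] > 0 then
      centroids[i] = { sums[i][1] / counts[i], sums[i][2] / counts[i] }
    end
  end
  return centroids
end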
Unfortunately, things are much worse, so I suspect I am doing something awfully wrong. What I have written is:
require 'io'
cjson = require 'cjson'
require 'torch'
require 'unsup'

-- load the points from JSON and convert the nested table into a 2D Tensor (one row per point)
content = io.open("points.json"):read("*a")
data = cjson.decode(content)
points = torch.Tensor(data)

-- time only the clustering itself: 10 centroids, 15 iterations
timer = torch.Timer()
centroids, counts = unsup.kmeans(points, 10, 15)
print(string.format('Time required: %f s', timer:time().real))
and the running time is around 6 seconds!
Can anyone check whether I have done something wrong in my use of Torch/unsup?
If anyone wants to try it, the file points.json is in the above repository.