
I am trying to use Mahout in an application running on Windows. I want to build clusters from a Lucene index using k-means.

As soon as I have to create sequence files (to create vectors from a Lucene index), I get a Hadoop exception, because Hadoop makes command-line calls to programs that don't exist in a Windows environment (e.g. chmod). Running in Cygwin is not an option, since I want to be able to run the app from Eclipse.

So my questions are:

  • Is there a way to avoid creating sequence files when retrieving my vectors from a Lucene index?
  • Or is there a way to create sequence files in a Windows environment?
Asked by user249210; edited by Sean Owen.
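On the first question: if the document vectors fit in memory, one way to sidestep sequence files (and Hadoop) entirely is to run k-means in-process. The plain-Java sketch below illustrates Lloyd's k-means over dense `double[]` vectors. It is a hypothetical illustration, not Mahout's API: the class name `KMeansSketch` is made up, and it assumes you have already read the term vectors out of the Lucene index yourself (that step depends on your Lucene version and is not shown).

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical in-memory k-means (Lloyd's algorithm) over dense vectors.
 * Not Mahout's API; assumes the vectors were already extracted from Lucene.
 */
class KMeansSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the point (lowest index wins ties). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = squaredDistance(point, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            double d = squaredDistance(point, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means and returns the cluster index assigned to each point. */
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        if (k < 1 || k > n) throw new IllegalArgumentException("need 1 <= k <= number of points");
        int dim = points[0].length;

        // Initial centroids: k distinct points picked by a partial Fisher-Yates shuffle.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Random rnd = new Random(seed);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rnd.nextInt(n - c);
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) break; // no point moved: converged

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy "document vectors": two near-duplicates along each axis.
        double[][] docs = { {1.0, 0.0}, {0.9, 0.1}, {0.0, 1.0}, {0.1, 0.9} };
        System.out.println(Arrays.toString(cluster(docs, 2, 20, 42L)));
    }
}
```

On the four toy vectors in `main`, the first two end up in one cluster and the last two in the other (which one is labeled 0 depends on the seed). For a real index you would probably want sparse vectors and a cosine distance, which this sketch does not implement.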

    3 Answers


    The only way to run Hadoop in a Windows environment is to install Cygwin. For more info, see this blog post:

    http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/

    Cygwin will provide all the command-line utilities (like chmod) that Hadoop relies on. You can still run your Hadoop jobs from within Eclipse if you want.

    bajafresh4life
    • Seconded, this is more a question about Hadoop, and no, you can't run Hadoop on Windows. – Sean Owen May 02 '10 at 06:41
    • HDInsight is a Hadoop implementation for Windows Azure. If you want to use it on your local machine and not in the cloud, try using the HDInsight emulator, which you can install with Web Platform Installer. – user888734 Feb 17 '14 at 13:28
    • You may want to update your answer, since it's now possible to use Hadoop with Windows (https://wiki.apache.org/hadoop/Hadoop2OnWindows). I'd gladly answer, but I'm still looking for a way to use Mahout :) – merours Jul 31 '14 at 14:21

    Do you know the SequenceFile API? Have a look here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html You can try to write and read the data yourself.

    I think you can run Mahout from Eclipse on Windows in stand-alone mode, but you will run into several shortcomings and obstacles. Try it and see how far you get.

    In my opinion, you shouldn't insist on running Mahout from Eclipse. ;-)

    Peter Wippermann

    You can use a virtual machine to run your Hadoop environment. For me, the best solution is the Hortonworks project (http://hortonworks.com/). Everything works pretty well.

    Alexander Davliatov