Questions tagged [hadoop]

Hadoop is an open-source framework that provides a distributed, replicated file system and a production-grade map-reduce system. A series of complementary additions, such as Hive, Pig, and HBase, help you get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Hadoop's core components and complementary additions include:

  • Hadoop Distributed File System, HDFS (standard)
  • The MapReduce framework (standard)
  • Hive, which provides a SQL-like interface to the MapReduce framework
  • HBase, a distributed key-value store
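The map-reduce model that Hadoop runs at cluster scale can be illustrated with a minimal, single-process Python sketch of the classic word count: a map step emits (key, value) pairs, a shuffle groups the values by key, and a reduce step combines each group. This is not Hadoop's API; the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and a real Hadoop job distributes these steps across many machines and reads its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical helper): emit (word, 1) for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step (hypothetical helper): group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step (hypothetical helper): combine each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce in a single process, for illustration only."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hive and HBase run on Hadoop"]
    print(word_count(docs))
```

In Hadoop itself, the framework performs the shuffle between the map and reduce tasks, so user code typically supplies only the map and reduce functions.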

Recommended reference sources:

261 questions
-2 votes, 2 answers

How to specify the hard disk for a Hadoop cluster?

I installed Hadoop on an Azure VM, and it works fine using the VM's OS disk. I have now attached an additional hard disk to the VM, and I want to know how to configure Hadoop to use only this new disk as its default storage. Can anyone tell me how to…
billcyz
-2 votes, 1 answer

What tool or application (free or not) should I use to generate and write large amounts (TBs) of data quickly?

I am testing a storage appliance and need to write terabytes of data to it. Using fio or dd takes days to write that much data. Are there any tools or applications (free or not) that I could use for this purpose? The tool should…
-2 votes, 4 answers

why http://server:60010 web page for the running HBase Master

I installed it with:
wget http://archive.cloudera.com/cdh/3/hbase-0.90.3-cdh3u1.tar.gz
This is my hbase_site.xml: hbase.master localhost:60000 The host and port that the HBase master runs…
Rahul Mehta
-2 votes, 2 answers

Hadoop on Virtual Machines

We would like to migrate from MySQL to Hadoop for scalability. Will Hadoop clusters running on virtual machines improve our website's performance? What are the advantages and disadvantages of running a Hadoop cluster on virtual machines (ESXi)?
Jonar
-2 votes, 1 answer

How does placing data across racks help exploit the fact that aggregate intra-rack bandwidth >= inter-rack bandwidth?

In the GFS research paper snapshot, it says (my interpretation after reading the paper and its reviews) that "inter-rack bandwidth is lower than aggregated intra-rack bandwidth" (not sure what it means by aggregated, it doesn't make much sense of kind of…
-6 votes, 1 answer

What causes this error?

I am running two Docker containers: one for the basic Hadoop services and the other for Flume. The services are running successfully. I linked the two containers, and Docker set the environment variables automatically. 1.2.3.4 7ab4ffb30dc0 ff00::0…
Gibbs