Questions tagged [hadoop]

Hadoop is an open-source platform that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project documents a more complete list of commercial offerings.

Standard components and available complementary additions include:

  • Hadoop Distributed File System, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store

261 questions
1 vote, 0 answers

Hadoop logging on Mac going to console. Any setting to have it go to Hadoop's log files instead?

When I run a command for Hadoop on my Mac, Hadoop sends the logs to the command line. I could capture this on my own with > mylogfile.log, but I'm hoping there is simply a setting/option in Hadoop or Apache-logging that changes how this…
Dolan Antenucci
1 vote, 1 answer

Running hadoop jobs on Cloudera 3 as regular user?

Looking at Cloudera's installation instructions, I don't see any mention of how to run jobs as regular users. When I try to run a sample job, this is what I get: hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 2 100000 Number of Maps =…
Dolan Antenucci
1 vote, 1 answer

Are the Hadoop {start|stop}-all.sh scripts typically used with Cloudera 3 clusters, and if so, how?

I've got a cluster I'm setting up Cloudera 3 on, and it is unclear whether I should be using these start/stop scripts like I used to with the standard Apache Hadoop setup (where I had a specific user account that ran all the Hadoop stuff). With…
Dolan Antenucci
1 vote, 0 answers

Why am I not able to install CDH3?

https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onUbuntuSystems I have Ubuntu 8.04. I downloaded the squeeze package with wget and tried to install it with dpkg, but it gives me this error: ksn@ksn-test:~$ sudo dpkg -i…
Rahul Mehta
1 vote, 0 answers

Remote connection to Hadoop namenode issue

I am required to hack a single-node Hadoop "cluster" (Cloudera pseudo-distributed) to be able to access it remotely. I have successfully installed Hadoop and I have updated the localhost identifiers in the configs to the IP address of the machine. …
Rob Parker
1 vote, 1 answer

Hive metadata permission issue

We are getting this error in Hive while creating a DB / table: hive> CREATE TABLE pokes (foo INT, bar STRING); FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Cannot get a connection, pool error Could…
Chandramohan
1 vote, 0 answers

Hadoop machines going down - logs to look for?

I have a Hadoop cluster with ~7 machines, and some of the machines keep going down. Sometimes only the Hadoop datanode / jobtracker processes die (the machine is still running), and other times the entire machine goes down. I haven't really…
Jeeyoung Kim
1 vote, 1 answer

Problem while running Apache Mahout quickstart

I was trying to run the mahout clustering example from the quickstart at: https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html While running any of the clustering implementations as specified here, I get the following…
Krishnamurthy
1 vote, 2 answers

Does Hadoop handle differing node disk sizes on its own?

I have a single node (pseudo-distributed config) and I'm considering adding a second slave node. Does it matter if the slave has less disk capacity? Will the rebalancing take care of that by itself? I'm not a Hadoop expert by far.
millebii
1 vote, 3 answers

Hadoop Rolling Small files

I am running Hadoop on a project and need a suggestion. By default, Hadoop has a "block size" of around 64 MB. There is also a suggestion not to use many small files. I currently have very, very small files being put into HDFS due…
Arenstar
1 vote, 1 answer

Hadoop installation on multiple instances of Ubuntu 10.04.1 running on VMware Workstation

I want to learn about Hadoop and get some hands-on distributed computing experience by doing some programming. I have a PC with Windows 7 Professional installed. On the same PC, I also have Ubuntu 10.04.1 installed on VMware Workstation 7. I want to know if…
learner
1 vote, 4 answers

Disks for Hadoop, what do you recommend?

What is your recommendation about disks for Hadoop? Do you recommend using SAS, or just attaching disks over SATA? Or maybe something else? What are the pros and cons of each option? (Decision about disk size has been made, and there will be about 5-6…
wlk
1 vote, 2 answers

Even data distribution on Hadoop/Hive

I am trying a small Hadoop setup (for experimentation) with just 2 machines. I am loading about 13 GB of data, a table of around 39 million rows, with a replication factor of 1 using Hive. My problem is that Hadoop always stores all this data on a single…
Shweta Agrawal
1 vote, 0 answers

HDP cluster + journal nodes get out of sync

We have an HDP cluster, version 2.6.5. When we look at the name-node logs, we can see the following warning: 2023-02-20 15:56:37,731 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file…
King David
1 vote, 0 answers

YARN + how to debug wget

We are testing the connection from ResourceManager02 to ResourceManager01 with wget via port 8088. Both Resource Managers are part of the YARN service, and each Resource Manager service is installed on RHEL 7.9, as follows: wget…
King David