Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that help get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project documents a more complete list of commercial solutions.

Standard components and complementary additions to Hadoop include:

Hadoop distributed filesystem (standard)
The map-reduce architecture (standard)
Hive, which provides a SQL-like interface to the map-reduce architecture
HBase, a distributed key-value service
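
For readers new to the map-reduce architecture listed above, the following is a minimal in-memory sketch of the idea in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and a real cluster would run the map and reduce steps on many machines rather than in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, the job a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hive pig hbase", "hive hive pig"]))
```

The point of the split is that each map call sees only one line and each reduce call sees only one key, which is what lets a framework distribute the work.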

Recommended reference sources:

Hive Language Reference

261 questions

0 answers

Hadoop logging on Mac going to console: any setting to have it go to Hadoop's log files instead?

When I run a command for Hadoop on my Mac, Hadoop sends the logs to the command line. I could capture this on my own with > mylogfile.log, but I'm hoping there is simply a setting/option in Hadoop or Apache-logging that changes how this…

logging console hadoop

asked Sep 28 '11 at 14:00

Dolan Antenucci

1 answer

Running hadoop jobs on Cloudera 3 as regular user?

Looking at Cloudera's installation instructions, I don't see any mention of how to run jobs as regular users. When I try to run a sample job, this is what I get: hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 2 100000 Number of Maps =…

installation hadoop

asked Sep 07 '11 at 00:07

Dolan Antenucci

1 answer

Are the Hadoop {start|stop}-all.sh scripts typically used w/ cloudera 3 clusters, and if so, how?

I've got a cluster I'm setting up Cloudera 3 on, and it is unclear whether I should be using these start/stop scripts like I used to with the standard Apache Hadoop setup (where I had a specific user account that ran all the Hadoop stuff). With…

installation hadoop

asked Sep 06 '11 at 02:53

Dolan Antenucci

0 answers

Why am I not able to install CDH3?

https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onUbuntuSystems I have Ubuntu 8.04. I fetched the squeeze package with wget and tried to install it with dpkg, but it gives me this error: ksn@ksn-test:~$ sudo dpkg -i…

ubuntu hadoop

asked Aug 31 '11 at 07:59

Rahul Mehta

0 answers

Remote connection to Hadoop namenode issue

I am required to hack a single-node Hadoop "cluster" (Cloudera pseudo-distributed) to be able to access it remotely. I have successfully installed Hadoop and I have updated the localhost identifiers in the configs to the IP address of the machine. …

ssh hadoop

asked Aug 18 '11 at 15:26

Rob Parker

1 answer

Hive metadata permission issue

We are getting this error in Hive while creating a DB / table: hive> CREATE TABLE pokes (foo INT, bar STRING); FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Cannot get a connection, pool error Could…

hadoop

asked Aug 11 '11 at 12:45

Chandramohan

0 answers

Hadoop machines going down - logs to look for?

I have a Hadoop cluster with ~7 machines, and some of the machines keep going down. Sometimes only the Hadoop datanode / jobtracker processes die (the machine is still running), and other times the entire machine goes down. I haven't really…

linux hadoop

asked Jul 13 '11 at 22:51

Jeeyoung Kim

1 answer

Problem while running the Apache Mahout quickstart

I was trying to run the mahout clustering example from the quickstart at: https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html While running any of the clustering implementations as specified here, I get the following…

hadoop cluster apache-2.2

asked Jun 19 '11 at 20:23

Krishnamurthy

2 answers

Does Hadoop take care of different node HD sizes on its own?

I have a single node (pseudo-distributed config) and I'm considering adding a 2nd slave node. Does it matter if the slave has less HD capacity? Will the rebalance take care of that by itself? I'm not a Hadoop expert by far.

hadoop cluster

asked May 06 '11 at 22:00

millebii

3 answers

Hadoop Rolling Small files

I am running Hadoop on a project and need a suggestion. By default, Hadoop has a "block size" of around 64 MB. There is also a suggestion not to use many small files. I currently have very, very small files being put into HDFS due…

linux hadoop apache-2.2 mapreduce

asked Nov 16 '10 at 03:03

Arenstar

1 answer

Hadoop installation on multiple instances of Ubuntu 10.04.1 running on VMware Workstation

I want to learn about Hadoop and get some hands-on distributed computing experience by doing some programming. I have a PC with Windows 7 Professional installed. On the same PC, I also have Ubuntu 10.04.1 installed on VMware Workstation 7. I want to know if…

ubuntu-10.04 vmware-workstation hadoop

asked Aug 24 '10 at 17:10

learner

4 answers

Disks for hadoop, what do you recommend?

What is your recommendation about disks for Hadoop? Do you recommend using SAS, or just attaching disks over SATA? Or maybe something else? What are the pros and cons of each option? (The decision about disk size has been made, and there will be about 5-6…

hard-drive hardware hadoop

asked Jul 26 '10 at 05:56

wlk

2 answers

Even data distribution on hadoop/hive

I am trying a small Hadoop setup (for experimentation) with just 2 machines. I am loading about 13GB of data, a table of around 39 million rows, with a replication factor of 1 using Hive. My problem is that Hadoop always stores all this data on a single…

hadoop

asked Jul 06 '10 at 11:41

Shweta Agrawal

0 answers

HDP cluster + journal nodes get out of sync

We have an HDP cluster, version 2.6.5. When we look at the name-node logs, we can see the following warning: 2023-02-20 15:56:37,731 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file…

linux hadoop hdfs big-data

asked Feb 23 '23 at 12:00

King David

0 answers

YARN + how to debug wget

We are testing the connection from ResourceManager02 to ResourceManager01 with wget via port 8088. Both Resource Managers are part of the YARN service, and each Resource Manager service is installed on RHEL 7.9. The test is as follows: wget…

linux networking http wget hadoop

asked Feb 22 '23 at 17:24

King David
